PHP repository for Embeddings

Can anyone recommend a good PHP repository for implementing embeddings via the OpenAI API?

I am looking for something similar to this use case:

Hi!

Do you think this could do the job?

(You may need to add up the values in the array to get your cosine similarity.)

It depends on how big your embedding corpus is. My rule of thumb: if it's only around 1,000 documents, you can probably get away with doing it like this.
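The suggestion above (multiply the two vectors element-wise, then add up the values in the resulting array) can be done in plain PHP without the PECL `trader` extension. This is a minimal sketch, assuming both embeddings are plain PHP arrays of floats of the same length; `dotProduct` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: dot product of two equal-length embedding vectors.
function dotProduct(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimension.');
    }
    // Element-wise multiplication (similar to what trader_mult returns)...
    $products = array_map(fn (float $x, float $y): float => $x * $y, $a, $b);
    // ...then add up the values in the array.
    return array_sum($products);
}

echo dotProduct([0.6, 0.8], [0.8, 0.6]), PHP_EOL; // 0.96
```

Arrow functions (`fn`) need PHP 7.4 or later; on older versions, an ordinary anonymous `function` works the same way.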


Thank you, Diet!

I found an implementation here: php-ai/src/PhpAi/Service/Embeddings.php (rafasashi/php-ai on GitHub)

Do you use `trader_mult` and `array_reduce` for scaling?

Isn’t a model like text-embedding-ada-002 supposed to vectorize strings?

No, I don't use PHP at all :confused:

Yes, the embedding models vectorize text, but you need to manage the search yourself. That involves storing your document vectors (the corpus), getting the embedding of the user query (or a rewrite of it), and then comparing the query vector against all of your corpus vectors to find the best match(es).
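The workflow above (store the corpus vectors, embed the query, compare it against every corpus vector) amounts to a brute-force nearest-neighbour search. Below is a minimal sketch, assuming the embeddings have already been fetched from the API and are held in memory as an associative array of document ID => vector. `searchCorpus`, the document IDs, and the toy 3-dimensional vectors are hypothetical stand-ins for real text-embedding-ada-002 output.

```php
<?php
// Cosine similarity between two equal-length vectors.
function cosine(array $a, array $b): float
{
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    $den = sqrt($normA) * sqrt($normB);
    return $den > 0 ? $dot / $den : 0.0; // guard against all-zero vectors
}

// Hypothetical helper: score every corpus vector against the query and
// return the $k best matches as docId => similarity, highest first.
function searchCorpus(array $query, array $corpus, int $k = 3): array
{
    $scores = [];
    foreach ($corpus as $docId => $vector) {
        $scores[$docId] = cosine($query, $vector);
    }
    arsort($scores); // sort by similarity, descending, keeping the keys
    return array_slice($scores, 0, $k, true);
}

// Toy corpus; real OpenAI embeddings have far more dimensions.
$corpus = [
    'doc-cats'  => [0.9, 0.1, 0.0],
    'doc-dogs'  => [0.7, 0.3, 0.1],
    'doc-taxes' => [0.0, 0.2, 0.9],
];
$query = [0.8, 0.2, 0.0];

print_r(searchCorpus($query, $corpus, 2)); // doc-cats first, then doc-dogs
```

Every query costs one pass over the whole corpus, which is why this approach is fine for around 1,000 documents; much larger corpora are usually served by a dedicated vector index instead.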

To compare vectors, people commonly use cosine similarity (the higher, the better). For OpenAI embeddings, cosine similarity is just the dot product (multiply element-wise, then sum), because those vectors are already normalized to length 1; vectors from some other models may need to be normalized first.
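To illustrate the normalization remark: once two vectors are scaled to unit length, the plain dot product and cosine similarity give the same number. This sketch uses hypothetical helper names (`normalize`, `dot`) and toy vectors; the normalization step would only be needed for embeddings that are not already unit length.

```php
<?php
// Scale a vector to unit (L2) length; all-zero vectors are returned unchanged.
function normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn ($x) => $x * $x, $v)));
    return $norm > 0 ? array_map(fn ($x) => $x / $norm, $v) : $v;
}

// Plain dot product of two equal-length vectors.
function dot(array $a, array $b): float
{
    return array_sum(array_map(fn ($x, $y) => $x * $y, $a, $b));
}

$a = normalize([3.0, 4.0]); // [0.6, 0.8]
$b = normalize([4.0, 3.0]); // [0.8, 0.6]

// For unit-length vectors, the dot product equals the cosine similarity.
echo dot($a, $b), PHP_EOL; // 0.96
```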

But yes, you can use the function you linked; it looks more or less correct:

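    private function cosineSimilarity(array $embedding_a, array $embedding_b): float
    {
        $n = count($embedding_a); // computed once instead of on every iteration
        if ($n !== count($embedding_b)) {
            throw new \InvalidArgumentException('Embeddings must have the same dimension.');
        }
        $dotProduct = 0.0;
        $uLength = 0.0; // running sum of squares of $embedding_a
        $vLength = 0.0; // running sum of squares of $embedding_b
        for ($i = 0; $i < $n; $i++) {
            $dotProduct += $embedding_a[$i] * $embedding_b[$i];
            $uLength += $embedding_a[$i] * $embedding_a[$i];
            $vLength += $embedding_b[$i] * $embedding_b[$i];
        }
        $denominator = sqrt($uLength) * sqrt($vLength);
        // Avoid division by zero when either vector is all zeros
        return $denominator > 0 ? $dotProduct / $denominator : 0.0;
    }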

If anyone is interested, I found an article about Binary Passage Retriever (BPR), which represents the passage index using compact binary codes rather than continuous vectors:

And here is a brilliant implementation for MySQL and PHP:
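To get a feel for why binary codes are compact, here is a much simpler stand-in for BPR: each embedding dimension is binarized by its sign, the bits are packed into a string, and codes are compared by Hamming distance (the number of differing bits). This is not BPR itself, which learns its hashing step during training; the helper names `binarize` and `hammingDistance` and the toy vectors are hypothetical.

```php
<?php
// Pack the sign of each dimension into bits: 1 if >= 0, else 0.
// A 1536-dimensional float vector becomes a 192-byte string.
function binarize(array $vector): string
{
    $bytes = '';
    foreach (array_chunk($vector, 8) as $chunk) { // chunks are re-indexed from 0
        $byte = 0;
        foreach ($chunk as $bit => $value) {
            if ($value >= 0) {
                $byte |= 1 << $bit;
            }
        }
        $bytes .= chr($byte);
    }
    return $bytes;
}

// Number of differing bits between two equal-length binary codes.
function hammingDistance(string $a, string $b): int
{
    $distance = 0;
    $xor = $a ^ $b; // bytewise XOR of the two strings
    for ($i = 0, $n = strlen($xor); $i < $n; $i++) {
        $distance += substr_count(decbin(ord($xor[$i])), '1');
    }
    return $distance;
}

$doc   = binarize([0.3, -0.1, 0.2, -0.7, 0.5, 0.0, -0.2, 0.4]);
$query = binarize([0.1, -0.3, -0.2, -0.6, 0.4, 0.1, -0.1, 0.2]);
echo hammingDistance($doc, $query), PHP_EOL; // 1
```

A lower Hamming distance means more similar codes, so candidates can be ranked by ascending distance instead of descending cosine similarity.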