Some questions about text-embedding-ada-002’s embedding

debreuil · January 25, 2023, 3:07am

Ah yeah, I have a bit of an unusual setup, as it is in a Windows app.

I get the embeddings for all the sentences directly from the ada-002 API (the colored line chart).

I averaged them all per dimension, leaving a single 1536-element array (the blue line chart).

Then I go through each sentence pair and subtract that average from each corresponding value, leaving a modified array for the left and right sentences.

Last, I calculate the cosine similarity between these new arrays. I’m using a framework called Accord to calculate that, but I assume the result is the same as any other implementation.
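For anyone who wants to check the steps above outside my app, here is a minimal Python sketch of the same idea: average per dimension, subtract the average, then take the cosine similarity. This is not my actual code (that is a Windows app using Accord). The function names are made up for illustration, and the toy 3-element vectors stand in for the real 1536-element ada-002 embeddings.

```python
import math


def mean_vector(embeddings):
    """Average a list of equal-length vectors dimension by dimension."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def center(vector, mean):
    """Subtract the per-dimension mean from a single vector."""
    return [value - avg for value, avg in zip(vector, mean)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for real embeddings (assumption: real ones have 1536 values).
embeddings = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 5.0, 2.0],
]

mean = mean_vector(embeddings)

# Compare one pair before and after removing the shared average.
left, right = embeddings[0], embeddings[1]
raw_similarity = cosine_similarity(left, right)
centered_similarity = cosine_similarity(center(left, mean), center(right, mean))

print(f"raw: {raw_similarity:.4f}, centered: {centered_similarity:.4f}")
```

In this toy case the raw vectors look fairly similar, but once the shared average is removed the similarity drops below zero, which is the kind of spread I was hoping to see with the real embeddings.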

I’ve tried French, numbers, and HTML, but the shape seems pretty persistent.

Also full disclosure, it’s not impossible I’m doing something wrong - day one with this api.
