Some questions about text-embedding-ada-002’s embedding

Ah yeah, I have a bit of an unusual setup, as it is in a Windows app.

I get the embeddings for all the sentences directly from the ada-002 API (the colored line chart).

I average them all dimension by dimension, leaving a single 1536-element array (the blue line chart).

Then I go through each sentence pair and subtract that average array, element by element, from each embedding, leaving a modified array for the left and right sentences.

Lastly, I calculate the cosine similarity between these new arrays. I’m using a framework called Accord to calculate that, but I assume it gives the same result as the standard cosine similarity formula.
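
In case it helps anyone check my steps, here is roughly what I'm doing, written as a small Python sketch rather than my actual Windows app code. The function names are just made up for this post, and the 3-element vectors are toy stand-ins for the real 1536-element ada-002 embeddings, so the numbers themselves mean nothing.

```python
import math

def mean_vector(embeddings):
    """Average a list of equal-length vectors dimension by dimension."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def center(vec, mean):
    """Subtract the mean vector from a single embedding, element by element."""
    return [v - m for v, m in zip(vec, mean)]

def cosine_similarity(a, b):
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy data (assumption): stand-ins for embeddings returned by the API.
embeddings = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.7, 0.1, 0.3],
]

mean = mean_vector(embeddings)          # the "blue line"
left = center(embeddings[0], mean)      # modified left sentence
right = center(embeddings[1], mean)     # modified right sentence

raw_sim = cosine_similarity(embeddings[0], embeddings[1])
centered_sim = cosine_similarity(left, right)
print("raw:", raw_sim, "centered:", centered_sim)
```

With vectors that all share a large common component, like these toy ones, the raw similarity comes out high and the centered similarity spreads out much more, which is the effect I was after.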

I’ve tried French, numbers, and HTML, but the shape seems pretty persistent :slight_smile:

Also, full disclosure: it’s not impossible I’m doing something wrong - day one with this API.