Ah yeah, I have a bit of an unusual setup, as it is in a Windows app.
I get the embeddings for all the sentences directly from the ada-002 API (the colored line chart).
I averaged them all per dimension, leaving a single 1536-element array (the blue line chart).
Then I go through each pair and subtract that average from each corresponding value, which leaves a modified array for both the left and right sentences.
Lastly, I calculate the cosine similarity between these new arrays. I’m using a framework called Accord to calculate that, but I assume the result is the same as with any other implementation.
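In case it helps anyone check my steps, here is roughly what that looks like as a Python sketch. This is not my actual code (mine is in the Windows app and uses Accord), the function names are made up for illustration, and the toy 3-element vectors just stand in for the real 1536-element embeddings.

```python
import math

def mean_vector(embeddings):
    """Average a list of equal-length vectors, one dimension at a time."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def subtract(vector, mean):
    """Subtract the mean from each corresponding value."""
    return [v - m for v, m in zip(vector, mean)]

def cosine_similarity(a, b):
    """Plain cosine similarity: dot product over the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def centered_similarity(left, right, mean):
    """Cosine similarity of a pair after removing the shared average."""
    return cosine_similarity(subtract(left, mean), subtract(right, mean))

# Toy stand-ins for the sentence embeddings (hypothetical values).
embeddings = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 1.0],
    [3.0, 2.0, 2.0],
]
mean = mean_vector(embeddings)

raw = cosine_similarity(embeddings[0], embeddings[1])
centered = centered_similarity(embeddings[0], embeddings[1], mean)
print(f"raw: {raw:.4f}  centered: {centered:.4f}")
```

The only point of the sketch is the order of operations: average once over all sentences, subtract that average from both sides of a pair, then take the cosine of the two results.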
I’ve tried French, numbers, and HTML, but the shape seems pretty persistent.
Also, full disclosure: it’s not impossible I’m doing something wrong - this is day one with this API.