Question on Embedding - Embedding Length is uniform?

dravinian · April 11, 2023, 1:37pm

I have used ada to embed a few documents using the following code:

import pandas as pd
from openai.embeddings_utils import get_embedding

df = pd.read_csv("sampleText.csv")
# Embed the text of each row; 'Ada_Embedding' is the engine name used here
df['curie_search'] = df["Text"].apply(lambda x: get_embedding(x, engine='Ada_Embedding'))

Looking at the output file, every row has a curie_search value that runs to ~34,000 characters in length.

It doesn’t seem to matter how long the text in the document is; the value is always about 34,000 characters.

I found that surprising.
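One way to check what is behind the ~34,000 characters is to parse a stored cell back into a list and count its elements. The sketch below makes no API call. `fake_embedding` is a hypothetical stand-in that returns 1536 pseudo-random floats, on the assumption that the engine is backed by text-embedding-ada-002, which returns 1536-dimensional vectors. If that assumption holds, the character count reflects the fixed vector size written out as text, not the length of the input document.

```python
import ast
import random

# Assumption: the engine is backed by text-embedding-ada-002 (1536 dimensions).
EMBEDDING_DIM = 1536


def fake_embedding(seed, dim=EMBEDDING_DIM):
    """Hypothetical stand-in for an API call: returns `dim` pseudo-random floats."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def describe_cell(cell):
    """Parse a stringified list (as saved in a CSV cell).

    Returns (number of vector elements, number of characters in the cell).
    """
    vector = ast.literal_eval(cell)
    return len(vector), len(cell)


# Two cells standing in for the embeddings of a short and a long document.
short_cell = str(fake_embedding(seed=1))
long_cell = str(fake_embedding(seed=2))

print(describe_cell(short_cell))
print(describe_cell(long_cell))
```

Running `describe_cell` on a real value from the curie_search column (read back from the CSV as a string) would show whether each row holds the same number of floats.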

I am also getting surprising results when GPT looks at a document to provide information: it returns information that is clearly not in that document, and it does so consistently with the same erroneous information, which I find a bit strange.

Is there something wrong with the code above? I tried using the code from the site, but that simply wouldn’t create an output for me.

joyasree78 · April 11, 2023, 3:43pm

Can you please share how you are giving the prompt to get the answer? Also, if possible, share a code snippet of your question/answer retrieval part.

dravinian · April 11, 2023, 5:11pm

I don’t think those are the issue, as it is an intermittent issue that only seems to crop up on documents where the length of the embedding doesn’t really match the length of the document.

It is also always the same document; other documents with more substantive text are fine, so I don’t think it is the prompt or the way the question is being asked.

joyasree78 · April 11, 2023, 6:03pm

Let me see if I can replicate it. I have a similar use case that I am working on, but I have not run into this yet.
