I have used ada to embed a few documents using the following code:
import pandas as pd
from openai.embeddings_utils import get_embedding

df = pd.read_csv("sampleText.csv")
# Embed each row's text; get_embedding returns a list of floats
df['curie_search'] = df["Text"].apply(lambda x: get_embedding(x, engine='Ada_Embedding'))
Looking at the output file, every row has a curie_search value that is about 34,000 characters long.
The length of the source text doesn't seem to matter: the value is always about 34,000 characters, which I found surprising.
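One possible explanation to rule out: an embedding model returns a vector with a fixed number of dimensions no matter how long the input is (1536 for text-embedding-ada-002, assuming that is the model behind my engine). Once that list of floats is written out as text, it would always be roughly the same number of characters. Below is a rough sketch of how that could be checked. The helper embedding_dimension and the two sample cells are made up for illustration; they are not real embeddings and nothing here calls the API.

```python
import ast


def embedding_dimension(cell):
    """Return the number of floats in an embedding stored as a list or as its string form."""
    # After a round trip through CSV, a list of floats comes back as a string
    # such as "[0.1, -0.2, ...]", so parse it before counting.
    vector = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return len(vector)


# Hypothetical stand-ins for two cells of the embedding column (not real embeddings).
# The dimension 1536 is an assumption about the model behind the engine.
short_doc_cell = str([0.0123456789] * 1536)
long_doc_cell = str([-0.0987654321] * 1536)

# Print the vector dimension and the character length of each stored cell.
print(embedding_dimension(short_doc_cell), len(short_doc_cell))
print(embedding_dimension(long_doc_cell), len(long_doc_cell))
```

On the real data, something like df['curie_search'].apply(embedding_dimension).unique() should return a single number if this explanation holds.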
I am also getting odd results when GPT is asked for information from a document: it returns information that is clearly not in that document, and it does so consistently, with the same erroneous information each time.
Is there something wrong with the code above? I tried using the code from the site, but that simply wouldn't create an output for me.