Is there any information about how to train the "text-embedding-ada-002" model?

Hi there,

I was searching for information on how to develop an embedding model for the OpenAI embedding API, and I found these documents:

  1. (JAN 25, 2022) Introducing text and code embeddings
  2. (DEC 15, 2022) New and improved embedding model

The first announcement came with a paper, which I read, but the second one didn’t.

Is the training objective of text-embedding-ada-002 the same as that of “text-similarity-davinci-001”?

Can you please guide me on which documentation I should refer to?

Thanks.

Hi there z2ao and welcome to the community.

I’m probably not answering your real question, but I couldn’t stop myself from mentioning that you don’t train the embedding model. You use it to get vectors for text search and similar tasks.

So you send a document to the embedding model, and it returns a vector (a list of numbers) that represents that text, which you store in a DB and search against later. For example, I’ve got a lot of PDF files that I want to use to help answer questions. I split them into 200-word “chunks”, send the chunks to the embedding model, get a vector back for each chunk, and save them in the DB. Later, when someone asks a question, I send the question to the embedding model, get its vector, run a vector search in the DB, and append the matching chunks to my next request to the GPT-4 endpoint, which then gives me an answer.
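The chunk, embed, store, and search loop described above can be sketched in Python roughly like this. It is a minimal sketch under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for the embeddings endpoint so the example runs offline, and a plain Python list stands in for the DB. With the official `openai` Python library (v1.x), the real call would be along the lines of `client.embeddings.create(model="text-embedding-ada-002", input=chunks)`, which returns one dense vector per input in `response.data[i].embedding`. All function names here are my own.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """HYPOTHETICAL stand-in for the embeddings endpoint: a sparse
    bag-of-words vector (word -> count). The real model returns a
    dense list of floats instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_words(text, size=200):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two sparse vectors (Counter returns 0 for missing words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents, size=200):
    """Chunk every document and keep (chunk, vector) pairs.
    A plain list stands in for the vector DB."""
    return [(chunk, toy_embed(chunk))
            for doc in documents
            for chunk in chunk_words(doc, size)]


def search(index, question, k=3):
    """Embed the question and return the k most similar chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Put the retrieved chunks in front of the question for the chat model."""
    context = "\n\n".join(chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "The warranty covers parts and labor for two years.",
        "Shipping takes five business days within the country.",
    ]
    index = build_index(docs)
    question = "How long does the warranty last?"
    print(build_prompt(question, search(index, question, k=1)))
```

In a real setup the vectors come back from the API, the DB does the similarity search, and the string from `build_prompt` is what gets sent to the GPT-4 chat endpoint.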

So you don’t really train the embedding endpoint; you just use it to get vectors. I’m sure someone will come along and answer your real question, though.

Good luck with your project.
Paul.


Welcome to the forum!

The correct documentation is here: OpenAI Platform

I don’t know whether the training objectives were the same; my guess is that the newer model is faster and useful for more applications.
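For comparison, the paper that came with the first announcement (the "cpt" text and code embedding models) describes, as I read it, a contrastive objective on paired data. For a batch of N positive pairs (x_i, y_i), the logit for (x_i, y_j) is the cosine similarity of their embeddings scaled by exp(τ), where τ is a trainable temperature. The loss is cross-entropy along both the rows and the columns, so every other example in the batch acts as a negative. The sketch below computes that loss value (no gradients) for embeddings that have already been computed. The function names are my own, and neither announcement states whether text-embedding-ada-002 was trained the same way.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_softmax_at(row, idx):
    """log(softmax(row)[idx]), computed stably via log-sum-exp."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return row[idx] - lse


def symmetric_contrastive_loss(xs, ys, tau=0.0):
    """xs[i] and ys[i] embed a positive pair; every other y_j in the batch
    is an in-batch negative for x_i, and every other x_i for y_j."""
    n = len(xs)
    scale = math.exp(tau)  # tau is the (trainable) temperature
    logits = [[cosine(xs[i], ys[j]) * scale for j in range(n)] for i in range(n)]
    # Row direction: for each x_i, the correct y is y_i.
    row_loss = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Column direction: for each y_j, the correct x is x_j.
    col_loss = -sum(
        log_softmax_at([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (row_loss + col_loss) / 2
```

A correctly aligned batch gets a low loss, while a batch whose pairs are swapped gets a high one, which is what pushes matching pairs together and non-matching pairs apart.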
