Is there any information about how to train "text-embedding-ada-002" model?

z2ao · August 3, 2023, 4:52am

Hi There,

I was searching for information on how the embedding models for the OpenAI embeddings API are developed.
I found these documents:

(JAN 25, 2022) Introducing text and code embeddings
(DEC 15, 2022) New and improved embedding model

The first document had a paper, so I read it, but the second document didn’t have a paper.

Is the training objective of text-embedding-ada-002 the same as that of “text-similarity-davinci-001”?

Can you please guide me on which documentation I should refer to?

Thanks.

paul.redcell · August 3, 2023, 5:49am

Hi there z2ao and welcome to the community.

I’m probably not answering your real question but just couldn’t stop myself from mentioning that you don’t train the embedding model. You use it to get vectors for text searches and such.

So you would send a document to the embedding model and it would return a vector (a list of numbers) that represents that text, which you store in a DB and search against later. For example, I’ve got a lot of PDF files that I want to use to help answer questions. I split them up into 200-word “chunks”, send the chunks to the embedding model, get the vectors back, and save them in the DB. Then later, when someone asks a question, I send the question to the embedding model, get its vector, do a vector search in the DB, and append the matching chunks to the next request to the GPT-4 endpoint, which gives me an answer.
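
To make that workflow concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (chunk_words, build_index, search) are made up for this example, the "DB" is a plain in-memory list, and toy_embed is a hashed bag-of-words stand-in, not the OpenAI model. In a real setup, the embed argument would wrap a call to the embeddings endpoint with text-embedding-ada-002.

```python
import hashlib
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]
Embedder = Callable[[str], Vector]


def chunk_words(text: str, size: int = 200) -> List[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str, dims: int = 64) -> Vector:
    """Stand-in embedder (an assumption for this sketch, NOT the OpenAI model).

    Counts lowercase alphanumeric tokens into `dims` hashed buckets.
    """
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: List[str], embed: Embedder) -> List[Tuple[str, Vector]]:
    """Embed every chunk once and keep (chunk, vector) pairs as the 'DB'."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(question: str, index: List[Tuple[str, Vector]],
           embed: Embedder, top_k: int = 3) -> List[str]:
    """Embed the question and return the `top_k` most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


if __name__ == "__main__":
    document = "Cats purr when they are content. Rockets burn fuel to reach orbit."
    index = build_index(chunk_words(document, size=6), toy_embed)
    # The returned chunks would be appended to the prompt sent to the chat model.
    print(search("Why do cats purr?", index, toy_embed, top_k=1))
```

Note that nothing here trains or modifies the embedding model: it is only called to turn text into vectors, once per chunk at indexing time and once per question at query time.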

So you don’t really train the embedding endpoint but just use it to get vectors. I’m sure someone will come along and answer your real question though.

Good luck with your project.
Paul.

Foxalabs · August 3, 2023, 7:29am

Welcome to the forum!

The correct documentation is here: OpenAI Platform

I don’t know if the objectives were the same. My guess is that the newer model is faster and useful for more applications.
