How to use text embedding ada 002 with a dataset that contains four hundred rows

dboukali98 · June 23, 2023, 10:45pm

Hi,
I have a dataset that contains about four hundred rows, and I want to use the text-embedding-ada-002 model API to generate embeddings for some specific columns. Can somebody please give me guidance on how to handle this large amount of data and feed it to the model through the API? I know there are rate limits and other obstacles.
Thanks in advance.

jwatte · June 24, 2023, 2:11am

400 rows is not a large amount of data.
While there are rate limits, they are quite high for ADA if you use batching.
Currently, I send 500 text snippets per API request when generating embeddings. For you, it could be similar, but you’d send column values instead of text snippets.
Anyway – let’s say you have 5 columns per row, and 400 rows. If you want to send 500 items per batch, you’d get (5*400)/500 == 4 batches.
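
A minimal sketch of that batching in Python, for illustration only. The names `make_batches`, `embed_all` and `embed_fn` are hypothetical and not from the thread; `embed_fn` stands in for whatever function sends one list of texts to the embeddings API and returns one vector per text, in the same order.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def make_batches(items: Sequence[str], batch_size: int = 500) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def embed_all(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],  # hypothetical API wrapper
    batch_size: int = 500,
) -> List[Vector]:
    """Call embed_fn once per batch and return the vectors in input order."""
    vectors: List[Vector] = []
    for batch in make_batches(texts, batch_size):
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise RuntimeError("embed_fn must return one vector per input text")
        vectors.extend(result)
    return vectors


# The arithmetic from above: 5 columns * 400 rows = 2000 values.
values = [f"row {r} col {c}" for r in range(400) for c in range(5)]
print(len(make_batches(values, 500)))  # 4
```

With the `openai` Python package (version 1 or later), `embed_fn` could plausibly wrap `client.embeddings.create(model="text-embedding-ada-002", input=batch)` and return `[d.embedding for d in response.data]`. That wiring is an assumption here and worth checking against the current API reference.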

Separately: What’s in these columns? Embeddings are useful for semantic closeness matching, but aren’t good for more traditional database queries like substrings or value range matches.
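
To make "semantic closeness" concrete, here is a small sketch that ranks candidate vectors by cosine similarity to a query vector. Cosine similarity is a common choice for comparing embeddings, but it is an assumption here, not something the thread prescribes. The function names and the toy 3-dimensional vectors are made up for illustration; real ada-002 embeddings have 1536 dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly the same direction
    [-1.0, 0.0, 0.0],  # opposite direction
]
ranking = rank_by_similarity(query, candidates)
print([index for index, _ in ranking])  # [1, 0, 2]
```

A score like this says which texts are close in meaning. It cannot answer "contains this substring" or "salary between X and Y", which is why those queries are better left to an ordinary database.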

dboukali98 · June 24, 2023, 11:07am

Thanks for your reply,
I am building a job recommendation system, and I need to get embeddings for the job descriptions and job requirements, which are two columns in the dataset.
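
For a dataset shaped like this, one possible way to prepare the input is sketched below: collect the text of the two columns into a flat list, and keep a parallel list of (row index, column name) keys so each returned vector can be mapped back to its cell. The column names `job_description` and `job_requirements`, the sample rows, and the helper `collect_texts` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical column names; adjust to match the real dataset.
COLUMNS = ("job_description", "job_requirements")


def collect_texts(
    rows: Sequence[Dict[str, str]], columns: Sequence[str] = COLUMNS
) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Flatten the chosen columns into texts, with keys to map vectors back."""
    keys: List[Tuple[int, str]] = []
    texts: List[str] = []
    for index, row in enumerate(rows):
        for column in columns:
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells, which carry no text to embed
                keys.append((index, column))
                texts.append(value)
    return keys, texts


# Made-up sample rows for illustration.
rows = [
    {"job_description": "Build data pipelines", "job_requirements": "Python, SQL"},
    {"job_description": "Design user interfaces", "job_requirements": ""},
]
keys, texts = collect_texts(rows)
print(keys)
print(texts)
```

The `texts` list can then be sent to the embeddings API in batches, and `keys[i]` identifies which row and column the i-th vector belongs to.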
