I want to know about text search using text-embedding-ada-002

elyaali10 · February 23, 2023, 6:40pm

Hi everyone, hope you are all doing well.

I just saw how the text-embedding-ada-002 model is used for text search and data preparation.

But the whole point of my question is about data preparation. For example:

In the linked example there is a text column and a summary column, and the two are combined into a single key/value-style string like title: summary, content: text. OK, I get that, but it means I must have a summary for every text.

If I have documents, how should I prepare the dataset? Summarizing each document is difficult, so please tell me how to tackle this problem.
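For reference, the data-preparation step described above boils down to building one input string per row. A minimal sketch, assuming a "Title: ...; Content: ..." layout like the example's and hypothetical `summary`/`text` fields; when no summary exists, the raw text is used on its own:

```python
from typing import Optional


def combine_fields(text: str, summary: Optional[str] = None) -> str:
    """Build the string that would be sent to the embedding model.

    Uses a "Title: ...; Content: ..." layout when a summary is available,
    otherwise falls back to the raw text alone (no summary required).
    """
    text = text.strip()
    if summary and summary.strip():
        return f"Title: {summary.strip()}; Content: {text}"
    return text


# Hypothetical rows: one with a summary, one without.
rows = [
    {"summary": "Great coffee", "text": "Rich flavour, arrived quickly."},
    {"summary": None, "text": "Long internal report with no summary available."},
]
inputs = [combine_fields(r["text"], r["summary"]) for r in rows]
for line in inputs:
    print(line)
```

The summary is just extra text prepended to the input; nothing in the embedding call requires it.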

Should I use the document text directly for the embedding, or must I follow the same structure when creating embeddings?
How do I extract document embeddings? As I understand it, writing a summary for each document is difficult, and formatting the document text and summary into the same structure as the example will also be difficult.
Please suggest a proper way to solve this semantic text search use case with text-embedding-ada-002.

anon10827405 · February 23, 2023, 7:19pm

Ideally, each embedding would be semantically unique enough to stand apart from the others. The use cases should be used as inspiration and idealization, not as step-by-step guides, unless one perfectly reflects your goal. The structure ultimately depends on the purpose of the text. You are the director of your documents: what separates them? What parts are important? What benefits does a semantic search have compared to any other sort of search engine?

The structure should create unique documents that highlight their semantic differences. Typically this means including everything.
You don’t extract document embeddings. You are condensing/converting your documents into a format that computers can compare. You don’t need to summarize them.
The proper way is to first understand embeddings and ask yourself: is this the best solution? If you understand embeddings well, you will see that it’s a very straightforward process and your questions will answer themselves.
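To make "converting your documents into a comparable format" concrete: the raw document text can be sent to the model as-is, and the resulting vectors are compared with cosine similarity. A minimal sketch, assuming the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable; the import is deferred so the similarity function works without the package installed:

```python
import math
from typing import List, Sequence

MODEL = "text-embedding-ada-002"


def embed_texts(texts: List[str], client=None) -> List[List[float]]:
    """Embed raw document texts directly, with no summary step.

    Assumes the openai v1+ client; creates one from the environment if
    none is passed in.
    """
    if client is None:
        from openai import OpenAI  # deferred: only needed for real API calls
        client = OpenAI()
    response = client.embeddings.create(model=MODEL, input=texts)
    # Sort by index so vectors line up with the order of `texts`.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

OpenAI's documentation states that its embeddings are normalized to length 1, so for ada-002 vectors a plain dot product produces the same ranking as the full cosine formula.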

elyaali10 · February 24, 2023, 5:39am

Thank you for your reply. OK, one more question.

If I put each document’s text in a row of a CSV file, encode it as an embedding, and then search those embeddings with a query string, will the result give me the whole document text or the semantically matching sentences from that document?
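A search over embeddings returns whichever unit was embedded: if each CSV row holds a whole document, the best match is that whole document, not individual sentences. A minimal sketch of top-k ranking over stored rows, using tiny hand-made 3-dimensional vectors as stand-ins for real 1536-dimensional ada-002 embeddings (the `rows` data and query vector are hypothetical):

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_vec: Sequence[float], rows: List[Dict], k: int = 2) -> List[Tuple[float, str]]:
    """Rank stored rows by similarity to the query and return the top k.

    Each result is the full text stored in that row, because the row is
    the unit that was embedded.
    """
    scored = [(cosine_similarity(query_vec, row["embedding"]), row["text"]) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings of whole documents.
rows = [
    {"text": "Document A: full text about invoices and billing.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Document B: full text about hiring and onboarding.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Document C: full text about refunds.", "embedding": [0.7, 0.3, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of a billing-related query
for score, text in search(query_vec, rows):
    print(f"{score:.3f}  {text}")
```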
We are trying to create an application where we upload a text file, the back end automatically encodes it into embeddings, and then we search those embeddings with queries. Will that approach work?
Or
must we create embeddings for hundreds of documents before the search gets better?
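For the upload-then-search application, one common approach is to split each uploaded file into overlapping passages and embed each passage separately, so a search returns the matching passage (plus the file it came from) rather than the whole document. This also keeps each input under text-embedding-ada-002’s limit of 8,191 input tokens. A minimal sketch that chunks by word count, assuming hypothetical sizes of 200 words with a 50-word overlap; words are only a rough proxy for tokens, and a tokenizer such as `tiktoken` would be needed to count tokens exactly:

```python
from typing import Dict, List


def chunk_text(text: str, source: str, chunk_words: int = 200, overlap: int = 50) -> List[Dict]:
    """Split a document into overlapping word windows.

    Each chunk records its source file and starting word so a search hit
    can be traced back to where it came from.
    """
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_words]
        chunks.append({"source": source, "start_word": start, "text": " ".join(window)})
        if start + chunk_words >= len(words):
            break  # this window already reaches the end of the document
    return chunks


# Hypothetical uploaded file: 450 numbered "words".
doc = " ".join(f"w{i}" for i in range(450))
for chunk in chunk_text(doc, source="upload.txt"):
    print(chunk["source"], chunk["start_word"], len(chunk["text"].split()))
```

Each chunk’s `text` would then be embedded and stored next to its `source`. Chunk size is a trade-off: smaller chunks give more precise matches, larger ones give more context per result.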
