Questions about a Q&A bot and embeddings

Somebody asked me why I'm embedding so many questions in one file rather than embedding a single question by itself – and my brain blew up a little.

The back story is that I had LangChain code working, embedding about 20 questions in chunks, and the bot seemed to answer in natural language given the Q&A file provided as vectors. With that said, what would be the benefit of embedding one question at a time, and not the answer? And when, at what point, would I need to limit the number of questions in one file?

Now, Pinecone can store metadata, and I can see a case for embedding different files with different categories… but one question at a time? Eventually the bot will need all of the information…

Usually you would embed individual questions, individual answers, or both, and the search will aggregate these to form the top-K hits related to the query.
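Roughly, the pattern looks like the sketch below: one vector per question, with the answer carried alongside as metadata (the analogue of one Pinecone upsert per Q&A pair), and cosine similarity picking the top-K matches. The embedding function here is a toy bag-of-words stand-in for a real embedding model, and all the Q&A pairs are made up for illustration.

```python
import math

# Toy stand-in for a real embedding model (e.g. an OpenAI or
# sentence-transformers encoder): a bag-of-words count vector
# over a fixed vocabulary. Illustrative only.
def embed(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical Q&A pairs standing in for the 20 questions in the file.
qa_pairs = [
    ("How do I reset my password?", "Use the 'Forgot password' link."),
    ("What payment methods are accepted?", "We accept cards and PayPal."),
    ("How do I delete my account?", "Email support to close your account."),
]

vocab = sorted({w for q, _ in qa_pairs for w in q.lower().split()})

# One vector per *question*, each carrying its answer as metadata --
# the analogue of one upsert per Q&A pair in a vector store.
index = [(embed(q, vocab), {"question": q, "answer": a}) for q, a in qa_pairs]

def top_k(query, k=2):
    qv = embed(query, vocab)
    scored = sorted(index, key=lambda item: cosine(qv, item[0]), reverse=True)
    return [meta for _, meta in scored[:k]]

hits = top_k("how do I reset my password?", k=1)
# hits[0]["answer"] -> "Use the 'Forgot password' link."
```

The retrieved metadata (question plus answer) is then what you'd stuff into the LLM prompt as context.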

If you embed all 20 questions and get only one embedding vector, your search precision suffers: you are listing all 20 things under a single vector, so your overall retrieval may suffer.

I say *may* because if the 20 things are all related, then it might be more efficient to group these similar things together and let the LLM sort out the rest.

But if the 20 things aren’t related to a specific query, then you are just adding noise to your search.
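A tiny numeric sketch of that dilution effect, with made-up three-dimensional vectors: a query matches its own question's vector perfectly, but a vector that pools many mixed-topic questions scores noticeably lower for the same query.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

query      = [1.0, 0.0, 0.0]  # query entirely about topic A
individual = [1.0, 0.0, 0.0]  # vector for the one matching question
pooled     = [1.0, 1.0, 1.0]  # many mixed questions squashed into one vector

exact_score   = cosine(query, individual)  # 1.0: exact topical match
diluted_score = cosine(query, pooled)      # ~0.577: match diluted by unrelated content
```

If the pooled questions all shared the query's topic, the pooled vector would stay close to the query instead, which is the "if the 20 things are all related" case above.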
