Questions about the embedding-based chatbot

Hello, I have a question regarding embeddings. Before implementing them in an application, I assumed that embedding my data, storing the vectors in a database, and then embedding a query to compare against them would naturally retrieve the most relevant data. However, my actual experience has shown otherwise.

I’m currently developing a Q&A chatbot. For example, when a query like “how to repair a watch” is input, I expected the focus to be on “watch” and that data related to watch repair would be retrieved. However, the results included not only watch repair information but also unrelated results like how to repair a monitor or a tire. Surprisingly, watch repair wasn’t even ranked as the most relevant result; for instance, monitor repair appeared as the top match. In short, the focus on “watch” wasn’t adequately captured.

To address this issue, I considered adding metadata, such as tags, to the database. However, with thousands or even millions of data points, manually annotating each entry with metadata would be incredibly time-consuming. While it might be possible to automate this process using AI tools, I’m not sure if that’s the best solution.

So, I’d like to ask the community: what are the best practices for improving the accuracy of embedding-based search? How do you typically approach solving such issues to ensure embeddings capture the correct focus and deliver the most relevant results?

Are you using the Assistants API or a custom implementation with your own database? It sounds like you're doing the latter?

If so, there's a lot you can configure on your own to change what gets retrieved. Metadata is a great idea: you could use structured outputs with gpt-4o-mini to generate a specific set of metadata fields for each chunk in your database.
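Here's a minimal sketch of what that could look like with the OpenAI Python SDK's structured-outputs helper and Pydantic. The schema fields (topic, main_entity, tags) are just placeholders; you'd pick whatever fields matter for your domain:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Hypothetical metadata schema; adapt the fields to your data.
class ChunkMetadata(BaseModel):
    topic: str          # e.g. "watch repair"
    main_entity: str    # the key object, e.g. "watch"
    tags: list[str]

def annotate(chunk_text: str) -> ChunkMetadata:
    # Structured outputs guarantee the response matches the schema.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Extract metadata for this knowledge-base chunk."},
            {"role": "user", "content": chunk_text},
        ],
        response_format=ChunkMetadata,
    )
    return completion.choices[0].message.parsed
```

You can then store those fields alongside each vector and use them as a metadata filter at query time (Pinecone supports metadata filtering on queries).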

But if you aren't already, the most impactful change I've ever made to improve a RAG chatbot was adding a cross-encoding reranker of some sort. Are you using a reranker?
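For illustration, here's a minimal reranking sketch using the sentence-transformers library and a small public MS MARCO cross-encoder checkpoint (swap in whichever reranker you prefer):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores the query and passage jointly, so it can tell
# that "watch repair" matters more than "repair" alone.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how to repair a watch"
candidates = ["...chunk from vector search...", "...another chunk..."]  # your topK hits

scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
top_docs = [doc for doc, _ in ranked[:5]]
```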


Sounds like you should investigate "re-rankers". These take your vector search results and then apply more effort to extract the elements that are actually applicable.

My favourite personal method is to retrieve quite a few results, i.e. topK = 50 or so, and then send all of those results to an LLM and ask for the top 5 most relevant to the user's original question.
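A rough sketch of that LLM-as-reranker step, assuming the OpenAI Python SDK and gpt-4o-mini (the prompt wording and parsing are just one way to do it):

```python
from openai import OpenAI

client = OpenAI()

def llm_rerank(question: str, chunks: list[str], top_n: int = 5) -> list[str]:
    # Number each retrieved chunk so the model can refer to it by index.
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks))
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You rank search results. Given a question and numbered "
                    f"passages, reply with only the indices of the {top_n} "
                    "most relevant passages, comma-separated, best first."
                ),
            },
            {"role": "user", "content": f"Question: {question}\n\nPassages:\n{numbered}"},
        ],
    )
    answer = response.choices[0].message.content
    indices = [int(tok) for tok in answer.split(",") if tok.strip().isdigit()]
    return [chunks[i] for i in indices[:top_n] if 0 <= i < len(chunks)]
```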

You can also get the LLM to take a look at the user's question and rewrite it to best extract information from a vector database. Combining these, along with methods like semantic chunking (where you split data based on similarity rather than arbitrary size or paragraphs), can give a big improvement to vector-retrieved data.
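A quick sketch of the query-rewriting step; the prompt here is just an illustration you'd tune for your own data:

```python
def rewrite_query(user_question: str) -> str:
    # Ask the model for a keyword-dense rephrasing that keeps the key entity prominent.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's question as a short, keyword-dense query "
                    "optimized for semantic search over a help-article database. "
                    "Keep the main entity (e.g. 'watch') prominent."
                ),
            },
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content.strip()

# "how to repair a watch" -> something like "watch repair fix wristwatch instructions"
```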


Listen pal, I can provide an input and make an AI say just about anything, like:

Final Thoughts:

Your company’s methods seem to integrate advanced AI techniques to push the boundaries of retrieval-augmented search. By combining context-aware strategies, enriched embeddings, intelligent elision, and proactive discovery, you’re able to deliver highly relevant and concise information to users, setting new benchmarks in the field.

That doesn't mean it's on topic, especially when you splatter it all over the forum.

I am using a custom implementation with my own database, and I'm currently experimenting with Pinecone. I just learned about cross-encoding rerankers, and I will try using them to improve retrieval quality. Thank you!