How to use RAG properly, and what types of queries is it good at?

Hello!

I am doing a comparison between ChatGPT builder and traditional RAG with OpenAI API.

For the ChatGPT builder, I don’t think there is much to configure; you just follow the instructions and you are done.

For RAG, I am using LlamaIndex with OpenAI’s high-end models. I have changed the `chunk_size`, `chunk_overlap`, and `similarity_top_k` parameters several times to see how the outcomes change.
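
Roughly, the kind of configuration I’m tweaking looks like this (a minimal sketch; the directory path and parameter values are just placeholders, and the exact imports depend on the installed LlamaIndex version):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
Settings.llm = OpenAI(model="gpt-4-0125-preview")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# The three parameters I keep changing: chunk_size, chunk_overlap, similarity_top_k.
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)

documents = SimpleDirectoryReader("./graduation_requirements").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("What are the graduation requirements at University A?"))
```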

However, both approaches gave poor results when I asked questions that compare and find the differences between files describing graduation requirements at six different universities.

Here are my questions:

  1. Is the ChatGPT builder actually using RAG in the background for uploaded files? I searched the documentation and blogs but couldn’t find the answer.

  2. For my case, how should I configure the RAG parameters to get a good outcome?

  3. Since RAG is good at finding content similar to the query, does that mean RAG is bad at dealing with questions that involve “comparing”?

I am totally new to the forum and to the technique. I hope I made myself understood, and thanks a lot for helping!

Welcome to the community!

Well, RAG can get pretty involved, depending on your use case. LlamaIndex even has a blog post on it: Building Performant RAG Applications for Production - LlamaIndex 🦙 v0.10.13

TL;DR: tweaking these three parameters probably isn’t gonna get you very far, especially if you threw all files into the same index.

OpenAI runs an agentic approach: while RAG might be part of the process, they (probably) don’t throw all of your documents into a pot and stir.

Things you can do:

  • allow the model to select which documents should be searched
  • tell the model how the search function should be invoked
    • Try keywords?
    • Hypothetical Document Embeddings (HyDE)?
  • if the query requires a comparison between multiple documents, allow the model to perform separate/parallel refinement operations before combining the results for comparison (see the sketch after this list)
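
For the comparison case, something along the lines of LlamaIndex’s sub-question engine is one way to do the “answer separately, then combine” step. This is a rough sketch only; the folder names are made up, and the imports depend on your LlamaIndex version:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

# One index per university, so the model can pick which documents to search.
universities = ["uni_a", "uni_b", "uni_c"]  # placeholder folder names
tools = []
for name in universities:
    docs = SimpleDirectoryReader(f"./data/{name}").load_data()
    index = VectorStoreIndex.from_documents(docs)
    tools.append(
        QueryEngineTool(
            query_engine=index.as_query_engine(similarity_top_k=5),
            metadata=ToolMetadata(
                name=f"{name}_requirements",
                description=f"Graduation requirements for {name}",
            ),
        )
    )

# The engine breaks a comparison question into per-document sub-questions,
# answers each separately, then combines the answers into a final response.
engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
print(engine.query("Compare the graduation requirements of uni_a, uni_b and uni_c."))
```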

Embedding-based retrieval itself is only really good at comparing the similarity between concepts. But the retrieval step can be as involved as you want it to be.
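
To make that concrete, the embedding step boils down to scoring how close two pieces of text sit in the embedding space; a toy illustration (the sentences are made up):

```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "Compare the credit requirements of University A and University B"
chunk = "University A requires 120 credits and a senior thesis to graduate."
print(cosine(embed(query), embed(chunk)))  # higher = more similar
```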

Augmentation just means that you take those retrieved chunks and put them into the prompt before generation.
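
In other words, something as simple as this (a bare-bones sketch; `retrieved_chunks` stands in for whatever your retriever returned):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

retrieved_chunks = ["...chunk 1...", "...chunk 2..."]  # output of the retrieval step
context = "\n\n".join(retrieved_chunks)

# Augmentation: paste the retrieved text into the prompt. Generation: let the model answer.
response = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n\n{context}"},
        {"role": "user", "content": "Compare the graduation requirements."},
    ],
)
print(response.choices[0].message.content)
```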

Do you think this is a good starting point you can jump off of?


Thank you so much for your detailed answer. I don’t fully understand all of it yet, but I will definitely try the suggestions one by one.

I will consider all of the concepts and methods you mentioned and post or update some of my points soon (I’m out of town for a few days).

Thanks again!


We don’t know. But it should be something similar to RAG.

Start with a small data set, test it.

For me, chunk_size is usually set to about 10 sentences, plus 5 sentences on either side as chunk overlap.
top_k is usually set to around 20.

No. RAG just brings your results back; in the above case, the top 20 (top_k) chunks.

You then need to do post-processing with your high-end GPT model to get the final result.

Thanks for your answer!

So there is no way to find out whether OpenAI is actually using RAG in the backend or not.

I’ve tried small (10K~4M) and big (100M) datasets with different parameters but got poor performance.

I used OpenAI’s high-end models (gpt-4-0125-preview and text-embedding-3-small), but still got poor answer quality.

The information you provided focuses on the RAG pipeline, correct?

As I’m comparing ChatGPT builder with RAG, I’m curious if there’s any potential improvement or an existing plugin that could offer a more robust comparison or analysis of the documents I’ve provided?

Thanks

Unfortunately not.

The first thing to check is whether your top_k is returning the correct chunks. In this case, it seems like it’s not. See, if the top_k brought the right chunks back, then all GPT-4 does is rewrite them. Have you tested that?
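
A quick way to check, assuming you still have a handle on the index you built (rough sketch):

```python
# Pull the raw top_k chunks and eyeball them before involving GPT-4.
retriever = index.as_retriever(similarity_top_k=20)
nodes = retriever.retrieve("What are the graduation requirements at University A?")

for n in nodes:
    print(f"score={n.score:.3f} | {n.node.get_content()[:120]}")
```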

I attempted to adjust the similarity_top_k parameter to access different chunks but ended up with identical information each time. Disappointingly, even after modifying the chunk_size and chunk_overlap values, the retrieved chunks remained unchanged.