How to use RAG properly, and what types of queries is it good at?

Hello!

I am doing a comparison between ChatGPT builder and traditional RAG with OpenAI API.

For the ChatGPT builder, I don’t think there are many technical things to configure; you just follow the instructions and you are done.

For RAG, I am using LlamaIndex with high-end OpenAI models via the API. I have changed the “chunk_size”, “chunk_overlap”, and “similarity_top_k” parameters several times to see how the outcomes change.
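To make it concrete, the setup I’m tweaking looks roughly like this (LlamaIndex ~0.10-style imports; exact module paths and defaults may differ by version, and the directory name is just an example):

```python
# Rough sketch of my current setup (illustrative, not a prescribed configuration)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4-0125-preview")
# chunk_size / chunk_overlap control how documents are split before embedding
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)

documents = SimpleDirectoryReader("./graduation_requirements").load_data()
index = VectorStoreIndex.from_documents(documents)

# similarity_top_k controls how many chunks are retrieved per query
query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("How do the graduation requirements differ across the six universities?"))
```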

However, both approaches gave poor results when I asked questions that compare and find differences between files describing the graduation requirements of six different universities.

Here are my questions:

  1. Is the ChatGPT builder actually using RAG in the background for uploaded files? I searched the docs and blog posts but couldn’t find an answer.

  2. For my case, how should I configure the RAG parameters to get a good outcome?

  3. Since RAG is good at finding something similar to the query, does that mean RAG is bad at dealing with questions that involve “comparing”?

I am totally new to the forum and to the technique; I hope I made myself clear, and thanks a lot for helping!

Welcome to the community!

Well, RAG can get pretty involved, depending on your use case. LlamaIndex even has a blog post on it.

TL;DR: tweaking these three parameters probably isn’t gonna get you very far, especially if you threw all files into the same index.

OpenAI runs an agentic approach: while RAG might be part of the process, they (probably) don’t throw all of your documents into a pot and stir.

Things you can do:

  • allow the model to select which documents should be searched
  • tell the model how the search function should be invoked
    • Try keywords?
    • Hypothetical Document Embeddings (HyDE)?*
  • if the query requires a comparison between multiple documents, allow the model to perform separate/parallel refinement operations, before combining the results for comparison
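To make that last point concrete, here is a hedged sketch of the “refine per document set, then combine for comparison” pattern (the folder layout, model name, and prompts are placeholders, not a prescribed setup; LlamaIndex import paths depend on your version):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4-0125-preview")
universities = ["uni_a", "uni_b", "uni_c"]  # assumed: one folder of files per university

summaries = {}
for name in universities:
    docs = SimpleDirectoryReader(f"./data/{name}").load_data()
    index = VectorStoreIndex.from_documents(docs)        # separate index per university
    engine = index.as_query_engine(similarity_top_k=5)
    # refinement: answer the same focused question against each index independently
    summaries[name] = str(engine.query("List the graduation requirements."))

# combination: only now ask the model to compare the refined answers
prompt = "Compare these graduation requirements and list the differences:\n\n"
prompt += "\n\n".join(f"## {name}\n{text}" for name, text in summaries.items())
print(llm.complete(prompt))
```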

Embedding-based retrieval itself is only really good at comparing the similarity between concepts. But the retrieval step can be as involved as you want it to be.

Augmentation just means that you take those retrievals and put them into the prompt before Generation.
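In code, that step is little more than string assembly; a minimal, purely illustrative sketch:

```python
# "Augmentation" = paste the retrieved chunks into the prompt before Generation
retrieved_chunks = ["<chunk about Uni A>", "<chunk about Uni B>"]  # whatever the retrieval step returned

prompt = (
    "Answer using only the context below.\n\n"
    "Context:\n" + "\n---\n".join(retrieved_chunks) +
    "\n\nQuestion: How do the graduation requirements differ?"
)
# `prompt` is then sent to the model for the Generation step
```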

Do you think this is a good starting point you can jump off of?


*I no longer recommend HyDE (April 2024)


Thank you so much for your detailed answer. Though I don’t fully understand all of your answers yet, I will definitely try them out one by one.

I will consider all of the concepts and methods you talked about and will post or update some of my findings soon (I’m out of town for a few days).

Thanks again!


We don’t know. But it should be something similar to RAG.

Start with a small data set, test it.

For me, chunk_size is usually set to about 10 sentences, plus 5 sentences on either side as chunk_overlap.
similarity_top_k is usually set to around 20.

No. RAG just brings your results back; in the above case, the top 20 chunks.

You then need to do post-processing with your high-end GPT model to get the final result.
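A hedged sketch of that retrieve-then-post-process flow, assuming a LlamaIndex index and the official openai client (directory, model name, and prompts are illustrative):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
retriever = index.as_retriever(similarity_top_k=20)  # ~20 candidate chunks, as above

nodes = retriever.retrieve("graduation requirements")
context = "\n---\n".join(n.node.get_content() for n in nodes)

# post-processing: hand all retrieved chunks to the big model and let it do the reasoning
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: What are the graduation requirements?"},
    ],
)
print(resp.choices[0].message.content)
```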

Thanks for your answer!

So there is no way to find out whether OpenAI is actually using RAG in the backend or not.

I’ve tried small (10K~4M) and big (100M) datasets with different parameters but got poor performance.

I used OpenAI models (gpt-4-0125-preview and text-embedding-3-small), but still got poor-quality answers.

The information you provided focuses on the RAG pipeline, correct?

As I’m comparing ChatGPT builder with RAG, I’m curious if there’s any potential improvement or an existing plugin that could offer a more robust comparison or analysis of the documents I’ve provided?

Thanks

Unfortunately not.

The first thing to check is whether your top_k is returning the correct chunks. In this case, it seems like it’s not. See, if the top_k brought the right chunks back, then all GPT-4 does is rewrite them. Have you tested that?
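One quick way to test that in LlamaIndex is to call the retriever directly and eyeball the scores and snippets (assuming `index` is the vector index you already built; attribute names may vary by version):

```python
retriever = index.as_retriever(similarity_top_k=20)
for nws in retriever.retrieve("graduation requirements for University A"):
    # print similarity score plus the first 120 characters of each retrieved chunk
    print(f"{nws.score:.3f}  {nws.node.get_content()[:120]!r}")
```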

I attempted to adjust the similarity_top_k parameter to access different chunks but ended up with identical information each time. Disappointingly, even after modifying the chunk_size and chunk_overlap values, the retrieved chunks remained unchanged.

Hello!

It’s great to see you’re diving deep into the comparison between ChatGPT builder and traditional Retrieval-Augmented Generation (RAG) with the OpenAI API. Both approaches have their strengths, and your thorough exploration is commendable.

To address your queries:

Is ChatGPT builder actually using RAG in the background for uploaded files?

The ChatGPT builder is designed to streamline the process of creating conversational AI without requiring extensive technical configurations. While specific implementation details may vary, tools like Kommunicate often leverage advanced techniques, including RAG, to enhance the functionality and accuracy of their chatbot solutions. This integration allows for more efficient and contextually aware responses by combining retrieval mechanisms with generative models. However, the primary advantage of using such platforms is that they abstract away much of the complexity, allowing users to focus on building effective interactions without getting bogged down in technical details.

For my case, how should I config RAG parameters properly to get a good outcome?

Configuring RAG parameters effectively requires some experimentation, as you’ve been doing. Here are some tips to optimize the parameters for better outcomes:

  1. chunk_size: This determines the size of the text chunks. Smaller chunks can lead to more precise retrievals but may miss broader context. Larger chunks provide more context but can dilute relevance. For detailed comparisons like graduation requirements, you might start with moderate-sized chunks (e.g., 300-500 tokens).

  2. chunk_overlap: Overlapping chunks can help ensure that important information isn’t missed at chunk boundaries. A typical overlap of 50-100 tokens can be a good starting point.

  3. similarity_top_k: This parameter controls how many top similar chunks are considered. Increasing this value can improve the chances of capturing relevant information but might also introduce noise. A value between 3 and 5 often works well for detailed queries.

  4. Query Refinement: Ensure your query is as specific as possible. Adding contextual keywords can help the model focus on the most relevant chunks.
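As a purely illustrative sketch, those starting values map onto LlamaIndex settings roughly like this (assumes an existing `index`; adjust for your version):

```python
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# chunking applies when the index is (re)built, so re-index after changing these
Settings.node_parser = SentenceSplitter(chunk_size=400, chunk_overlap=80)  # ~300-500 tokens, 50-100 overlap
query_engine = index.as_query_engine(similarity_top_k=4)                   # top 3-5 chunks
```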

Since RAG is good at finding something similar to the query, does it mean that RAG is bad at dealing with questions related to “comparing”?

RAG excels at retrieving relevant information based on the input query, which makes it suitable for tasks involving finding specific details. However, when it comes to comparison tasks, the challenge lies in synthesizing and contrasting information from multiple sources. Here are some tips to enhance RAG’s performance for comparison tasks:

  • Structured Data Representation: Ensure that the data is well-structured. This can involve preprocessing your documents to highlight key points and differences explicitly.

  • Multi-step Queries: Break down the comparison into smaller, more specific queries. For example, instead of asking for a direct comparison, ask for the details of each university’s graduation requirements first, then compare the retrieved details.
  • Post-Processing: Use additional logic to process the retrieved information. This can involve summarizing and comparing key points programmatically after retrieval.
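A minimal sketch of the multi-step idea, assuming you already have a `query_engine` and an `llm` object (university names and prompts are placeholders):

```python
universities = ["University A", "University B"]
# step 1: retrieve the details for each university separately
details = {
    u: str(query_engine.query(f"What are the graduation requirements at {u}?"))
    for u in universities
}
# step 2: ask the model to compare only the retrieved details
print(llm.complete(
    "Compare these graduation requirements and list only the differences:\n\n"
    + "\n\n".join(f"{u}:\n{d}" for u, d in details.items())
))
```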

Why Kommunicate?

While traditional RAG setups offer flexibility and control, platforms like Kommunicate provide a balanced approach by integrating advanced retrieval techniques with user-friendly interfaces. This allows users to leverage powerful AI capabilities without needing deep technical expertise. Kommunicate’s tools can help streamline the process, offering robust solutions for creating and managing conversational AI, ultimately saving you time and effort while delivering high-quality outcomes.

Feel free to reach out if you have any more questions or need further assistance with your project. Good luck!
