RAG needs more detailed source information


As I read in the documentation, the Assistants chatbot now has a retrieval system. I tried it, but the assistants could not tell me which file, or which part of it, they referenced. I tested both gpt-4-turbo and the new gpt-3.5, and neither knew which file it had used. What I expected was the file name and page number. I hope there will be a setup option for how to split the corpus and attach metadata.

3 Likes

Great question. To include metadata like file names and page numbers, you can format system messages or custom prompts with specific instructions for the model. This might involve defining placeholders within the prompt that tell the model to look for and include this metadata in its response. For example, a system message could be:

“Retrieve the following details: [file name], [page number].”

Function calls are a feature of the OpenAI API that lets the model request a predefined function as part of its response; your application then runs that function and passes the result back. You can create a function that, when called, accesses the document’s metadata and returns the file name and page number alongside the response. The documentation provides information on how to structure these calls and integrate them into your application.

It’s important to note that function calls would need to be tailored to your specific data structure and retrieval needs. They would also require the metadata to be structured and accessible in a way that the function can reliably retrieve it.
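To make the function-call idea a bit more concrete, here is a minimal sketch of a tool definition plus the handler your application would run when the model asks for it. The tool definition follows the JSON-schema style used for OpenAI function calling. Everything else is an assumption for illustration: the `DOCUMENT_METADATA` store, the `doc123` entry, and the `handle_tool_call` helper are hypothetical names, and no API request is made here.

```python
import json

# Hypothetical in-memory stand-in for wherever your document metadata lives.
DOCUMENT_METADATA = {
    "doc123": {"file_name": "handbook.pdf", "page_number": 12},
}

# Tool definition in the JSON-schema style used for OpenAI function calling.
# You would pass this in the `tools` list when creating the assistant or request.
GET_FILE_METADATA_TOOL = {
    "type": "function",
    "function": {
        "name": "get_file_metadata",
        "description": "Return the file name and page number for a document ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "document_id": {
                    "type": "string",
                    "description": "ID of the document to look up.",
                },
            },
            "required": ["document_id"],
        },
    },
}


def get_file_metadata(document_id: str) -> dict:
    """Look up metadata; return an error payload for unknown IDs."""
    metadata = DOCUMENT_METADATA.get(document_id)
    if metadata is None:
        return {"error": f"unknown document_id: {document_id}"}
    return dict(metadata)


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run the function the model asked for and return a JSON string result."""
    if name != "get_file_metadata":
        return json.dumps({"error": f"unknown function: {name}"})
    args = json.loads(arguments_json)
    return json.dumps(get_file_metadata(args["document_id"]))
```

The model only sees the schema and the JSON string you send back, so the lookup works only if your own data layer already stores a file name and page number for each document ID.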

Here is a generic example of how a system message with placeholders for metadata might look:

    {
      "system": "fetch_document_details",
      "data": {
        "file_name_placeholder": "{file_name}",
        "page_number_placeholder": "{page_number}"
      }
    }

For a function call, you might define a function in your application that accesses and retrieves file metadata. When the chatbot needs to provide this information, it would issue a function call within the prompt. Here’s a simplified pseudo-code example:

    def get_file_metadata(document_id):
        # Placeholder function to get file metadata based on the document ID.
        # `database` stands in for your own data access layer.
        metadata = database.get_document_metadata(document_id)
        return {
            "file_name": metadata.file_name,
            "page_number": metadata.page_number,
        }

    # Example of calling the function in the chatbot prompt.
    # `chatbot` is a placeholder for your own client object.
    metadata = get_file_metadata("doc123")
    response = chatbot.prompt(f"File name: {metadata['file_name']}, Page: {metadata['page_number']}")

This is a high-level example and would need to be adjusted for the specific programming language and data access methods that you are using.
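If you end up running the retrieval step yourself instead of relying on the built-in tool, one way to get the file name and page back is to carry them on every chunk and label each chunk in the prompt. The sketch below is only that, a sketch: the `Chunk` class, the `split_pages` and `build_context` helpers, the fixed-size character split, and the wording of `SYSTEM_PROMPT` are all assumptions for illustration, not how OpenAI's Retrieval tool splits files.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    file_name: str
    page_number: int
    text: str


def split_pages(file_name: str, pages: list[str], max_chars: int = 500) -> list[Chunk]:
    """Split each page into pieces of at most max_chars, keeping file and page."""
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        for start in range(0, len(page_text), max_chars):
            piece = page_text[start:start + max_chars].strip()
            if piece:  # skip pieces that are only whitespace
                chunks.append(Chunk(file_name, page_number, piece))
    return chunks


def build_context(chunks: list[Chunk]) -> str:
    """Label every chunk so the model can repeat the label in its answer."""
    return "\n\n".join(
        f"[source: {c.file_name}, page {c.page_number}]\n{c.text}" for c in chunks
    )


# Example instruction asking the model to cite the labels added above.
SYSTEM_PROMPT = (
    "Answer using only the context below. After each claim, cite the "
    "[source: file, page] label of the chunk it came from."
)
```

Because the labels are part of the text the model reads, it can quote them back, though it may still omit or misattribute a label, so treat the citations as something to verify.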

I hope this helps. All the best

2 Likes

Oh cool. Where can I find more information about RAG setup?

Happy it helped. The OpenAI documentation does not go into RAG configuration parameters for developers... yet (fingers crossed), so I believe that for now this is all handled by OpenAI. The OpenAI documentation is really good and will get better as more features, projects and use cases roll out: OpenAI Platform


Thanks for your kind explanation. However, it doesn’t work when I set it up like this; it doesn’t give me the source information I wanted.
Also, does anyone know how OpenAI charges for retrieval? I couldn’t find any information on how they use RAG, or exactly what prompt was sent, even when I checked the run step results.

I did some further checking, and RAG (called Retrieval) setup is in fact handled automatically once you select Retrieval and upload your files to OpenAI, if you are building an assistant in the OpenAI playground-type UI.

If you are building a GPT in ChatGPT Plus, the setup is similar: select the Retrieval (RAG) check box. You can also use the Retrieval tool code if you are hard-coding your application. I have attached a screenshot of how to set it up in OpenAI (screenshot: OpenAI Tools Retrieval) as well as a link to the code. All the best with your application!
https://platform.openai.com/docs/assistants/tools/knowledge-retrieval
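On the question of which file was used: with Retrieval enabled, the place to look is the annotations on the assistant's message rather than the prompt. The sketch below assumes the message has already been fetched and converted to a plain dict, and that it has the shape described in the Assistants documentation (text content parts carrying annotations of type `file_citation`, each with a `file_id`). The `extract_citations` helper and the `file_names` lookup are hypothetical names. As far as this assumed shape goes, there is a file ID but no page number, and if the model emits no annotations the result is simply empty.

```python
def extract_citations(message: dict, file_names: dict[str, str]) -> list[dict]:
    """Collect file citations from an assistant message given as a plain dict.

    `file_names` maps file IDs to the names you uploaded them under; you
    would build it yourself when uploading, since annotations carry only IDs.
    """
    citations = []
    for part in message.get("content", []):
        if part.get("type") != "text":
            continue  # skip image or other non-text parts
        for annotation in part["text"].get("annotations", []):
            if annotation.get("type") != "file_citation":
                continue
            file_id = annotation["file_citation"]["file_id"]
            citations.append({
                "marker": annotation.get("text"),  # citation marker in the answer
                "file_id": file_id,
                "file_name": file_names.get(file_id, "unknown"),
                # `quote` may be absent depending on the API version.
                "quote": annotation["file_citation"].get("quote"),
            })
    return citations
```

Keeping your own `file_id` to file name mapping at upload time is what turns the ID in the annotation into something readable.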

Click Logs in the upper right corner to open the log window, where you will find many detailed execution steps. What you see there depends on whether the run has completed, and it seems the run results have to be viewed in the returned results on the right?