Hey OpenAI Community! I’ve come across an intriguing challenge and could use some guidance.
Imagine you have:
- A comprehensive 800-page PDF guide detailing every aspect of repairing, tweaking, and enhancing a specific Mercedes model. This isn’t just your regular guide; it delves deep into the nuances of the art.
- An expansive dataset containing product specs: torque, speed, HP, weight, and spare parts information.
Here’s the scenario: Using “mechanic” as an example, I want to leverage these resources to enable an LLM to give me detailed suggestions. For instance, if I ask, “I want to increase the HP of my 190e from 1996. What modern parts would be suitable?” I’d want precise recommendations based on the dataset and insights from the guide.
Given that both resources far exceed standard token limits, feeding them to the LLM directly (or baking them in via training) seems impractical. Has anyone here tackled a similar challenge? How can I best use these datasets with LLMs to extract actionable insights?
I’m thinking RAG (Retrieval-Augmented-Generation).
After scratching the surface of RAG, it seems like a good way forward. But I can see a token limit looming when the retriever has to look through all the data. Is there a solution to that?
Yeah, that’s too big to feed to the LLM in one big prompt!
The architecture to prevent issues with that is roughly along these lines:
- upload the PDF(s) into your store
- using a suitable library, read the PDF(s), chunking the text into suitably sized pieces
- for each chunk, generate an indexed row in a table that points to the source PDF and a deep link to the chunk; better yet, store the entire chunk of text locally in the table
- retrieve and store an embedding for each of those indexed chunks
some moments later:
- the user enters a query
- retrieve an embedding for the query
- using vector search, find the best-matching rows, which point to the best chunks of text
- return those text chunks to the user and/or deep links into the PDFs
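The indexing and retrieval steps above can be sketched in a few dozen lines of Python. This is a toy, not production code: `embed` is a bag-of-words stand-in for a real embedding model (e.g. OpenAI's embeddings endpoint), and the in-memory row list stands in for a real vector store; all function and field names are my own.

```python
import math

def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, vocab: dict[str, int]) -> list[float]:
    # Toy bag-of-words embedding standing in for a real embedding model.
    # Words outside the corpus vocabulary are simply ignored.
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(doc_id: str, text: str):
    """One row per chunk: source pointer, the chunk text, its embedding."""
    vocab = {w: i for i, w in enumerate(sorted(set(text.lower().split())))}
    rows = [
        {"doc": doc_id, "chunk_no": i, "text": c, "embedding": embed(c, vocab)}
        for i, c in enumerate(chunk(text))
    ]
    return rows, vocab

def search(rows, vocab, query: str, top_k: int = 3):
    """Rank chunks by cosine similarity between query and chunk embeddings."""
    q = embed(query, vocab)
    return sorted(
        rows,
        key=lambda r: -sum(a * b for a, b in zip(q, r["embedding"])),
    )[:top_k]
```

At query time you would call `search` with the user's question and hand the returned chunks (or their deep links) back to the user.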
A more advanced solution is to build a chatbot to wrap all that in natural language.
- you can take the top 3-5 best results, and inject the source text into the bot’s prompt, allowing it to respond to the user with the best source information.
This is how chatbot agents work with source material.
It completely avoids having to feed the entire source to the LLM at the time of the query.
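Injecting the top results into the bot's prompt is then just string assembly. A minimal sketch, assuming each retrieved chunk row carries `doc`, `chunk_no` and `text` fields (my naming, not a standard):

```python
def build_messages(question: str, chunks: list[dict]) -> list[dict]:
    """Pack retrieved source text into a grounding system prompt."""
    context = "\n\n".join(
        f"[{c['doc']} #{c['chunk_no']}]\n{c['text']}" for c in chunks
    )
    system = (
        "Answer the user's question using only the source excerpts below. "
        "Cite the excerpt ids you rely on.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list is in the shape the chat completions endpoint expects as `messages`.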
I’ve recently built one that works with Discourse posts as source material; PDFs differ really only in the interface: GitHub - merefield/discourse-chatbot: A cloud chatbot adaptor for Discourse, currently supporting OpenAI. In fact, I’ve been thinking about adding a PDF browser that would let it access, index and retrieve all the PDFs uploaded to the site. PRs welcome!
I’ll have to look into your chatbot. Using a chatbot to “bake” the final response seems like the right way to go.
What do you think about using 2 RAGs?
Q: How do I change tires on my specific car?
One query is sent to the database containing “the mechanic knowledge” on changing tires.
One query is sent to the database containing “product knowledge” on tires, rims, bolt sizes etc.
The retrieved info is sent to the chatbot. The chatbot basically gets the prompt “based on the information above, how does the user change his tires?”
A: Jack your car up. Make sure the car is on a stable surface. Your tires are mounted with 1" bolts, so use a lug wrench that fits. etc…
So in short, is it a good or bad idea to have two separate databases?
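The two-store flow described above could be sketched like this; both retriever functions and the `llm` callable are placeholders you would wire up to your own databases and model:

```python
def answer_with_two_stores(question, search_mechanic, search_products, llm):
    """Query both knowledge stores, then ground the chatbot on both results."""
    how_to = search_mechanic(question)   # e.g. the tire-changing procedure
    parts = search_products(question)    # e.g. bolt sizes, compatible rims
    prompt = (
        "Mechanic knowledge:\n" + "\n".join(how_to) + "\n\n"
        "Product knowledge:\n" + "\n".join(parts) + "\n\n"
        "Based on the information above, answer: " + question
    )
    return llm(prompt)
```

Whether the two stores are physically separate databases or one store with two collections does not change this calling pattern.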
That’s an interesting scenario!
My chatbot uses the latest functions capability of the OpenAI LLM API to request that functions be run locally with specific arguments.
The agent iterates through “internal thoughts” to work through the problem and, once it has all the data and answers from the functions, responds to the user.
This is a standard approach you will see in this kind of problem solving. Mine is bespoke, but you can find standard implementations in the LangChain library.
Yes, that’s more or less exactly what the last prompt to the LLM does.
The source location of the information is irrelevant; you simply might want to package these as different tools or functions and expose those interfaces to the LLM so it knows about them.
I have several interfaces to external APIs that I do not even maintain!
Yes, you could have two interfaces: one that retrieves practical guidance, and another that retrieves products the person might use to achieve those goals.
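As a sketch of what exposing those two interfaces might look like, here are two function specs in the shape the OpenAI functions mechanism accepts (the names and descriptions are made up for illustration):

```python
# Two hypothetical tools exposed to the model; given a user question,
# the model picks a function and supplies the arguments itself.
functions = [
    {
        "name": "search_repair_guide",
        "description": "Retrieve how-to passages from the repair guide PDF.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "search_parts_catalog",
        "description": "Look up product specs and compatible spare parts.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```

You would pass this list alongside the messages; when the model responds with a function call, you run the matching local retriever and feed the result back so the model can compose its final answer.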
There are ChatGPT plugins that will do RAG from a PDF. I’d start with a feasibility check using one of those:
- find a diverse handful of questions you’d imagine a user might have
- copy the relevant info from your sources and paste it into one shorter PDF
- load that PDF into a ChatGPT PDF plugin
- experiment with prompting and your questions to see if they get answered correctly
Caveats: with a plugin you might not be able to tweak the RAG as much as needed (chunking of the document, how many chunks to inject, etc.), but it should still give you a good sense of feasibility. A next step could be LlamaIndex or LangChain.
As merefield points out, you can also start with just the search/retrieval problem. Depending on the use case, that alone might already provide value, and you can add the chat layer later. It will probably need quite some tweaking to work nicely, and you’ll likely still want the source links so the answer can be verified.