Private Chat with CSV data

I’m sure you’re aware that an LLM is no substitute for a basic query language (SQL, et al.), so I suspect you are involving the LLM because you need a semantic, natural-language (NLP) interface.

In other words, you’d like a system that allows a user to use natural language to retrieve data and this is the reason you’re involving an LLM.

If this is true, you’re in luck, because there is a simple solution that requires very few tokens, and is far quicker/more accurate.

To approach this, understand what each part of the equation does best. The LLM is used to understand the user’s request, while a structured query language (not necessarily “SQL”) is used to retrieve the data.

To keep this example simple, I’ll use SQL and give you a step-by-step to recreate the functionality you want. My approach is different from @_j above – his eliminates the LLM, whereas I’m combining the two:

  1. Import your CSV into a simple/quick datastore that can be queried with dynamically generated queries. The key is that it’s designed to interpret dynamic commands, which is why SQL is a good match. For this demo, I’d go with something ultra-basic/quick/easy like MySQL.
  2. Create an LLM prompt that explains the CSV that’s stored in MySQL (the table, the purpose of the columns, etc)
  3. Write your own UI (or re-use one of the open-source UIs available) to create a customer-facing chat-bot (much easier than it sounds; it can be done in less than 50 lines of code and written by GPT).
  4. Behind the scenes, every prompt will include your explanation of the data, asking the LLM to create SQL that produces the records requested by the user.
  5. The LLM will return a SQL statement, which you strip out of its response and run against the datastore. Return the LLM’s remaining text answer along with the recordset returned by the SQL.
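
The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a definitive implementation: it uses Python's built-in `sqlite3` in place of MySQL so it runs without a server, `fake_llm` is a hypothetical stand-in for a real LLM call, the sample data and the `orders` table are made up, and the `<sql>` tag convention and the SELECT-only guard are my own choices for illustration.

```python
import csv
import io
import re
import sqlite3

# Hypothetical sample data standing in for your CSV file.
SAMPLE_CSV = """name,city,amount
Alice,Paris,120
Bob,Berlin,80
Cara,Paris,200
"""

SQL_TAG = re.compile(r"<sql>(.*?)</sql>", re.DOTALL | re.IGNORECASE)


def load_csv(conn, table, csv_text):
    """Step 1: import the CSV into a table (columns are left untyped)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
    return header


def build_prompt(table, columns, question):
    """Steps 2 and 4: describe the data, then ask for SQL for the question."""
    return (
        f"The table {table} has columns: {', '.join(columns)}.\n"
        "Answer briefly and wrap exactly one SELECT statement in <sql> tags.\n"
        f"Question: {question}"
    )


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM call; returns a canned reply."""
    return ("Here are the customers in Paris. "
            "<sql>SELECT name FROM orders WHERE city = 'Paris' ORDER BY name</sql>")


def split_reply(reply):
    """Step 5: separate the SQL from the LLM's text answer."""
    match = SQL_TAG.search(reply)
    if match is None:
        raise ValueError("no SQL found in LLM reply")
    sql = match.group(1).strip()
    text = SQL_TAG.sub("", reply).strip()
    return text, sql


def answer(conn, table, columns, question, llm=fake_llm):
    """Run one chat turn: prompt the LLM, strip the SQL, execute it."""
    text, sql = split_reply(llm(build_prompt(table, columns, question)))
    # Assumption: only read-only queries should ever be executed.
    if not sql.lower().startswith("select"):
        raise ValueError("refusing to run a non-SELECT statement")
    rows = conn.execute(sql).fetchall()
    return text, rows


conn = sqlite3.connect(":memory:")
columns = load_csv(conn, "orders", SAMPLE_CSV)
text, rows = answer(conn, "orders", columns, "Who ordered from Paris?")
print(text)
print(rows)
```

Swapping `fake_llm` for a real API call and `sqlite3` for a MySQL connection should leave the rest of the flow unchanged; only the table description and the user's question are sent to the LLM, never the data itself, which is where the token savings come from.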

In short, you have a perfect blend of LLM, procedural code (your script), and set theory (SQL/etc). Each form of technology/script is doing what it’s best at!

And in turn, you have a highly reliable piece of code that is much cheaper to run, and far more accurate, than any other approach. Win-Win :slight_smile:
