Creating Chatbot with Assistant and Table of Data

Hello,

We are trying to create a chatbot for our users that allows them to ask questions about and query data from a table. The table is in our Xano database, but we can export it as a CSV. It's roughly 10,000 rows. We're trying to figure out the most reliable way to do this. As of now, we're using the Code Interpreter tool, passing the CSV to the assistant, and asking it to answer questions. It sometimes has trouble finding the CSV file, and the results have been inconsistent. Is the Retrieval tool what we actually want to be using for this? And if so, what format do I need to get the data into? I know the Retrieval tool apparently does not support CSVs.

Any general advice on the project is appreciated. Thanks!


Hello there!

So, after taking a peek at Xano, it seems to be an all-in-one package that uses a PostgreSQL DB as the backend (so, a relational database).

The thing that gets tricky with your method is that it's not going to scale well. Sure, you have 10,000 rows now, but the larger it gets, the more stuffed the context window is going to get.

What exactly is the data inside the database meant to represent? What is inside a single "row"? Now, I'm not as familiar with Postgres, but IIRC you can assign categories and tags to bundles of data, no? I think the big win would be vectorizing semantic chunks of data, and then turning those chunks into something that's easier for the LLM to interpret (a JSON file, maybe?). Currently, you're dumping the whole DB into context at once. What you really need is the ability to pluck only the necessary parts of the DB. This is where embeddings can really come in handy.
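To make the "vectorize chunks, then pluck only the relevant ones" idea concrete, here's a minimal, self-contained sketch. Each row becomes a prose chunk, each chunk gets an embedding, and a query pulls only the closest chunk by cosine similarity. The `embed` function here is a toy keyword counter standing in for a real embedding call (e.g. an embeddings API endpoint); the row text is made up for illustration:

```python
import math

# Toy stand-in for a real embedding model: counts a few keywords.
# A real system would call an embeddings API and get back a dense vector.
def embed(text):
    keywords = ["patent", "battery", "solar", "university"]
    return [text.lower().count(k) for k in keywords]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# One text chunk per database row: columns flattened into prose.
rows = [
    "Patent: Solar cell coating. Summary: improves solar panel efficiency.",
    "Patent: Battery electrolyte. Summary: longer battery life.",
]
index = [(text, embed(text)) for text in rows]

# Retrieval: embed the query, return the nearest chunk only.
def top_match(query):
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

print(top_match("something about battery chemistry"))
# → "Patent: Battery electrolyte. Summary: longer battery life."
```

The point is that only `top_match`'s result (a handful of chunks, in practice) goes into the LLM's context, instead of all 10,000 rows.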


Check out Pongo - it's super easy to sync your database (and keep it up to date) and write a simple function for your Assistant to call to find the answer.


Each row of data represents a piece of IP (a patent), and then the columns are data about the patent (title, author(s), summary, university, date filed, etc.)

But yes, I’ve read that CSVs are not the easiest thing for an LLM to work with. I guess I’m just not really sure how to go about “vectorizing” the data.

Products like pongo.ai take care of that; you only feed it the query and get results back.
I use it for our Salesforce account database - I feed it a 'txt' version of all the text fields (description, product, team, etc.), and then we can do a 'fuzzy' search very easily. They just added re-ranking as well, which optimizes the result order.

I was doing a similar project. One way I did it was to make a SQL schema and pass it to the LLM:
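A rough sketch of that schema-in-the-prompt approach, assuming a patents table like the one described above (the column names are my guesses, and the model call is stubbed out with a hard-coded reply; a real version would send `prompt` to a chat completions endpoint and run whatever SQL comes back):

```python
import sqlite3

# Schema for a patents table like the one described (illustrative columns).
schema = """
CREATE TABLE patents (
    title TEXT,
    authors TEXT,
    summary TEXT,
    university TEXT,
    date_filed TEXT
);
"""

# The prompt you'd send: schema plus the user's question.
question = "How many patents were filed by MIT?"
prompt = f"Given this schema:\n{schema}\nWrite a SQL query to answer: {question}"

# Stand-in for the model's reply (a real call would go to the LLM).
generated_sql = "SELECT COUNT(*) FROM patents WHERE university = 'MIT'"

# Run the generated query against the actual database (in-memory demo here).
conn = sqlite3.connect(":memory:")
conn.executescript(schema)
conn.executemany(
    "INSERT INTO patents VALUES (?, ?, ?, ?, ?)",
    [
        ("Solar coating", "A. Lee", "...", "MIT", "2021-04-01"),
        ("Battery electrolyte", "B. Kim", "...", "Stanford", "2022-07-15"),
    ],
)
count = conn.execute(generated_sql).fetchone()[0]
print(count)  # 1
```

The nice part is the table data itself never enters the context window, only the schema does, so it scales to any row count.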

This is a great explanation:


Hey, sorry I'm just getting around to checking this out. I actually just booked a call with them for later today. Is this pretty simple to implement? I have a bit of a technical background, but not a ton, and I've primarily been using low-code tools like Xano and Webflow to build everything up to this point.