How to analyze big CSV files for a chat bot?

user1717 · March 6, 2024, 3:12pm

Hi, I’m currently interested in developing a chat bot that lets a user ask a question about a dataset. Let’s imagine the dataset (CSV) is sales history for a company with several descriptive columns like store-id, date, item-title, and so on. An example prompt would be:

What were the worst selling products in September?

Some problems I’ve discovered while planning/researching how to do this:

The files are too large to feed directly to the model because of token limits.
Embedding the data in a vector database works poorly, because the models are good at reasoning over text (e.g. articles, blog posts, etc.) and not necessarily any good at answering questions about vectorized CSV files.

What I’m thinking of right now is to code an application that:

Extracts the column names
Captures the user input (e.g. question above)
Combines them into a prompt which is sent to the OpenAI API.

You are to generate NumPy code based on the following columns from a CSV file:

<columns>

and the following prompt:

<prompt>

Run the returned NumPy code on the CSV file.
Return the output to the user.
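
A rough sketch of the first three steps, in case it helps make the idea concrete. It only reads the header row, so the file size doesn't matter for the prompt. The sample row is made up, and generate_code is a hypothetical callable standing in for whatever client sends the prompt to the OpenAI API; nothing here is an official API.

```python
import csv
import io

# Prompt wording taken from the template in this post.
PROMPT_TEMPLATE = (
    "You are to generate NumPy code based on the following columns "
    "from a CSV file:\n\n{columns}\n\n"
    "and the following prompt:\n\n{question}"
)


def extract_columns(csv_file):
    """Return the header row of an open CSV file without reading the rest."""
    return next(csv.reader(csv_file))


def build_prompt(columns, question):
    """Combine the column names and the user's question into one prompt."""
    return PROMPT_TEMPLATE.format(columns=", ".join(columns), question=question)


def answer_question(csv_file, question, generate_code):
    """Build the prompt and hand it to generate_code.

    generate_code is a hypothetical callable (assumption): it takes the
    prompt string and returns whatever the model produced.
    """
    prompt = build_prompt(extract_columns(csv_file), question)
    return generate_code(prompt)


# Made-up sample data using the column names from the post.
sample = io.StringIO("store-id,date,item-title\n1,2023-09-01,Widget\n")
prompt = build_prompt(
    extract_columns(sample),
    "What were the worst selling products in September?",
)
print(prompt)
```

Running the returned code is the risky part, since it is model-generated; that step is left out of the sketch on purpose and would probably want some kind of sandbox.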

What do you guys think? Based on my research so far, it seems the technology just isn’t there yet.

Another idea I just had is to transform the CSV rows into “human-readable” sentences and then vectorize those. Is that possible? Would that make the querying yield better results?
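
For the row-to-sentence idea, a minimal sketch of the transformation step could look like the following. The sentence format ("column is value") and the sample rows are assumptions made for illustration; the embedding step itself is not shown.

```python
import csv
import io


def row_to_sentence(row):
    """Turn one CSV row (a dict of column -> value) into a plain sentence."""
    # Skip empty or missing cells so the sentence only states known values.
    parts = [f"{column} is {value}" for column, value in row.items() if column and value]
    return "; ".join(parts) + "."


def csv_to_sentences(csv_file):
    """Yield one sentence per data row, streaming instead of loading the file."""
    for row in csv.DictReader(csv_file):
        yield row_to_sentence(row)


# Made-up sample data using the column names from the post.
sample = io.StringIO(
    "store-id,date,item-title\n"
    "1,2023-09-01,Widget\n"
    "2,2023-09-02,Gadget\n"
)
sentences = list(csv_to_sentences(sample))
print(sentences[0])
```

Each sentence could then be embedded as its own document. One thing to keep in mind: a question like "worst selling products in September" needs an aggregate over many rows, so retrieving a handful of similar sentences may still not be enough to answer it.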

alexanderkuruvilla · March 19, 2024, 5:21am

Hi, I have been working on something similar. Like you, I found that embeddings of CSV data perform quite poorly. I have, however, come across LangChain's CSV agents, which can use OpenAI models to answer questions with fairly decent accuracy. The only caveat is that the agent is still experimental and can only answer direct questions; it can’t get creative and, say, generate trivia based on the data. Were you able to find any other solution? It seems to me the only other way would be to convert the CSV data to text documents, which seems cumbersome.
