So, a very quick intro. We’re building a tool that extracts certain data from Excel sheets. The idea is to export the sheets as CSV or HTML and ask GPT-4 to extract specific data and interpret it. Extracting data from CSV/HTML as text works pretty well, so that’s not the issue here.
The issue is that these sheets and tables can get pretty huge and carry a lot of formatting. As you can see in the picture below, this is far too many tokens for a single API call. So the question is: does anyone have experience with this kind of problem, and is there a good way of chunking HTML/CSV data without losing too much context? Would a vector database be the right fit, or is there some other way of formatting this data? I know https://finchat.io/ lets you upload your own data, but I haven’t fully tried it out yet; maybe someone has experience with that?
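For what it’s worth, one simple chunking approach I’ve seen is to split the CSV by rows and repeat the header row in every chunk, so each chunk stays interpretable on its own when it’s sent to the model. A minimal sketch (the function name and chunk size are just placeholders, not from any particular library):

```python
import csv
import io

def chunk_csv(csv_text: str, rows_per_chunk: int = 50) -> list[str]:
    """Split CSV text into row chunks, repeating the header row in each
    chunk so every chunk keeps its column context."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    chunks = []
    for i in range(0, len(body), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)                      # re-attach the header
        writer.writerows(body[i:i + rows_per_chunk])  # this chunk's rows
        chunks.append(buf.getvalue())
    return chunks
```

You’d still want to tune `rows_per_chunk` against your token budget, and this does nothing for cross-row context (totals referencing earlier rows, merged cells, etc.), which is where it gets harder.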
Thanks for the help!!
Hi and welcome to the Developer Forum!
To get a finance system working like this, you need to be very careful with how you vectorise the data. Consider how people actually use these systems: very little of the important numerical data gets any attention beyond bottom lines or query-specific requirements. Numerical data is not well suited to semantic similarity. How is 10 related to 600? There are no logical linguistic links between 10 and 600 until you understand that 10 is the quantity and 600 is the final sales price including tax. So you will need to delink the numbers from the text in the spreadsheet while maintaining the logical links, perhaps with metadata headers embedded with the vector chunks.
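As a rough sketch of what "metadata headers embedded with the vector chunks" could look like: before embedding, each row is linearised into self-describing text where every value is paired with its column label, so a bare "600" becomes "final price incl. tax: 600". The function name and the `[sheet=...]` prefix format are my own illustration, not an established convention:

```python
def linearize_row(header: list[str], row: list[str], sheet: str) -> str:
    """Turn one spreadsheet row into a self-describing text chunk:
    each value is paired with its column label so the number keeps
    its meaning once the table structure is gone."""
    parts = [f"{col}: {val}" for col, val in zip(header, row) if val != ""]
    return f"[sheet={sheet}] " + "; ".join(parts)
```

Each such string would then be embedded and stored alongside the raw numbers, so retrieval matches on the labels while the downstream model still sees the values in context.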
This will not be a simple matter of ingesting the spreadsheets as CSV and having it all work flawlessly. You can certainly try that and see what the results look like, but I imagine the output will be less than optimal.