I am building an app that answers questions over survey data. The data is very structured but contains tonnes of text. For example, one column asks “What are the active areas where you are reducing spending?” and each row (one respondent’s input) can hold values like groceries, OTT, etc., separated by “|”.
This results in a big file: 10k rows, 50 columns, with lots of text data inside the rows.
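To make the data shape concrete, here is a minimal sketch (the column name and values are made up for illustration) of how such a “|”-separated multi-select column can be split and counted with pandas:

```python
import pandas as pd

# Hypothetical miniature of the survey data: one multi-select column
# where each respondent's choices are separated by "|".
df = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "reduced_spending_areas": ["groceries|OTT", "OTT", "groceries|travel"],
})

# Split the delimited string into lists, then explode to one row per choice.
exploded = (
    df.assign(area=df["reduced_spending_areas"].str.split("|"))
      .explode("area")
)

# Frequency of each area across all respondents.
counts = exploded["area"].value_counts()
print(counts.to_dict())  # → {'groceries': 2, 'OTT': 2, 'travel': 1}
```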
I have written a function that identifies the question type, i.e. “general” or “mathematical”. If it is mathematical, I use LangChain’s dataframe agent and get correct answers for count, sum, etc. (basically mathematical operations). But when I query something like “What is the general mood of the consumer?” (a “general” question), I hit the token limit error.
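For reference, the router step I describe could look something like the sketch below. This is not my actual classifier (which uses an LLM); it is a keyword heuristic stand-in, and the keyword list is an assumption:

```python
import re

# Hypothetical keyword heuristic for routing a question to the dataframe
# agent ("mathematical") or the free-text pipeline ("general").
# Word-boundary patterns avoid false hits like "sum" inside "consumer".
MATH_PATTERNS = [
    r"\bcount\b", r"\bsum\b", r"\baverage\b",
    r"\bmean\b", r"\btotal\b", r"\bhow many\b",
]

def classify_question(question: str) -> str:
    """Return 'mathematical' if the question looks like an aggregation,
    otherwise 'general'."""
    q = question.lower()
    if any(re.search(p, q) for p in MATH_PATTERNS):
        return "mathematical"
    return "general"

print(classify_question("How many respondents cut OTT spending?"))     # mathematical
print(classify_question("What is the general mood of the consumer?"))  # general
```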
I have tried a number of approaches: converting the data to text and then retrieving the info, building a simple RAG model, and using OpenAI to first tell me which columns are most relevant to the answer and then running the query on a dataframe filtered to those columns. Nothing works on the full 10k rows; however, a subset of ~500 rows works pretty neatly.
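The column-filtering step I mention can be sketched as below. In the real pipeline the `relevant_columns` list comes from an LLM call; here it is hard-coded as an assumption, and the dataframe is a made-up miniature. Even after filtering, 10k rows of text can blow the context window, which is why some pre-reduction (sampling, or aggregation) is shown as well:

```python
import pandas as pd

# Hypothetical miniature of the 10k x 50 survey frame.
df = pd.DataFrame({
    "respondent_id": range(4),
    "sentiment_comment": ["worried about prices", "optimistic",
                          "cutting back", "feeling stable"],
    "age": [25, 34, 41, 29],
    "region": ["N", "S", "E", "W"],
})

# In the real app an LLM picks these; hard-coded here for illustration.
relevant_columns = ["sentiment_comment"]

# Keep only the columns judged relevant to the question.
filtered = df[relevant_columns]

# Filtering columns alone may not be enough at 10k rows, so reduce rows
# too before sending anything to the model (here: a random sample).
sample = filtered.sample(n=2, random_state=0)
print(filtered.shape, len(sample))
```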
Can anyone guide me on how to make this work at 10k rows?