Personally I prefer to structure my thinking backwards from the final goal:
1. What is the feature you need to develop?
2. How should it work (user story)?
3. What are the current workflows humans use to get this done?
4. What is common in those workflows, and why?
5. What would be the ideal workflow (its general building blocks)?
6. From the workflow’s last step, walk backwards (a minimal data-model sketch follows this list):
  - what is the outcome of this step (data structure model)?
  - how is it produced?
  - what input is needed to perform this operation (data structure model)?
  - how does this step work (detailed sub-procedure)?
  - repeat for all steps in reverse order
7. What is the best way to organize the data storage for the models produced in step #6?
8. How do you process the available data to produce the storable items described in #7 (workflow)?
9. Where do you get the data to feed into step #8 (sources + a way to find the data)?
10. Where are the weak points in the system above, and how can you improve them?
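To make the “data structure model per step” idea from #6 concrete, here is a minimal Python sketch for a hypothetical final step. All the names (AnswerWithCitations, RankedContext, compose_answer) are invented for illustration, not a prescribed schema: you define the outcome first, then the input it requires, then the sub-procedure connecting them.

```python
# A minimal sketch of "model the outcome first, then the input" for a
# hypothetical final step; all names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class AnswerWithCitations:          # outcome of the last step
    answer: str
    citations: list[str]            # ids of the sources backing the answer

@dataclass
class RankedContext:                # input the last step needs
    question: str
    passages: list[str]             # already filtered, prompt-ready text

def compose_answer(ctx: RankedContext) -> AnswerWithCitations:
    # In the real step this would be a single LLM call over ctx.passages;
    # a trivial placeholder keeps the input -> outcome contract visible.
    return AnswerWithCitations(
        answer=f"({len(ctx.passages)} passages used for: {ctx.question})",
        citations=[f"passage-{i}" for i in range(len(ctx.passages))],
    )
```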
Hope that helps
Some weak points (off the top of my head):
- number crunching done by the LLM should probably be replaced with regular code (see the arithmetic sketch below)
- double-check the data models in your RAG engine so that they make sense for retrieval operations and don’t stay stuck inside the thinking box of the domain you’re working in
- often it is better to pull more data out of the vector DB and run it through a “data quality filter” before selecting the items to stuff into your prompts (see the retrieval sketch below)
- ideally, a retrieved item should need no post-processing before it is inserted into the prompt (so it’s really your prompt that decides how you store the data), because you data-mine once and search all the time
- use classic code whenever possible: an LLM is not an exact science and errors fly all over. In the long run the best approach is to use the LLM as a tool that lets classic code easily access the semantics (/si’mantiks/ - the meaning as you hear it™ - will be my new brand for my AI tools) of your data, so that solid logic can process it (see the routing sketch below)
- break LLM tasks down as much as you can to keep them simple and be able to use short prompts on cheap models
- log your operations with their inputs/outputs from the start, to gather training data for fine-tuning in case you need it later (see the logging sketch below)
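On the numbers point: a minimal sketch of what “regular code instead of the LLM” can look like. The example (summing amounts found in a text snippet) and the regex are assumptions for illustration only:

```python
# Sketch: let regular code do the arithmetic instead of the LLM.
# Hypothetical example: summing amounts found in a text snippet.
import re
from decimal import Decimal

def sum_amounts(text: str) -> Decimal:
    # Find figures like "1,234.56" and add them up deterministically.
    amounts = re.findall(r"\d[\d,]*\.?\d*", text)
    return sum((Decimal(a.replace(",", "")) for a in amounts), Decimal(0))

print(sum_amounts("Invoice A: 1,200.50; Invoice B: 99.95"))  # 1300.45
```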
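On over-fetching and filtering: a rough sketch of the idea. `search` stands in for whatever vector-DB client you actually use, and the filter rules (minimum length, dedupe, the 30/5 counts) are placeholder assumptions, not recommended thresholds:

```python
# Sketch: over-fetch from the vector store, filter, then build the context.
# `search` stands in for your actual vector-DB client (hypothetical API);
# each hit is assumed to be a dict with a prompt-ready "text" field.
def build_context(search, query: str, fetch_k: int = 30, keep_n: int = 5) -> str:
    hits = search(query, top_k=fetch_k)        # pull more than you need
    seen, kept = set(), []
    for hit in hits:                           # the "data quality filter"
        text = hit["text"].strip()
        if len(text) < 40 or text in seen:     # drop stubs and duplicates
            continue
        seen.add(text)
        kept.append(text)                      # no post-processing needed
        if len(kept) == keep_n:
            break
    return "\n\n".join(kept)                   # ready to drop into the prompt
```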
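On using the LLM only to expose the semantics to classic code, while keeping each LLM task small enough for a short prompt on a cheap model: a sketch under those assumptions. `llm` is any callable that takes a prompt string and returns text, and the ticket-routing domain is invented for illustration:

```python
# Sketch: the LLM only turns free text into one small structured fact
# (the "semantics"); plain, testable code applies the actual logic.
# `llm` is any callable that takes a short prompt and returns text
# (a hypothetical wrapper around whatever model client you use).
import json

def route_ticket(llm, ticket_text: str) -> str:
    prompt = (
        "Classify the support ticket. Reply with JSON only, e.g. "
        '{"topic": "billing", "urgent": false}. '
        "Allowed topics: billing, bug, other.\n\n" + ticket_text
    )
    fact = json.loads(llm(prompt))             # tiny task, cheap model
    # Solid classic logic lives outside the model:
    if fact["urgent"] and fact["topic"] == "billing":
        return "escalate-to-finance"
    if fact["topic"] == "bug":
        return "engineering-queue"
    return "standard-queue"
```

Because the LLM’s job here is a tiny classification, the prompt stays short and a cheap model is usually enough; all the decisions you actually need to trust live in plain code you can test.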
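On logging from day one: a minimal sketch that appends each call as one JSONL line. The file name and record fields are assumptions; the point is only that the input/output pairs are captured as you go, in a shape that is easy to convert into fine-tuning examples later.

```python
# Sketch: append every call as one JSONL line so the input/output pairs
# can later be reused as fine-tuning data. Path and fields are assumptions.
import json, time

def logged_call(llm, prompt: str, log_path: str = "llm_calls.jsonl") -> str:
    output = llm(prompt)
    record = {"ts": time.time(), "input": prompt, "output": output}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return output
```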