Big CSV files restructuring/transformation using GPT

Hey everyone, need some suggestions on the following:

I want GPT to learn a set of rules based on which a csv/xlsx file must be reorganised/restructured. Feeding it the entire file, with 100k or even millions of rows, costs a lot of tokens and isn’t practical.

I think I can fine-tune GPT to learn the rules and use pandas to execute what GPT tells it to do. A good example of the kind of restructuring I want GPT + pandas to be able to do would be:
Rename columns, identify wrong values (empty cells, junk values or poorly formatted cells) and wrong formats (e.g. numbers where a cell requires a word or words), etc.

But then, what’s the best way to have GPT do a verification at the end to ensure the cleansed data is well formatted without consuming tonnes of tokens?

This sounds like a job for a good ole’ script. The AI solution sounds tough and expensive. What part needs AI in your opinion?
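
To make that concrete, here is a rough sketch of what such a script could look like with pandas. The column names, the rename mapping and the list of junk tokens are all made up for illustration, so swap in your own rules. It reads the file in chunks, renames columns, and flags empty, junk or wrongly typed cells.

```python
import io
import pandas as pd

# Hypothetical rules: column names and junk tokens are assumptions for this example.
RENAME = {"cust name": "customer_name", "amt": "amount"}
JUNK = {"", "n/a", "na", "null", "-", "?"}

def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Rename columns and add boolean flag columns for bad cells."""
    chunk = chunk.rename(columns=RENAME)
    name = chunk["customer_name"].fillna("").astype(str).str.strip()
    # A name is bad if it is empty, a junk token, or purely numeric.
    chunk["bad_name"] = name.str.lower().isin(JUNK) | name.str.fullmatch(r"[\d.]+")
    # An amount is bad if it cannot be parsed as a number.
    chunk["bad_amount"] = pd.to_numeric(chunk["amount"], errors="coerce").isna()
    return chunk

def clean_csv(source, chunksize: int = 50_000) -> pd.DataFrame:
    """Read and clean the file chunk by chunk.

    The chunks are concatenated here to keep the example short; for a file
    that does not fit in memory, each chunk could be appended to an output
    file instead.
    """
    reader = pd.read_csv(source, dtype=str, chunksize=chunksize)
    return pd.concat([clean_chunk(c) for c in reader], ignore_index=True)

# Tiny in-memory stand-in for a real CSV file.
sample = io.StringIO("cust name,amt\nAlice,10.5\n,abc\n12345,7\nn/a,3\n")
result = clean_csv(sample, chunksize=2)
print(result)
```

None of this touches a model, so the cost does not grow with the number of rows.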


I don’t think GPT is the right tool for the job in this case. Yes, you can use GPT on individual records, like “Please correct typos in the following cell”, but all the other stuff you need to do will work way better, faster and cheaper with traditional code.
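
As for the verification step, one way to do it without spending tokens on the whole file is to run the checks in plain code and pull out only the cells that fail. Below is a minimal sketch of that idea. The schema, the check functions and the sample data are assumptions for the example, not anything from your file. Only the small `failures` table would be a candidate for sending to GPT, and no API call is shown here.

```python
import pandas as pd

def is_number(series: pd.Series) -> pd.Series:
    """True where the cell parses as a number."""
    return pd.to_numeric(series, errors="coerce").notna()

def is_word(series: pd.Series) -> pd.Series:
    """True where the cell looks like one or more words (letters, spaces, . ' -)."""
    text = series.fillna("").astype(str)
    return text.str.fullmatch(r"[A-Za-z][A-Za-z .'-]*").astype(bool)

# Hypothetical schema: which check each column has to pass.
CHECKS = {"customer_name": is_word, "amount": is_number}

def failing_cells(df: pd.DataFrame, checks: dict) -> pd.DataFrame:
    """Return one row per cell that fails its column's check."""
    rows = []
    for column, check in checks.items():
        bad = ~check(df[column])
        for idx, value in df.loc[bad, column].items():
            rows.append({"row": idx, "column": column, "value": value})
    return pd.DataFrame(rows, columns=["row", "column", "value"])

# Made-up cleaned data standing in for the real file.
df = pd.DataFrame({
    "customer_name": ["Alice", "Bob Smith", "12345", None],
    "amount": ["10.5", "7", "abc", "3"],
})

failures = failing_cells(df, CHECKS)
summary = failures.groupby("column").size().to_dict()
print(failures)
print(summary)
```

If `failures` comes back empty, the data passed every rule and there is nothing left to verify. Otherwise the per-column counts in `summary` give a quick report, and the flagged cells are the only thing worth showing to a model.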
