Hey everyone, I need some suggestions on the following:
I want GPT to learn a set of rules that define how a CSV/XLSX file must be reorganised/restructured. Feeding it an entire file with 100k or millions of rows costs a lot of tokens and isn’t practical.
I think I can fine-tune GPT to learn the rules and then use pandas to execute what GPT tells it to do, at least to a certain extent. Good examples of the kind of restructuring I want GPT + pandas to handle would be:
Renaming columns, identifying wrong values (empty cells, junk values, or poorly formatted cells) and wrong formats (e.g. numbers in cells that should contain a word or words), etc.
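A minimal sketch of that split, assuming GPT returns its rules as a small dict (for example parsed from JSON) and pandas applies them to every row locally. The `RULES` keys, the column names, the junk markers and `apply_rules` are all hypothetical placeholders, not an existing format or API:

```python
import numpy as np
import pandas as pd

# Hypothetical rule spec: the kind of compact output a fine-tuned model might
# return after seeing only the headers and a few sample rows.
RULES = {
    "rename": {"cust_nm": "customer_name", "amt": "amount"},
    "text_columns": ["customer_name"],
    "numeric_columns": ["amount"],
    "junk_values": ["N/A", "n/a", "-", "?"],
}


def apply_rules(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Apply a (hypothetical) rule spec to the whole frame without sending rows to GPT."""
    out = df.rename(columns=rules["rename"])
    # Treat known junk markers and blank/whitespace-only strings as missing.
    out = out.replace(rules["junk_values"], np.nan)
    out = out.replace(r"^\s*$", np.nan, regex=True)
    # Numeric columns: anything that cannot be parsed as a number becomes NaN.
    for col in rules["numeric_columns"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Text columns: flag cells that hold a bare number where words are expected.
    for col in rules["text_columns"]:
        as_text = out[col].astype(str).str.strip()
        out[col + "_bad_format"] = as_text.str.fullmatch(r"-?\d+(\.\d+)?")
    return out
```

With this design only the headers and a handful of sample rows would need to reach the model to produce the rule dict, so the total row count has no effect on token usage.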
But then, what’s the best way to have GPT verify at the end that the cleansed data is well formatted, without consuming tonnes of tokens?
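One possible approach, sketched under assumptions: instead of sending the cleaned rows back to GPT, compute a per-column summary locally and send only that. `summarize_for_review` is a hypothetical helper, and the fields it reports (dtype, missing count, unique count, a few sample values) are just one reasonable selection:

```python
import json

import pandas as pd


def summarize_for_review(df: pd.DataFrame, n_samples: int = 3) -> str:
    """Build a compact summary of a cleaned frame (hypothetical helper).

    Only this JSON string, not the data, would go into the verification prompt,
    so its size depends on the number of columns rather than the number of rows.
    """
    summary = {"n_rows": int(len(df)), "columns": {}}
    for col in df.columns:
        s = df[col]
        summary["columns"][str(col)] = {
            "dtype": str(s.dtype),
            "missing": int(s.isna().sum()),
            "unique": int(s.nunique(dropna=True)),
            # First few distinct non-missing values, as strings so they serialize to JSON.
            "samples": [str(v) for v in s.dropna().unique()[:n_samples]],
        }
    return json.dumps(summary)
```

Hard rules such as "this column must be numeric" or "no missing values in this column" can also be checked directly in pandas, leaving GPT only the judgement calls that code cannot easily express.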