Seeking Assistance with GPT-4 Turbo for Data Cleaning and DataFrame Conversion

Hi, I am trying to use GPT-4 Turbo to clean a dataset of about 1 million rows and 5 columns. To stay on the safe side of the token limits, I plan to process the data in batches of 300 rows at a time.

My goal is to clean the data, fixing issues such as spelling mistakes and invalid characters. I am sending each batch of 300 rows as a list of dictionaries, since I figured that would make it easier to build a DataFrame from the results later on.
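For reference, my batching step looks roughly like this (a minimal sketch; the `batch_records` helper name and the toy data are just for illustration):

```python
import pandas as pd

def batch_records(df: pd.DataFrame, batch_size: int = 300):
    """Yield the DataFrame in chunks of `batch_size` rows,
    each serialized as a list of row dictionaries."""
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start:start + batch_size]
        yield chunk.to_dict(orient="records")

# Toy example: 1,000 rows split into batches of 300, 300, 300, 100
df = pd.DataFrame({"name": [f"row{i}" for i in range(1000)],
                   "value": range(1000)})
batches = list(batch_records(df, 300))
```

Each yielded batch is a plain list of dicts, which I then embed in the prompt I send to the model.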

However, I’m facing a challenge: each batch comes back in a different format. Some replies are JSON, some are comma-separated text, and so on. I have tried adjusting the prompt, but with no luck. Because the formats differ from batch to batch, I am struggling to append them all together.

My question is: how can I get each 300-row batch back in a consistent format that converts easily to a DataFrame, so that afterwards I can append all the batches into one DataFrame?
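What I am hoping for is something like the sketch below, where every reply parses the same way. I believe the Chat Completions API supports a JSON mode (`response_format={"type": "json_object"}`) with GPT-4 Turbo that should force valid JSON output, though I haven't verified it solves my case; the `batch_to_frame` helper and the `"rows"` key are just my assumed reply schema, and the sample replies stand in for real model output:

```python
import json
import pandas as pd

def batch_to_frame(raw_json: str) -> pd.DataFrame:
    """Parse one batch reply, expected as {"rows": [...]}, into a DataFrame."""
    payload = json.loads(raw_json)
    return pd.DataFrame(payload["rows"])

# Simulated replies from two batches (in practice, model output)
replies = [
    '{"rows": [{"name": "Alice", "city": "Paris"},'
    ' {"name": "Bob", "city": "Lyon"}]}',
    '{"rows": [{"name": "Carol", "city": "Nice"}]}',
]

# Convert each batch, then append them all into one DataFrame
frames = [batch_to_frame(r) for r in replies]
combined = pd.concat(frames, ignore_index=True)
```

If every batch reliably came back in that one shape, the final `pd.concat` would be trivial. Is this the right approach, or is there a better way to pin down the output format?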