Hi there,
I am planning to build a data cleaning assistant that takes dirty user data, compares it to a reference, detects problems, and transforms the data, returning a clean snippet/file. There are specific steps the LLM has to go through and perform, such as language detection, translation of specific columns, mapping of values in other columns, etc. I would like to execute those steps one by one instead of in one go. The user data is in CSV or JSON format, and that is the format I would also like for the output.
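Roughly the structure I have in mind: each step is its own call to the model, executed in order. This is just a sketch to show the shape of it; the LLM call is stubbed out, and the column names and step prompts are made-up placeholders:

```python
def call_llm(instruction: str, payload: str) -> str:
    """Placeholder for a single Chat Completions call.

    In the real pipeline this would send `instruction` as the system
    message and `payload` (the CSV/JSON text) as the user message,
    and return the model's transformed output.
    """
    return payload  # identity stub so the sketch runs without an API key

# Hypothetical cleaning steps, each performed as a separate call
STEPS = [
    "Detect the language of each row and add a 'lang' column.",
    "Translate the 'description' column to English.",
    "Map values in the 'status' column to the reference vocabulary.",
]

def run_pipeline(csv_text: str) -> str:
    data = csv_text
    for instruction in STEPS:
        # one call per step, feeding each step's output into the next
        data = call_llm(instruction, data)
    return data

dirty = "id,description,status\n1,hola mundo,activo\n"
clean = run_pipeline(dirty)
```

With the stub the output equals the input; the point is only that each step is isolated, so I can validate intermediate results between calls.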
Initially, I wanted to use the Assistants API for its thread functionality (and because I had first created a customized GPT via the UI to see if it worked for my task; as I understand it, the Assistants API corresponds to that functionality). Now I'm questioning whether I really need it and would appreciate any opinions. I guess it's possible to stick with the Chat Completions API, but what great stuff would I be missing? Would the threads/runs be beneficial in this case?
I am also worried that processing the whole file (let's say it has 10k rows) would be overwhelming for a language model, and it might miss some rows or values/columns. Any experience with that? Would the Assistants API, with its runs and threads (which I'm thinking of as state), be of any aid?
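One idea I had for the row-dropping problem is to split the file into fixed-size chunks, clean each chunk in a separate call, and check that the row count survives each call. A sketch (again with the LLM call stubbed out; the chunk size is an arbitrary guess):

```python
import csv
import io

CHUNK_SIZE = 100  # rows per LLM call; would need tuning to the context window

def clean_chunk(rows: list[list[str]]) -> list[list[str]]:
    """Placeholder for one cleaning call on a chunk of rows."""
    return rows  # identity stub

def clean_file(csv_text: str) -> str:
    reader = csv.reader(io.StringIO(csv_text))
    header, *rows = list(reader)
    cleaned = []
    for i in range(0, len(rows), CHUNK_SIZE):
        chunk = rows[i:i + CHUNK_SIZE]
        out = clean_chunk(chunk)
        # guard against the model silently dropping rows in this chunk
        assert len(out) == len(chunk), f"chunk starting at row {i} lost rows"
        cleaned.extend(out)
    buf = io.StringIO()
    csv.writer(buf).writerows([header] + cleaned)
    return buf.getvalue()
```

That at least turns "some rows went missing" into a loud failure I can retry, instead of a silent corruption. But I'm not sure whether threads/runs would make this any easier.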