Is there any way I could use the Completion API to read and answer questions about a large CSV?
As a test, I’ve been able to pass it a small portion of the CSV (in text format), and it successfully answers questions about that data. However, the whole file is too large for a single prompt, and because the Completion API does not remember previous prompts, I can’t feed it the complete CSV over several requests either.
I’m using the Completion API to help me create thousands of prompt/completion pairs for fine-tuning, based on the data in my CSV.
For example, this CSV contains internal company data about products, and I’m using davinci3 to create prompt/completion pairs so I can fine-tune the model on this product data.
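Since each Completion request is stateless, one workaround is to split the CSV into chunks small enough to fit in a single prompt and repeat the header row in every chunk, so each request is self-contained. A minimal Python sketch, assuming the file fits in memory; `chunk_csv` and `build_prompt` are hypothetical helpers, the prompt wording is only an example, and the actual API call is left out:

```python
import csv
import io


def chunk_csv(csv_text: str, rows_per_chunk: int) -> list[str]:
    """Split CSV text into smaller CSV strings, repeating the header in each.

    Every chunk is self-contained, so it can go into its own request
    without relying on anything sent in earlier prompts.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:  # empty input: nothing to chunk
        return []
    rows = list(reader)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)  # header first, so the model knows the columns
        writer.writerows(rows[start:start + rows_per_chunk])
        chunks.append(out.getvalue())
    return chunks


def build_prompt(chunk: str) -> str:
    """Hypothetical prompt template asking the model for Q&A pairs."""
    return (
        "Here is part of a product table in CSV format:\n\n"
        f"{chunk}\n"
        "Write question/answer pairs about the rows above."
    )
```

Each prompt from `build_prompt` can then be sent as its own request and the generated pairs collected into one file. A sensible `rows_per_chunk` depends on how wide the rows are and on the model's context limit.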
Hello @squitorio, did you or anyone else find a way to process a large CSV file and generate a prompt/completion pair file from it? I have a similar use case to the one you described. Thanks in advance!
Language models require context, and you are asking about language models.
This means that any process for taking a CSV file and using it to fine-tune a model depends on context; there is no “one size fits all” approach.
If you want an answer about fine-tuning a model, please post a few lines of the CSV file you want to convert into prompt/completion pairs, along with the questions you want the model to answer based on that sample data.
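As an illustration of what such a conversion might look like, here is a sketch that turns each CSV row into prompt/completion pairs using fixed question templates, written out as JSONL (one JSON object per line with `prompt` and `completion` keys, the format the legacy fine-tuning endpoint used). The column names and templates are hypothetical:

```python
import csv
import io
import json

# Hypothetical question/answer templates keyed on hypothetical column names.
TEMPLATES = [
    ("What country is {name} based in?", "{country}"),
    ("When did {name} join the company?", "{date_of_entry}"),
]


def rows_to_jsonl(csv_text: str) -> str:
    """Turn every CSV row into one prompt/completion pair per template."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for prompt_tpl, completion_tpl in TEMPLATES:
            pair = {
                "prompt": prompt_tpl.format(**row),
                "completion": completion_tpl.format(**row),
            }
            # One JSON object per line, as JSONL expects.
            lines.append(json.dumps(pair))
    return ("\n".join(lines) + "\n") if lines else ""
```

Writing the returned string to a `.jsonl` file gives a training file. Templates like these only cover single-row facts, which is why sample rows and target questions matter for choosing them.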
Let’s say I have a large CSV file of my company’s employees with the following attributes: name, date of birth, country, address, date of joining the company, salary, total vacation days per year, vacation days taken so far this year, revenue generated from sales, etc. Is there any way or tool to upload the contents of the CSV and get back a file of prompt/completion pairs? There are hundreds of questions I could ask: all employees whose names start with A, all employees whose birthday is in March, all employees who have run out of vacation days for this year, the top 3 employees by sales revenue, and many more.
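For questions that span the whole table, like these, one option (a swapped-in technique, not something proposed in the thread) is to compute the answers in code and use the model, or simple templates, only for the wording. The answers are then exact, and no single prompt has to hold the full CSV. A sketch using only the Python standard library; the column names (`birth_date` in ISO `YYYY-MM-DD` format, `vacation_total`, `vacation_taken`, `sales_revenue`) are assumptions:

```python
import csv
import io
from datetime import date


def aggregate_pairs(csv_text: str) -> list[dict]:
    """Compute answers to table-wide questions in code, then emit pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Each answer is computed over every row, not by the model.
    names_a = [r["name"] for r in rows if r["name"].upper().startswith("A")]
    march = [r["name"] for r in rows
             if date.fromisoformat(r["birth_date"]).month == 3]
    no_vacation = [r["name"] for r in rows
                   if int(r["vacation_taken"]) >= int(r["vacation_total"])]
    top3 = sorted(rows, key=lambda r: float(r["sales_revenue"]),
                  reverse=True)[:3]

    qa = [
        ("Which employees have names starting with A?", names_a),
        ("Which employees were born in March?", march),
        ("Which employees have used all their vacation days this year?",
         no_vacation),
        ("Who are the top 3 employees by sales revenue?",
         [r["name"] for r in top3]),
    ]
    # Join each answer list into a completion string; "None" if empty.
    return [{"prompt": q, "completion": ", ".join(a) if a else "None"}
            for q, a in qa]
```

The question list here is fixed; a model could be asked to paraphrase each question to add variety while the completions stay computed.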
I’m working on ideas for processing large inputs. As a general rule, I’m not sure you often need the entire input loaded in a single pass.
Take your example of all employees whose names begin with A. That question can be asked of individual rows, with the results aggregated at the end. You could process batches of, say, 1000 rows in parallel to make it faster, but you don’t need to process all the rows at once.
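Asking a question of individual rows and aggregating the results is essentially map-reduce. A minimal sketch, assuming a hypothetical `per_batch` function that would normally wrap one model request per batch; here a local filter stands in for the model so the example runs offline:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def batched(items: list, size: int) -> list[list]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_reduce_rows(
    rows: list[dict],
    per_batch: Callable[[list[dict]], list[str]],
    batch_size: int = 1000,
    max_workers: int = 8,
) -> list[str]:
    """Apply `per_batch` to each batch concurrently and concatenate results.

    `per_batch` stands in for one model request per batch; any function
    returning a list of matching names works here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results stay in row order.
        partials = list(pool.map(per_batch, batched(rows, batch_size)))
    results: list[str] = []
    for part in partials:  # the "reduce" step: merge partial answers
        results.extend(part)
    return results


def names_starting_with_a(batch: list[dict]) -> list[str]:
    """Local stand-in for asking the model about one batch of rows."""
    return [r["name"] for r in batch if r["name"].upper().startswith("A")]
```

This works directly for filters like names starting with A. For a question like the top 3 employees by revenue, each batch would return its own top 3 and the final step would pick the overall top 3 from those partial results.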