Using completion API to process a large CSV?

squitorio · January 20, 2023, 10:53pm

Is there any way I could use the Completion API to read and answer questions about a large CSV?

As a test, I’ve been able to pass it a small portion of the CSV (in text format) and it successfully answers questions about the data in that portion. However, because the Completion API does not retain anything from previous prompts, I can’t feed it the complete CSV file across several requests.
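One common workaround for the "small portion at a time" limitation is to split the CSV into chunks that each fit in a single prompt, repeating the header row so every chunk is self-describing. Below is a minimal sketch of that idea. The helper name `csv_chunks`, the sample data, and the use of a row count as a stand-in for a token budget are all assumptions for illustration.

```ruby
require "csv"

# Split CSV text into chunks of at most `rows_per_chunk` data rows,
# repeating the header row at the top of every chunk.
# (Hypothetical helper; row count is only a rough proxy for prompt size.)
def csv_chunks(csv_text, rows_per_chunk)
  header, *data = CSV.parse(csv_text)
  data.each_slice(rows_per_chunk).map do |slice|
    CSV.generate do |out|
      out << header
      slice.each { |row| out << row }
    end
  end
end

# Hypothetical sample data, not taken from the thread.
sample = <<~CSV
  sku,name,price
  A1,Widget,9.99
  A2,Gadget,19.50
  A3,Gizmo,4.25
CSV

chunks = csv_chunks(sample, 2)
chunks.each_with_index do |chunk, i|
  puts "--- chunk #{i + 1} ---"
  puts chunk
end
```

Each chunk can then be sent in its own request, with the answers combined afterwards.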

I’m using the Completion API to help me create thousands of prompt/completion pairs for fine tuning based on the data in my CSV.

As an example, this CSV contains internal company data about products and I’m using davinci3 to create prompt/completion pairs so I can fine-tune/train the model on this product data.

ruby_coder · January 21, 2023, 4:04am

Hi @squitorio,

You can fine-tune one step at a time by fine-tuning your fine-tunes, that is, by using a previously fine-tuned model as the base for the next fine-tune.

However, I have not yet confirmed this in code, though many people have posted that it is possible.

.. stand by, confirming for you now @squitorio … fine-tuning takes time (still processing)

  "status"=>"pending",
  "fine_tuned_model"=>nil

I will post back when the status changes to "status"=>"succeeded", or whatever the next status is

Update… still pending…

  "created_at"=>1674274282,
  "updated_at"=>1674274282,
  "status"=>"pending",
  "fine_tuned_model"=>nil}]}
irb(main):069:0> Time.now.to_i
=> 1674276730
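For reference, the elapsed time between the job's `created_at` and the `Time.now.to_i` value above can be worked out like this (a small sketch using only the two timestamps shown; the variable names are mine):

```ruby
# Unix timestamps (seconds) copied from the job listing and the irb session.
created_at = 1674274282
now        = 1674276730

# Integer division gives whole minutes elapsed since the job was created.
elapsed_minutes = (now - created_at) / 60
puts "#{elapsed_minutes} minutes: pending"
```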

Update… one hour later … one and a half hour later … still pending

Update … `389 minutes: pending`

Update … `528 minutes: pending`

Update … `638 minutes: succeeded`

Yes, it “works” and you can fine-tune a previously fine-tuned model.

Confirmed
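As a rough sketch of what "fine-tuning a fine-tune" looks like at the request level: the idea is to pass the name of the existing fine-tuned model as the base model of the new job. The snippet below only builds the JSON body and sends nothing. The field names `training_file` and `model` are my assumption about the fine-tunes endpoint as it existed at the time, and both identifiers are hypothetical placeholders.

```ruby
require "json"

# Build the JSON body for a second fine-tuning pass that starts from an
# already fine-tuned model. No HTTP request is made here.
# Assumption: the endpoint accepts a fine-tuned model name in "model".
def continued_fine_tune_body(training_file_id, base_model)
  JSON.generate("training_file" => training_file_id, "model" => base_model)
end

# Hypothetical placeholders, not real IDs.
body = continued_fine_tune_body("file-EXAMPLE123", "davinci:ft-your-org-2023-01-21-00-00-00")
puts body
```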

mwil · March 15, 2023, 9:14pm

Hello. @squitorio, did you or anyone else find a way to process a large CSV file and generate a prompt/completion pair file from it? My use case is similar to the one you mentioned. Thanks in advance!

ruby_coder · March 16, 2023, 5:39am

Hi @mwil

Language models require context, and you are asking about language models.

This means that any process for taking a CSV file and using it to fine-tune a model depends on that context; there is no “one size fits all” approach.

If you want an answer about fine-tuning a model, please post a few lines of the CSV file you wish to convert to prompt-completion key-value pairs, along with the question you want the model to answer from that sample data.

HTH

mwil · March 16, 2023, 1:24pm

Let’s say I have a large CSV file of my company’s employees with the following attributes: name, date of birth, country, address, date of entry to the company, salary, total vacation days per year, vacation days taken so far this year, revenue generated from sales, etc. Is there any way or tool that lets me upload the contents of the CSV and get back a file of prompt-completion key-value pairs? There are a hundred examples of questions I could ask: all the employees whose names start with A, all the employees whose birthday is in March, all the employees who have run out of vacation days for this year, the top 3 employees by revenue generated from sales, and many more.
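For per-employee questions, one option that needs no model at all is to generate the pairs directly from the rows with fixed question templates and write them out as JSONL. Here is a minimal sketch of that approach. The column names, the two templates, and the helper `pairs_from_csv` are assumptions for illustration, and the leading space in each completion follows what I recall of the fine-tuning data conventions of the time.

```ruby
require "csv"
require "json"

# Turn each CSV row into prompt/completion pairs using fixed templates.
# Returns an array of JSON strings, one per line of a JSONL training file.
def pairs_from_csv(csv_text)
  CSV.parse(csv_text, headers: true).flat_map do |row|
    days_left = row["vacation_total"].to_i - row["vacation_taken"].to_i
    [
      { "prompt" => "Which country is #{row['name']} based in?",
        "completion" => " #{row['country']}" },
      { "prompt" => "How many vacation days does #{row['name']} have left this year?",
        "completion" => " #{days_left}" }
    ].map { |pair| JSON.generate(pair) }
  end
end

# Hypothetical sample rows; the column names are assumptions.
sample = <<~CSV
  name,country,vacation_total,vacation_taken
  Alice,Spain,22,22
  Bob,Chile,20,5
CSV

lines = pairs_from_csv(sample)
puts lines
```

Questions that span the whole file (top 3 by revenue, for example) would need a separate aggregate step in code before a pair can be written for them.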

paul.armstrong · March 17, 2023, 8:05am

I’m working on ideas for processing large inputs. As a general rule, I’m not sure that you often need the entire input loaded in a single processing pass.

Take your example of all employees with names beginning with A. That question could be asked on individual rows and aggregated at the end. You’d do it on 1000 rows simultaneously to make it faster. But the fact is you don’t need to do it on all the rows at once.
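The "ask per batch, aggregate at the end" pattern described here can be sketched as follows. In this illustration the per-batch step is plain Ruby standing in for whatever does the work on each batch (a model call or ordinary code). The helper names, the batch size, and the sample data are all hypothetical.

```ruby
require "csv"

# Per-batch step: answer the question for one batch of rows only.
# (Plain Ruby here; it stands in for a per-batch model call.)
def names_starting_with(rows, letter)
  rows.map { |row| row["name"] }.select { |name| name.start_with?(letter) }
end

# Run the per-batch step on every batch concurrently, then merge the
# partial answers in batch order.
def aggregate(csv_text, letter, batch_size)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  threads = rows.each_slice(batch_size).map do |batch|
    Thread.new { names_starting_with(batch, letter) }
  end
  threads.flat_map(&:value)
end

# Hypothetical sample data.
sample = <<~CSV
  name,country
  Alice,Spain
  Bob,Chile
  Ana,Peru
  Carlos,Mexico
  Amir,Egypt
CSV

result = aggregate(sample, "A", 2)
puts result.inspect
```

Because each batch is handled independently, no single step ever needs the whole file in memory or in one prompt.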
