Using completion API to process a large CSV?

Is there any way I could use the Completion API to read and answer questions about a large CSV?

As a test, I’ve been able to pass it a small portion of the CSV (in text format), and it successfully answers questions about that data. However, because the Completion API does not retain anything from previous prompts, I can’t feed it the complete CSV file across several requests.

I’m using the Completion API to help me create thousands of prompt/completion pairs for fine tuning based on the data in my CSV.

As an example, this CSV contains internal company data about products and I’m using davinci3 to create prompt/completion pairs so I can fine-tune/train the model on this product data.


Hi @squitorio,

You can fine-tune one step at a time, by fine-tuning your fine-tunes.

I have not yet confirmed this in code myself, but many people have posted that it is possible.

… stand by, confirming for you now @squitorio :slight_smile: … fine-tuning takes time (still processing)

  "status"=>"pending",
  "fine_tuned_model"=>nil

I will post back when the status changes to "status"=>"succeeded", or whatever the next status is :slight_smile:

Update… still pending…

  "created_at"=>1674274282,
  "updated_at"=>1674274282,
  "status"=>"pending",
  "fine_tuned_model"=>nil}]}
irb(main):069:0> Time.now.to_i
=> 1674276730

Update… one hour later … one and a half hour later … still pending

Update … 389 minutes: pending

Update … 528 minutes: pending

Update … 638 minutes: succeeded

Yes, it “works”, and you can fine-tune a previously fine-tuned model.

Confirmed :slight_smile:


Hello. @squitorio, did you or anyone else find a way to process a large CSV file and generate a prompt/completion pair file from it? I have a similar use case to the one you mentioned. Thanks in advance!

Hi @mwil

Language models require context and you are asking about language models.

This means that any process of taking a CSV file and using that file to fine-tune a model requires context, and there is no “one size fits all” approach.

If you want an answer about fine-tuning a model, please post a few lines of the CSV file you wish to convert to prompt-completion key-value pairs, along with the question you want the model to answer from that sample data.

HTH

:slight_smile:

Let’s say I have a large CSV file of my company’s employees with the following attributes: name, date of birth, country, address, date of entry to the company, salary, total vacation days per year, total vacation days taken so far this year, revenue generated from sales, etc. Is there any way or tool that lets me upload the content of the CSV and get back a file with prompt-completion key-value pairs? There are a hundred examples of questions I could ask: all the employees whose names start with A, all the employees whose birthday is in March, all the employees who have run out of vacation days for this year, the top 3 employees by revenue generated from sales, and so on.
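For a CSV shaped like the one described, a minimal sketch of turning each row into prompt/completion pairs might look like the following. The column names, the sample rows, the question templates, and the helper names `pairs_for_row` and `csv_to_jsonl` are all assumptions made for illustration; the only fixed point is that each output line is a JSON object with `prompt` and `completion` keys. Questions that span many rows (such as a top 3 by revenue) are not covered here and would probably need to be computed in code first.

```python
import csv
import io
import json

# Hypothetical sample data; a real file would be opened with open(path, newline="").
SAMPLE_CSV = """name,country,vacation_days_per_year,vacation_days_taken
Alice,Spain,22,22
Bob,France,25,10
"""


def pairs_for_row(row):
    """Build a few templated prompt/completion pairs from one CSV row."""
    remaining = int(row["vacation_days_per_year"]) - int(row["vacation_days_taken"])
    return [
        {
            "prompt": f"Which country is {row['name']} based in?",
            "completion": row["country"],
        },
        {
            "prompt": f"How many vacation days does {row['name']} have left this year?",
            "completion": str(remaining),
        },
    ]


def csv_to_jsonl(text):
    """Return JSONL text with one prompt/completion object per line."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        for pair in pairs_for_row(row):
            lines.append(json.dumps(pair))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(csv_to_jsonl(SAMPLE_CSV), end="")
```

Because the pairs come from templates rather than from a model, this costs nothing to run over the whole file; a model could then be used only to rephrase the prompts if more variety is wanted.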

I’m working on ideas for processing large inputs. As a general rule, I’m not sure you often need the entire input loaded in a single processing attempt.

Take your example of all employees with names beginning with A. That question could be asked on individual rows and aggregated at the end. You’d do it on 1000 rows simultaneously to make it faster. But the fact is you don’t need to do it on all the rows at once.
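The row-by-row idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample data and the helper names are made up, and `answer_for_chunk` is a plain Python stand-in for whatever per-chunk question you would actually send to the model.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for the large CSV.
SAMPLE_CSV = """name,country
Alice,Spain
Bob,France
Ana,Portugal
Carl,Spain
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def answer_for_chunk(chunk):
    """Stand-in for a per-chunk model call: names in this chunk starting with A."""
    return [row["name"] for row in chunk if row["name"].startswith("A")]


def names_starting_with_a(rows, chunk_size=2, workers=4):
    """Answer the question on each chunk concurrently, then merge the partial answers."""
    chunks = list(chunked(rows, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        partials = list(pool.map(answer_for_chunk, chunks))
    return [name for partial in partials for name in partial]


if __name__ == "__main__":
    print(names_starting_with_a(load_rows(SAMPLE_CSV)))
```

The chunk size would be chosen so that each chunk fits comfortably in a single prompt; the final merge step is ordinary code, so the model never needs to see the whole file at once.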