Generating dataset of prompt-completion pairs for fine-tuning

Hi all,

(I tried to find an answer to my question here, but couldn’t 🙂)

I’m new to OpenAI, and I’m trying to fine-tune Curie to create a chatbot for my software team (to serve as a kind of “helpdesk chatbot”).
I’ve read the API’s documentation, and I understand I’ll have to provide my model with (at least) hundreds of prompt-completion pairs as a training dataset.
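
To make sure I’ve understood the format correctly, here is a minimal sketch of how I think the training file is built: one JSON object per line, each with a `prompt` and a `completion` key. The example questions and answers are made up, and the `\n\n###\n\n` separator, the leading space and the trailing newline in the completions are just conventions I picked, so treat those as assumptions.

```python
import json

# Hypothetical helpdesk pairs, invented for illustration only.
# Assumption: "\n\n###\n\n" marks the end of each prompt, and each completion
# starts with a space and ends with "\n" so it can be used as a stop sequence.
pairs = [
    {
        "prompt": "How do I reset my VPN password?\n\n###\n\n",
        "completion": " Open the self-service portal and choose 'Reset password'.\n",
    },
    {
        "prompt": "Where can I find the build logs?\n\n###\n\n",
        "completion": " They are under the 'Artifacts' tab of each pipeline run.\n",
    },
]


def to_jsonl(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


jsonl_text = to_jsonl(pairs)

# Write the dataset to disk (the file name is arbitrary).
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(jsonl_text)
```

Writing these by hand is fine for a handful of pairs, but obviously not for hundreds.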

I’ve checked out the GitHub notebooks with the Olympic Games 2020 example, and saw that after processing and organizing the collected data, they used a model (‘davinci-instruct-beta-v3’) to create a synthetic Q&A dataset.
However, I can’t find any reference to that model or technique (deprecated?).
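
As far as I can tell, the idea in the notebook boils down to something like the sketch below: ask a model to write questions about a passage, then ask it to answer those questions using the same passage. This is my own reconstruction, not the notebook’s code. The prompt wording is a paraphrase, and `generate` is a hypothetical stand-in for whatever model call would be used (a function that takes a prompt string and returns the generated text).

```python
from typing import Callable, Dict, List


def build_question_prompt(context: str) -> str:
    # Prompt wording is an assumption; the trailing "1." nudges the model
    # into continuing a numbered list.
    return f"Write questions based on the text below\n\nText: {context}\n\nQuestions:\n1."


def build_answer_prompt(context: str, numbered_questions: str) -> str:
    return (
        f"Write answers based on the text below\n\nText: {context}\n\n"
        f"Questions:\n{numbered_questions}\n\nAnswers:\n1."
    )


def parse_numbered(text: str) -> List[str]:
    """Split '1. foo\\n2. bar' into ['foo', 'bar'].

    Simplistic on purpose: a line is only treated as numbered if everything
    before its first '.' is digits.
    """
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        head, sep, rest = line.partition(".")
        if sep and head.isdigit():
            line = rest.strip()
        if line:
            items.append(line)
    return items


def make_pairs(context: str, generate: Callable[[str], str]) -> List[Dict[str, str]]:
    """Turn one passage into prompt-completion pairs.

    `generate` is hypothetical: any function mapping a prompt to model output.
    """
    # The prompts end with "1.", so put it back in front of the model output.
    questions = parse_numbered("1." + generate(build_question_prompt(context)))
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    answers = parse_numbered("1." + generate(build_answer_prompt(context, numbered)))
    # zip() silently drops extras if the model returns mismatched counts.
    return [{"prompt": q, "completion": " " + a} for q, a in zip(questions, answers)]
```

Running `make_pairs` over each article section would give a list of dicts to dump into the training file, though I assume the generated pairs would still need a manual review pass.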

So my question is: given a set of articles as source data, is there an efficient way to generate those hundreds or thousands of prompt-completion pairs from them? Can I use OpenAI to create synthetic questions and answers from the articles?

Thanks!
