How to run evaluation using a dataset?

I was reading through the OpenAI API documentation, and it explains how to fine-tune a model with a training dataset. However, for a research project I am looking to evaluate GPT-3 on a test dataset. The documentation describes how to generate predictions using single-line API requests, but I am looking to feed in my dataset as a file and get predictions from that. Any help would be appreciated.

If I understand your question correctly, you would need to get your test dataset formatted correctly to use with GPT-3 fine-tuning.

What have you tried so far?

PS - welcome to the forum!

Hi there,

Thanks for the warm welcome! As of now, I want to evaluate my dataset on the davinci model but I’m not sure how to format the data for evaluation or what command to use as I’m not fine-tuning the model. The line below is what the documentation says about formatting data for fine-tuning.

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Since I’m looking to run evaluation, my dataset currently looks like below without any target completion.

{"prompt": "<prompt text>"}
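For reference, a prompts-only file in this shape (one JSON object per line) can be read into a plain Python list before any API call is made. This is only a minimal sketch: the helper name `load_prompts` and the file name in the usage note are made up for illustration, and it assumes every non-empty line has a "prompt" key.

```python
import json


def load_prompts(path):
    """Return the "prompt" string from each non-empty line of a JSONL file.

    Hypothetical helper: assumes one JSON object per line, each with a
    "prompt" key, as in the example line above.
    """
    prompts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            prompts.append(json.loads(line)["prompt"])
    return prompts
```

Calling `load_prompts("test_set.jsonl")` (file name assumed) would give a list of prompt strings, one per line of the file.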

Now, to run evaluation on a model, and correct me if I’m wrong, I would use the line below. However, this command is for single prompt generation whereas I am looking to evaluate on an entire dataset. Hope that makes sense.

openai.Completion.create(model="text-davinci-003", prompt="Say this is a test", temperature=0, max_tokens=7)

Ah, okay. Yeah, there’s a 4008 (or so) token maximum length which is why you would need to fine-tune a larger dataset.

What I believe is recommended in cases like yours is to leave the PROMPT blank and the dataset text as the COMPLETION…

I could be misunderstanding what you’re trying to do, so if anyone else wants to chime in, feel free.

I may have confused you a bit. Basically, what I’m trying to figure out is whether there is a way to give the model a dataset and have it evaluate on that.

The method in the documentation only allows for single-input generation and not a file (unless I am missing something). Would I have to run this method 200+ times, once for each line in my dataset? I don’t see a parameter that lets the model read a file.

openai.Completion.create(model="text-davinci-003", prompt="Say this is a test", temperature=0, max_tokens=7)
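Calling the method once per line, as described above, could look roughly like the sketch below. It reuses the same `openai.Completion.create` call from the post, so it assumes the older (pre-1.0) `openai` Python package and a configured API key. The names `complete_with_openai`, `run_evaluation`, and `save_predictions` are hypothetical, and the completion function is passed in as an argument so the loop can be tried with a stand-in before spending API credits.

```python
import json


def complete_with_openai(prompt):
    """Send one prompt using the same call as in the post above.

    Assumes the pre-1.0 openai package is installed and an API key is set.
    """
    import openai  # imported here so the rest of the sketch runs without it

    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, temperature=0, max_tokens=7
    )
    return response["choices"][0]["text"]


def run_evaluation(prompts, complete=complete_with_openai):
    """Call `complete` once per prompt and pair each prompt with its output."""
    results = []
    for prompt in prompts:
        results.append({"prompt": prompt, "prediction": complete(prompt)})
    return results


def save_predictions(results, path):
    """Write one JSON object per line so predictions can be scored later."""
    with open(path, "w", encoding="utf-8") as f:
        for row in results:
            f.write(json.dumps(row) + "\n")
```

With 200 or so prompts this is simply 200 requests in a loop. The predictions file can then be compared against whatever reference answers the evaluation uses.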

Okay, yeah, you can’t use a whole file (dataset) … there’s a 4008 or so token max length. To get what you want it to do, you would need to fine-tune one of the models. Hope this helps!

Check this out. I just wish there were a similar library in JS.
https://gpt-index.readthedocs.io/en/latest/index.html