How to run evaluation using a dataset?

akku779 · December 20, 2022, 7:42pm

I was reading through the OpenAI API documentation, and it explains how to fine-tune a model with a training dataset. However, for a research project I am looking to evaluate GPT-3 on a test dataset. The documentation describes how to generate predictions using single-prompt API requests, but I am looking to feed in my dataset as a file and get predictions for every line in it. Any help would be appreciated.

PaulBellow · December 20, 2022, 7:45pm

If I understand your question correctly, you would need to get your test dataset formatted correctly to use with GPT-3 fine-tuning.

What have you tried so far?

PS - welcome to the forum!

akku779 · December 20, 2022, 7:53pm

Hi there,

Thanks for the warm welcome! As of now, I want to evaluate my dataset on the davinci model but I’m not sure how to format the data for evaluation or what command to use as I’m not fine-tuning the model. The line below is what the documentation says about formatting data for fine-tuning.

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

Since I’m looking to run evaluation, my dataset currently looks like the line below, without any target completion.

{"prompt": "<prompt text>"}

Now, to run evaluation on a model, and correct me if I’m wrong, I would use the line below. However, this command is for single prompt generation whereas I am looking to evaluate on an entire dataset. Hope that makes sense.

openai.Completion.create(model="text-davinci-003", prompt="Say this is a test", temperature=0, max_tokens=7)
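A minimal sketch of running that same call over a prompts-only JSONL file, one request per line. It assumes the legacy `openai` Python package (pre-1.0) that the call above comes from; `read_prompts`, `openai_complete`, and `predict_all` are hypothetical helper names, not part of the library. The completion function is passed in as an argument so the loop can be tried with a stand-in before spending API credits.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def read_prompts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "prompt" field from each non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)["prompt"]


def openai_complete(prompt: str) -> str:
    """Send one prompt using the legacy call quoted above (assumes openai<1.0)."""
    import openai  # imported lazily so the rest of the sketch runs without it

    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, temperature=0, max_tokens=7
    )
    return response["choices"][0]["text"]


def predict_all(
    lines: Iterable[str], complete: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Run `complete` on every prompt and pair each prompt with its output."""
    return [{"prompt": p, "completion": complete(p)} for p in read_prompts(lines)]
```

With a file named, say, `test.jsonl` (a placeholder name), this would be used as `with open("test.jsonl") as f: predictions = predict_all(f, openai_complete)`.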

PaulBellow · December 20, 2022, 7:56pm

Ah, okay. Yeah, there’s a 4008 (or so) token maximum length which is why you would need to fine-tune a larger dataset.

What I believe is recommended in cases like yours is to leave the PROMPT blank and the dataset text as the COMPLETION…

I could be misunderstanding what you’re trying to do, so if anyone else wants to chime in, feel free.

akku779 · December 20, 2022, 8:02pm

I may have confused you a bit. Basically, what I’m trying to figure out is whether there is a way to give the model a dataset and have it generate predictions for every example in it.

The method in the documentation only allows for single-input generation and not a file (unless I am missing something). I don’t see a parameter that lets the model read a file, so would I have to run this method 200+ times, once for each line in my dataset?

openai.Completion.create(model="text-davinci-003", prompt="Say this is a test", temperature=0, max_tokens=7)
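On the "200+ times" point: in the legacy Completions API the `prompt` parameter also accepts a list of strings, so several dataset lines can share one request, and each returned choice carries an `index` identifying the prompt it answers. The sketch below assumes the pre-1.0 `openai` Python package; `chunked`, `openai_complete_batch`, and `predict_in_batches` are hypothetical helper names, and the batch size of 20 is an arbitrary choice that should be tuned to the account's rate limits.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Split a sequence into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def openai_complete_batch(prompts: Sequence[str]) -> List[str]:
    """Send several prompts in one legacy Completion request (assumes openai<1.0)."""
    import openai  # imported lazily so the rest of the sketch runs without it

    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=list(prompts),
        temperature=0,
        max_tokens=7,
    )
    # Sort by each choice's prompt index so outputs line up with the inputs.
    choices = sorted(response["choices"], key=lambda c: c["index"])
    return [c["text"] for c in choices]


def predict_in_batches(
    prompts: Sequence[str],
    complete_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int = 20,  # assumed value, not a documented limit
) -> List[str]:
    """Return one completion per prompt, in input order."""
    results: List[str] = []
    for batch in chunked(prompts, batch_size):
        results.extend(complete_batch(batch))
    return results
```

For a 200-line dataset and a batch size of 20, this makes 10 requests instead of 200. No fine-tuning is involved; the model is only queried.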

PaulBellow · December 20, 2022, 8:43pm

Okay, yeah, you can’t use a whole file (dataset) … there’s a 4008 or so token max length. To get what you want it to do, you would need to fine-tune one of the models. Hope this helps!

madaher · March 15, 2023, 9:57am

Check this out. I just wish there were a similar library in JS:
https://gpt-index.readthedocs.io/en/latest/index.html
