Fine-Tuned Model Not Working for Classification

Hello,

I’m trying to write a Python script that classifies Tweets. I created a training file with the following format:

{"prompt:" "tweet #1 content bla bla bla","completion:" " Category #3"}

The training file has about 500 examples. I prepare the data with the OpenAI tool, though it’s not clear that it is interpreting my prompts and completions correctly. After fine-tuning the model, I submit a sample tweet, and the API returns the tweet with some extra text appended. In other words, it continues the tweet instead of just telling me which category it belongs in.
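
For reference, a minimal sketch of one way to produce a training file in this format, with one JSON object per line (the file name tweets.jsonl and the example values below are only placeholders):

import json

# Placeholder examples; in practice this would hold ~500 tweet/category pairs.
examples = [
    {"prompt": "tweet #1 content bla bla bla", "completion": " Category #3"},
    {"prompt": "tweet #2 content bla bla bla", "completion": " Category #1"},
]

# Write one JSON object per line (JSONL). Using json.dumps guarantees
# straight quotes and valid JSON syntax on every line.
with open("tweets.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")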

I’ve gone through the documentation several times and can’t figure this out. It’s pretty frustrating so any help is very much appreciated.

-John

You need a clear separator between the prompt and the completion. It should look like this:

{"prompt": "[tweet string] CATEGORY: ", "completion": "3"}

If you don’t include a separator at the end of the prompt, the model doesn’t know where the input ends and the output begins.
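
At inference time, the same separator is appended to the tweet, and capping the output keeps the model from continuing the text. A minimal sketch, assuming the pre-1.0 openai Python package and a placeholder fine-tuned model name:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

tweet = "tweet content bla bla bla"  # placeholder input

# Append the same "CATEGORY: " separator used in training, ask for a single
# token, and use temperature 0 so the classification is deterministic.
response = openai.Completion.create(
    model="ada:ft-your-org-2023-01-01-00-00-00",  # placeholder model name
    prompt=tweet + " CATEGORY: ",
    max_tokens=1,
    temperature=0,
)

category = response["choices"][0]["text"].strip()
print(category)  # e.g. "3"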

Thank you very much for your help. I modified my input file to match the format you suggested. When I try to prepare the data with the OpenAI tool, I receive the following warnings and messages:

Analyzing…

- Based on your file extension, you provided a text file
- Your file contains 492 prompt-completion pairs
- There are 3 duplicated prompt-completion sets. These are rows: [144, 403, 404]
- All completions start with prefix {"prompt": ". Most of the time you should only add the output data into the completion, without any prefix
- The completion should start with a whitespace character ( ). This tends to produce better results due to the tokenization we use. See OpenAI API for more details

These are the same warnings I received before, and they confuse me. The count is correct: I do have 492 examples. The message about duplicates is incorrect: lines 144, 403, and 404 are clearly not duplicates. I can’t figure out why it’s singling out these lines; each one is very different from the others.

The message about the completion prefix is confusing, as the tool seems to be interpreting my prompts as my completions. Regarding the note about adding whitespace before the completion: I have done this in the past and I still receive this message. This seems to be more evidence that the tool is confusing prompts and completions.

I really appreciate your response. The amount of difficulty I’m having with this makes me wonder how anyone manages to do it. I don’t understand why the tool is interpreting the first string on each line as my completion.
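
One way to tell whether the file or the tool is at fault is to confirm that every line parses as standalone JSON with exactly a "prompt" key and a "completion" key. A minimal sketch, assuming the training file is named tweets.jsonl:

import json

# Each line of a JSONL training file should be its own JSON object with
# exactly the keys "prompt" and "completion".
with open("tweets.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            print(f"line {line_number}: not valid JSON ({error})")
            continue
        if set(record) != {"prompt", "completion"}:
            print(f"line {line_number}: unexpected keys {sorted(record)}")

If lines fail to parse here, the preparation tool is likely falling back to reading the file as plain text, which would be consistent with the warnings above.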