Fine-Tuned Model Not Working for Classification

Hello,

I’m trying to write a Python script that classifies Tweets. I created a training file with the following format:

{"prompt:" "tweet #1 content bla bla bla","completion:" " Category #3"}

The training file has about 500 examples. I prepare the data with the OpenAI tool, though it’s not clear that it is interpreting my prompts and completions correctly. After fine-tuning the model, I submit a sample tweet, and the API returns the tweet with some extra text appended. In other words, it continues the tweet instead of just telling me which category it belongs in.
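
For reference, a minimal sketch of one way to produce a training file in this format, with one JSON object per line (the file name tweets.jsonl and the example values below are only placeholders):

import json

# Placeholder examples; in practice this would hold ~500 tweet/category pairs.
examples = [
    {"prompt": "tweet #1 content bla bla bla", "completion": " Category #3"},
    {"prompt": "tweet #2 content bla bla bla", "completion": " Category #1"},
]

# Write one JSON object per line (JSONL). Using json.dumps guarantees
# straight quotes and valid JSON syntax on every line.
with open("tweets.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")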

I’ve gone through the documentation several times and can’t figure this out. It’s pretty frustrating so any help is very much appreciated.

-John

You need a clear separator between the prompt and the completion. It should look like this:

{"prompt": "[tweet string] CATEGORY: ", "completion": "3"}

If you don’t include a separator at the end of the prompt, the model doesn’t know where the input ends and the output begins.
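
At inference time, the same separator is appended to the tweet, and capping the output keeps the model from continuing the text. A minimal sketch, assuming the pre-1.0 openai Python package and a placeholder fine-tuned model name:

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

tweet = "tweet content bla bla bla"  # placeholder input

# Append the same "CATEGORY: " separator used in training, ask for a single
# token, and use temperature 0 so the classification is deterministic.
response = openai.Completion.create(
    model="ada:ft-your-org-2023-01-01-00-00-00",  # placeholder model name
    prompt=tweet + " CATEGORY: ",
    max_tokens=1,
    temperature=0,
)

category = response["choices"][0]["text"].strip()
print(category)  # e.g. "3"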

Thank you very much for your help. I modified my input file to match the format you suggested. When I try to prepare the data with the OpenAI tool, I receive the following warnings and messages:

Analyzing…

- Based on your file extension, you provided a text file
- Your file contains 492 prompt-completion pairs
- There are 3 duplicated prompt-completion sets. These are rows: [144, 403, 404]
- All completions start with prefix {"prompt": ". Most of the time you should only add the output data into the completion, without any prefix
- The completion should start with a whitespace character ( ). This tends to produce better results due to the tokenization we use. See OpenAI API for more details

These are the same warnings I received before, and they confuse me. The count is correct: I do have 492 examples. The message about duplicates is incorrect: lines 144, 403, and 404 are clearly not duplicates. I can’t figure out why it’s singling out these lines; each one is very different from the others.

The message about the completion prefix is confusing, as the tool seems to be interpreting my prompts as my completions. Regarding the note about adding whitespace before the completion: I have done this in the past and I still receive this message. This seems to be more evidence that the tool is confusing prompts and completions.

I really appreciate your response. The amount of difficulty I’m having with this makes me wonder how anyone manages to do it. I don’t understand why the tool is interpreting the first string on each line as my completion.
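
One way to tell whether the file or the tool is at fault is to confirm that every line parses as standalone JSON with exactly a "prompt" key and a "completion" key. A minimal sketch, assuming the training file is named tweets.jsonl:

import json

# Each line of a JSONL training file should be its own JSON object with
# exactly the keys "prompt" and "completion".
with open("tweets.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            print(f"line {line_number}: not valid JSON ({error})")
            continue
        if set(record) != {"prompt", "completion"}:
            print(f"line {line_number}: unexpected keys {sorted(record)}")

If lines fail to parse here, the preparation tool is likely falling back to reading the file as plain text, which would be consistent with the warnings above.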