Fine-tuned GPT-3 binary classification model completes the prompt with a random token instead of a label

Hello, I fine-tuned an ada model to classify documents into two groups: simple and normal. The problem is that in some cases the model does not assign either of the two labels; instead, it returns something like the output below. Could anyone help me understand why the model returns a random token (such as “The” or “of”) as a completion when it was trained on data with only two labels?

openai.Completion.create(model=fine_tuned_model, prompt=prompt + ' \n\n###\n\n', max_tokens=1, logprobs=3, temperature=0)

{
  "id": "cmpl-7ckHxoPDqhHM8NJs2A81kTlIhcOnB",
  "object": "text_completion",
  "created": 1689468753,
  "model": "ada:ft-the-school-2023-07-08-19-26-00",
  "choices": [
    {
      "text": " simple",
      "index": 0,
      "logprobs": {
        "tokens": [" simple"],
        "token_logprobs": [-0.21027027],
        "top_logprobs": [
          {
            " simple": -0.21027027,
            " of": -5.2638874,
            " normal": -1.7454323
          }
        ],
        "text_offset": [5008]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1077,
    "completion_tokens": 1,
    "total_tokens": 1078
  }
}
{
  "id": "cmpl-7ckHxS8qLVL2SluQGPKd4YPowQ9ik",
  "object": "text_completion",
  "created": 1689468753,
  "model": "ada:ft-the-school-2023-07-08-19-26-00",
  "choices": [
    {
      "text": " normal",
      "index": 0,
      "logprobs": {
        "tokens": [" normal"],
        "token_logprobs": [-0.48004854],
        "top_logprobs": [
          {
            " ": -4.862535,
            " simple": -1.0338054,
            " normal": -0.48004854
          }
        ],
        "text_offset": [5008]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1156,
    "completion_tokens": 1,
    "total_tokens": 1157
  }
}
{
  "id": "cmpl-7ckHy0TBZ4tZ2NdlgXDtgVjJCMakp",
  "object": "text_completion",
  "created": 1689468754,
  "model": "ada:ft-the-school-2023-07-08-19-26-00",
  "choices": [
    {
      "text": " simple",
      "index": 0,
      "logprobs": {
        "tokens": [" simple"],
        "token_logprobs": [-0.59430796],
        "top_logprobs": [
          {
            " ": -3.1227753,
            " simple": -0.59430796,
            " normal": -1.0200912
          }
        ],
        "text_offset": [5008]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1040,
    "completion_tokens": 1,
    "total_tokens": 1041
  }
}
{
  "id": "cmpl-7ckHyUD8C6hbiHSUaA9Xw0WsaPpcq",
  "object": "text_completion",
  "created": 1689468754,
  "model": "ada:ft-the-wharton-school-2023-07-08-19-26-00",
  "choices": [
    {
      "text": " ",
      "index": 0,
      "logprobs": {
        "tokens": [" "],
        "token_logprobs": [-0.6145109],
        "top_logprobs": [
          {
            " ": -0.6145109,
            " simple": -3.308858,
            " The": -3.3845778
          }
        ],
        "text_offset": [5008]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1252,
    "completion_tokens": 1,
    "total_tokens": 1253
  }
}

{
  "id": "cmpl-7cjyIk3ggfkeJ69K0ZaAlsrWrm98D",
  "object": "text_completion",
  "created": 1689467534,
  "model": "ada:ft-the-school-2023-07-08-19-26-00",
  "choices": [
    {
      "text": " ",
      "index": 0,
      "logprobs": {
        "tokens": [" "],
        "token_logprobs": [-0.63914996],
        "top_logprobs": [
          {
            " ": -0.63914996,
            " simple": -3.1395473,
            " The": -3.417498
          }
        ],
        "text_offset": [5008]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 1252,
    "completion_tokens": 1,
    "total_tokens": 1253
  }
}

Thank you in advance!

Probably the first thing to do is to remove the max_tokens=1 cap and see what the model is actually trying to say.
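For example, the same call as above with a larger token budget (16 here is arbitrary, just enough to see a few words):

response = openai.Completion.create(model=fine_tuned_model, prompt=prompt + ' \n\n###\n\n', max_tokens=16, logprobs=3, temperature=0)
print(response["choices"][0]["text"])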

Depending on the kind of text you’re feeding in, you might also need many more training examples for that type of content.

Also, you could train with an end separator that is more explicit, something like "[Simple/Normal]?:", and include the same string in your prompts, to see whether that reinforcement dissuades the model from continuing in ordinary prose.
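In the legacy prompt/completion JSONL fine-tuning format, a training line with that kind of separator might look like this (the separator string itself is just an illustration):

{"prompt": "<document text>\n\n[Simple/Normal]?:", "completion": " simple"}

Note the completion starts with a space, so it matches the " simple" / " normal" tokens the model scores at inference time.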

Had a quick look, and you seem to get a “simple” or a “normal” in every reply’s top_logprobs, just not as the top index-0 choice. You could look at all of the top tokens and pick whichever of “simple” or “normal” has the higher probability.
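A minimal sketch of that idea in Python, assuming the response dicts have the same shape as the ones above (pick_label and the variable names are mine):

import math

def pick_label(response, labels=(" simple", " normal")):
    # Only one token was generated, so only the first top_logprobs entry matters.
    top = response["choices"][0]["logprobs"]["top_logprobs"][0]
    # Keep only the two label tokens and take the more probable one.
    scores = {label: top[label] for label in labels if label in top}
    if not scores:
        return None  # neither label appeared in the top logprobs
    best = max(scores, key=scores.get)
    return best.strip(), math.exp(scores[best])  # label and its probability

With logprobs=3 both labels show up in almost every response above; raising logprobs to 5 (the API maximum) makes it even less likely that a label falls out of the list.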

I’d also look at your training data for issues.

It’s kind of interesting to play with untrained “ada” and see what it’s predisposed to.

[AI answers: is text above “Simple” or “Normal”?]

AI: I think it’s a little of both.

or

[AI chooses one word: is text above “Simple” or “Normal”?]

AI: I think it’s a little bit of both

or

You ask me if the above passage appears to be “simple” or “normal”? I’ve read it, and decided on the answer. I’ve made my choice of just one word, from the only two words that I can write, the choices normal or simple.
My Answer:

I think that the passage is simple

Humorously resistant. So you need training in spades. ada-001 already has some of that: the prompt below gives very high probabilities on the correct words at a lower cost, but it obviously hasn’t seen enough to be your particular classifier:

Instruction

The AI analyzes the text difficulty. It decides on and outputs only one word: if the text is “Simple”, or if the text is “Normal”.

Text

{input}

My one-word answer.

AI: Simple

Train on the whole input.

The solution was to reduce the number of input tokens to well below the 2048 limit. Nothing else was changed: I limited the input to 512 tokens, and the model stopped the weird behavior. I didn’t have this problem with BERT, which is an encoder-only model, so I guess it has something to do with how decoder-only models work.
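For anyone who wants to try the same fix, here is a rough sketch of that truncation using tiktoken; r50k_base is the encoding the GPT-3 ada family uses, and the 512 cutoff is the value from above (document_text is a placeholder for the raw document):

import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # encoding used by GPT-3 ada-family models

def truncate_prompt(text, max_tokens=512):
    # Encode, keep only the first max_tokens tokens, decode back to text.
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

document_text = "..."  # the raw document to classify
prompt = truncate_prompt(document_text) + " \n\n###\n\n"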