Issue with fine-tuned model

I have a fine-tuned model that has consistently produced good outputs, but today one specific input suddenly caused the output to be cut off in the middle.

Here’s an excerpt of two lines from the fine-tuning file:

{"messages": [{"role": "system", "content": "This bot translates food items from any language into English"}, {"role": "user", "content": "Rindfaschiertes"}, {"role": "assistant", "content": "{\"translation\": \"Ground Beef\"}"}]}
{"messages": [{"role": "system", "content": "This bot translates food items from any language into English"}, {"role": "user", "content": "Hähnchenschenkel"}, {"role": "assistant", "content": "{\"translation\": \"Chicken Leg\"}"}]}

The query that causes an issue is “mushroom soup”.

The result is {"translation": "m

It just cuts off for whatever reason.

I looked through the fine-tuning file (which has 1298 lines) and checked every line that contains the word “soup” or “mushroom”. None of those lines had any issue that would explain the wrong output.
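For anyone repeating this check, here is a minimal sketch of how the manual search could be automated. It assumes the training file follows the format of the excerpt above (one JSON object per line, assistant content being a JSON string with a "translation" key). The function name find_suspect_lines and the keyword list are made up for illustration.

```python
import json

def find_suspect_lines(lines, keywords=("soup", "mushroom")):
    """Return (line_number, problem) pairs for training lines that mention a keyword."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        # Only inspect non-empty lines that mention one of the keywords.
        if not raw or not any(k in raw.lower() for k in keywords):
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"line is not valid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        for message in record.get("messages", []):
            if message.get("role") != "assistant":
                continue
            # The assistant reply is itself expected to be a JSON string.
            try:
                reply = json.loads(message.get("content") or "")
            except json.JSONDecodeError:
                problems.append((number, "assistant content is not valid JSON"))
                continue
            if not isinstance(reply, dict) or "translation" not in reply:
                problems.append((number, "assistant JSON has no 'translation' key"))
    return problems
```

It could be run on the file with something like find_suspect_lines(open(path, encoding="utf-8")), where path points at the fine-tuning file. A truncated training answer such as {"translation": "m would show up as "assistant content is not valid JSON".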

Also, the error is reproducible: any time I query “mushroom soup”, the same cut-off result is returned.

Would appreciate any help or pointers to what may be wrong!



Hah, looks like ChatGPT is not a fan of mushrooms :mushroom:


You can look at the finish reason (the finish_reason field on each choice in the response) to see why the output terminated. If it is “stop”, the AI produced one of chat completions’ stop tokens, or a stop sequence you specified in your code. If it is “length”, the max_tokens limit was hit. You also might get a “content_filter” finish reason, which stops the AI dead.
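As a rough sketch of that check, the helper below reads finish_reason from a chat completions response that has already been parsed into a plain dict (for example, the JSON body of the HTTP response). The name explain_finish_reason and the wording of the explanations are my own; only the field names and reason values come from the API.

```python
def explain_finish_reason(response: dict) -> str:
    """Describe why the first choice of a chat completions response ended."""
    reason = response["choices"][0].get("finish_reason")
    explanations = {
        "stop": "the model emitted a stop token or hit a stop sequence you passed",
        "length": "max_tokens or the context limit cut the output short",
        "content_filter": "a content filter interrupted the output",
    }
    # Anything else (e.g. tool calls) is reported without interpretation.
    return f"{reason}: {explanations.get(reason, 'other or unknown reason')}"


# Example with a hand-written response dict shaped like the API's JSON.
sample = {
    "choices": [
        {"finish_reason": "stop", "message": {"content": '{"translation": "m'}}
    ]
}
print(explain_finish_reason(sample))
```

If the truncated “mushroom soup” reply reports “stop” while no stop sequence is set in the request, that points at the model itself ending the message rather than a token limit.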

The word mushroom, if not allowed a leading space, has to be written with multiple tokens:

[Image: two JSON objects with a key named "translation" and the associated value "mushroom soup", highlighted in different colors because one uses a token with a leading space. (Captioned by AI)]

You can look at the logprobs at the “ush” position, but the logprobs OpenAI returns come from a special version of softmax that lies to you, the untrustworthy developer: special tokens that are part of the probability space are not included. If a special token that closes the assistant message was actually sampled, you don’t get any logprob at that last position anyway.
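A minimal sketch for inspecting one position, assuming the request was made with logprobs enabled and top_logprobs set, and that the choice's logprobs content list has been parsed into plain dicts. The helper name top_alternatives is hypothetical. Given the caveat above, do not expect a special end-of-message token to appear among the alternatives.

```python
import math

def top_alternatives(logprobs_content, position):
    """Return (token, percent) pairs for the alternatives at one token position."""
    entry = logprobs_content[position]
    ranked = sorted(entry["top_logprobs"], key=lambda alt: alt["logprob"], reverse=True)
    # exp(logprob) converts a log probability back into a probability.
    return [(alt["token"], round(math.exp(alt["logprob"]) * 100, 2)) for alt in ranked]
```

Looking at the positions just before the cut-off shows which tokens were competing with the multi-token spelling of “mushroom”, even if the token that actually ended the message stays hidden.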