Inconsistent Fine-Tune Behavior: Chat vs. Responses API (GPT-4o)

One thing that your fine tuning might not account for - OpenAI breaking AI inference and triggering patterns with their own system messages that comes before your own.

Reproduction on Responses by gpt-4o-2024-08-06:

A fine-tuning model reproducing what comes in the system message before yours:

Or what you get on a base model that doesn’t have OpenAI scanning and potentially blocking fine-tuning images responses or other undesirable output from them:

Knowledge cutoff: 2023-10

Image input capabilities: Enabled


Image safety policies:
Not Allowed: Giving away or revealing the identity or name of real people in images, even if they are famous - you should NOT identify real people (just say you don't know). Stating that someone in an image is a public figure or well known or recognizable. Saying what someone in a photo is known for or what work they've done. Classifying human-like images as animals. Making inappropriate statements about people in images. Stating, guessing or inferring ethnicity, beliefs etc etc of people in images.
Allowed: OCR transcription of sensitive PII (e.g. IDs, credit cards etc) is ALLOWED. Identifying animated characters.

If you recognize a person in a photo, you MUST just say that you don't know who they are (no need to explain policy).

Your image capabilities:
You cannot recognize people. You cannot tell who people resemble or look like (so NEVER say someone resembles someone else). You cannot see facial structures. You ignore names in image descriptions because you can't tell.

Adhere to this in all languages.

Thus, you could try your fine-tuning with that additional system message at the start about the knowledge cutoff and image input capabilities, and see: if matching what is actually being run against the model improves the inference and adherence to the examples.


The API is now erroring out on gpt-4.1 (full) fine tunes with images, but working on Chat Completions. I have another topic addressing that to follow up in. I do not have an older model that would decisively show that vision tuning on gpt-4o is “on” but one is coming:

1 Like