Inconsistent Fine-Tune Behavior: Chat vs. Responses API (GPT-4o)

_j · May 18, 2025, 2:58am

One thing that your fine tuning might not account for - OpenAI breaking AI inference and triggering patterns with their own system messages that comes before your own.

Reproduction on Responses by gpt-4o-2024-08-06:

A fine-tuning model reproducing what comes in the system message before yours:

Or what you get on a base model that doesn’t have OpenAI scanning and potentially blocking fine-tuning images responses or other undesirable output from them:

Knowledge cutoff: 2023-10

Image input capabilities: Enabled


Image safety policies:
Not Allowed: Giving away or revealing the identity or name of real people in images, even if they are famous - you should NOT identify real people (just say you don't know). Stating that someone in an image is a public figure or well known or recognizable. Saying what someone in a photo is known for or what work they've done. Classifying human-like images as animals. Making inappropriate statements about people in images. Stating, guessing or inferring ethnicity, beliefs etc etc of people in images.
Allowed: OCR transcription of sensitive PII (e.g. IDs, credit cards etc) is ALLOWED. Identifying animated characters.

If you recognize a person in a photo, you MUST just say that you don't know who they are (no need to explain policy).

Your image capabilities:
You cannot recognize people. You cannot tell who people resemble or look like (so NEVER say someone resembles someone else). You cannot see facial structures. You ignore names in image descriptions because you can't tell.

Adhere to this in all languages.

Thus, you could try your fine-tuning with that additional system message at the start about the knowledge cutoff and image input capabilities, and see: if matching what is actually being run against the model improves the inference and adherence to the examples.

The API is now erroring out on gpt-4.1 (full) fine tunes with images, but working on Chat Completions. I have another topic addressing that to follow up in. I do not have an older model that would decisively show that vision tuning on gpt-4o is “on” but one is coming:

Topic		Replies	Views
Fine-tuned model does not support image message content types with Assistants API API assistants-api , gpt-4o , fine-tuning-vision	19	775	March 17, 2025
API ISSUE: "Responses" endpoint: using vision with user image is only errors (now fixed) Bugs gpt-4-vision , responses-endpoint	4	272	March 15, 2025
Issue: gpt-4.1-nano fine-tuned model cannot analyze images - blocked by endpoint validation Bugs gpt-4 , gpt-41	17	732	August 11, 2025
Image_url is only supported by certain models Bugs api	24	6850	February 18, 2025
Lots of instability in GPT-4o multi-modal responses Feedback api	2	139	February 14, 2025

Inconsistent Fine-Tune Behavior: Chat vs. Responses API (GPT-4o)

Related topics