I’ve noticed a reproducible issue when deploying new fine-tuned models (specifically GPT-4.1, but also observed on other OpenAI models):
• The very first API request of a session made to the fine-tuned model right after deployment often returns an incorrect or “base model-like” output.
• Sometimes it ignores my fine-tune instructions, returns generic answers, or produces an invalid JSON structure.
• However, all subsequent requests work perfectly — the model respects the fine-tune instructions, produces the right format, and behaves as expected.
This only seems to happen on the very first call, and after that, everything is fine.
⸻
Has anyone else experienced this?
• Is this a known “cold start” issue, model loading delay, or a caching problem on OpenAI’s side?
• Is there an official explanation or documentation from OpenAI about this?
• Are there any best practices to avoid this (e.g., sending a warm-up request after deployment)?
Hello! Thanks for flagging this! We’re digging into the first-call anomaly with the team to confirm whether it’s expected cold-start behavior. In the meantime, you can send a lightweight “warm-up” request right after deployment to load the fine-tune and avoid an incorrect first response.
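If it helps, a warm-up can be as small as a one-token, temperature-0 request whose output you discard. A minimal sketch using the plain HTTPS endpoint (the model id and the `warm_up` helper are placeholders, not something from this thread):

```python
# Hypothetical warm-up request after deploying a fine-tuned model.
# The model id below is a placeholder; substitute your own ft: id.
import json
import os
import urllib.request

FT_MODEL = "ft:gpt-4.1-nano:my-org::example"  # placeholder id

def build_warmup_payload(model):
    # Tiny deterministic request: short prompt, one output token,
    # temperature 0 -- cheap, and the response is thrown away anyway.
    return {
        "model": model,
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
        "temperature": 0,
    }

def warm_up(model, api_key):
    # POST to the Chat Completions endpoint; errors here are a signal
    # the deployment is not ready yet, so you may want to retry once.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_warmup_payload(model)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_warmup_payload(FT_MODEL)
```

After the warm-up returns, subsequent production calls should hit an already-loaded model.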
By “deployment,” do you mean you get one bad call right after creating the model, or one bad call after the long latency of spinning up an idle fine-tune’s weights?
I wasted a few cold-start trials on models that were trained with the same prompting that also yields a “correct” answer on the base model, until I settled on a top_p: 0 strategy against a gpt-4.1-nano fine-tune that has only a “trigger” system message, which I can run against the base model for comparison.
“What’s your deal?”
gpt-4.1-nano Base model:
Hey! I’m here to help with whatever you need—answer questions, have a chat, brainstorm ideas, or just keep you company. What’s on your mind?
ft:gpt-4.1-nano-... full model (not a checkpoint step), first call in months, 6.9-second latency:
My “deal” is simple. I want to understand you. The real you, not the curated version you present to the world. I want to peel back the layers and see what’s truly beneath. Are you willing to share that with me? Or are you more comfortable with the facade?
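The comparison above can be sketched as two identical requests that differ only in the model id, with top_p: 0 for greedy, near-deterministic sampling so any difference comes from the weights rather than sampling noise. The fine-tuned model id and the trigger system message are placeholders here:

```python
# Sketch of the base-vs-fine-tune comparison with greedy sampling.
# The ft: model id and the system message are placeholders.
BASE_MODEL = "gpt-4.1-nano"
FT_MODEL = "ft:gpt-4.1-nano:my-org::example"  # placeholder

def build_request(model, system_msg, user_msg):
    # top_p: 0 makes sampling greedy, so repeated calls to the same
    # model with the same prompt should produce near-identical text.
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "top_p": 0,
    }

trigger = "(your fine-tune's trigger system message)"  # placeholder
requests = [
    build_request(m, trigger, "What's your deal?")
    for m in (BASE_MODEL, FT_MODEL)
]
```

Sending each payload to the Chat Completions endpoint and diffing the two responses makes a cold-start regression to “base-model-like” output easy to spot.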
So if “deploying” means one bad output, ever, that doesn’t seem too terrible. I couldn’t reproduce any symptom other than the long weight-retrieval time. If “deploy” means Azure…
@Corentin could you share the FT job-id and the two request IDs (first call to the model and the next one) in order for us to proceed with the investigation?
Hello! We’re closing this thread due to inactivity. Please reach out via help.openai.com with the requested information if you are still experiencing any issues.