Fine-tune 4o model - endless inference for JSON

I have been trying to fine-tune the gpt-4o model. The goal of the model is, when given an e-mail, to generate JSON following a fixed schema, with the values correctly inferred from the input.

The problem arises when I try to use the fine-tuned model in the API. When I call the API, the fine-tuned model does not stop its inference automatically; it keeps generating tokens without stopping. When I set a hard stop of max_tokens = 1000, I can see that the issue is that the fine-tuned model does not know when to “close” the JSON. It keeps generating tokens where it should have stopped. My dataset contains 70 examples of, admittedly, relatively long JSONs (sometimes >10,000 characters per row). I have trained for 1, 2 and 3 epochs. I have changed the JSON schema as well. Nothing works.
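For reference, each row in my training file follows the standard chat fine-tuning JSONL format, roughly like this (the field names here are placeholders for illustration, not my actual schema):

    import json

    # One training example in the chat fine-tuning JSONL format.
    # The target JSON fields below are placeholders, not the real schema.
    example = {
        "messages": [
            {"role": "system", "content": "Extract the order details from the e-mail and return them as JSON."},
            {"role": "user", "content": "Hi, I would like to order 3 units of part X-100, shipped to ..."},
            {"role": "assistant", "content": json.dumps({
                "customer": "Example Corp",
                "items": [{"sku": "X-100", "quantity": 3}],
                "shipping_address": "..."
            })},
        ]
    }

    with open("train.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")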

Now, I have made a few interesting observations:
- When I fine-tune the 4o-mini model with the same training set, that fine-tuned model will output correct JSONs and will not generate endlessly.
- My fine-tuned model DOES work in the Playground; it just doesn’t work via the API I use in Power Automate.

Does anyone have ideas? Is it simply overfitting? If so, how do I prevent that, and why doesn’t this overfitting issue happen with the 4o-mini model?

Thanks in advance!

I faced a similar issue with the API response not closing the JSON.
I found that JSONs were returned without closing brackets, even though I use structured outputs, or with endless trailing spaces, etc.

Adding this to the prompt solved the issue for me (a sketch of where it goes is below):
1. Ensure the JSON is valid and complete (i.e., properly closed braces/brackets, no trailing commas).
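Concretely, I just append that sentence to the end of the system message, e.g.:

    # System message with the extra instruction appended (wording from above).
    system_message = (
        "Extract the fields from the e-mail and return JSON. "
        "Ensure the JSON is valid and complete (i.e., properly closed "
        "braces/brackets, no trailing commas)."
    )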


Thanks for your input!

Your idea works sometimes, but not all of the time. The model will sometimes output a correct JSON and sometimes it will output an endless JSON. I cannot figure out why it still sometimes does not know when to close the JSON.

You can encourage the closing a bit. The AI would have to be writing in a style where there is some finality in the language as well - not training itself, as it generates, on producing nothing but 4000 tokens of random numbers. A frequency penalty can also help suppress this when your data is repetitive and has no logical end.

Do you need to promote the close after a string value? Or even after an int?

    temperature=None,
    top_p=None,
    logit_bias={  # uses range -100 to 100 to affect token certainty
        31085: 5,   # promotes '"}\n'
        170027: 5,  # promotes '"}\n\n'
        943: 3,     # promotes '}\n'
        8751: 3,    # promotes '}\n\n'
    },

Setting temperature/top_p seems to stop logit_bias from working on Chat Completions (hence None above). Logit bias cannot be used on the Responses endpoint.
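Put together on Chat Completions, a rough sketch (the model name is a placeholder, and the bias IDs are the ones above, which depend on the model’s tokenizer):

    from openai import OpenAI

    client = OpenAI()
    email_text = "..."  # the e-mail to extract from

    response = client.chat.completions.create(
        model="ft:gpt-4o-2024-08-06:my-org::example",  # placeholder fine-tune name
        messages=[
            {"role": "system", "content": "Extract the fields from the e-mail and return them as JSON."},
            {"role": "user", "content": email_text},
        ],
        max_tokens=1000,
        # temperature/top_p left unset, since setting them seemed to disable logit_bias
        logit_bias={
            31085: 5,   # promotes '"}\n'
            170027: 5,  # promotes '"}\n\n'
            943: 3,     # promotes '}\n'
            8751: 3,    # promotes '}\n\n'
        },
    )
    print(response.choices[0].message.content)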

The effect within strict structured outputs may be suppressed.

The logit bias can affect the strings inside them, though. Funny place to check the weather:

    {'finish_reason': 'length', 'index': 0, 'message': {'content': None, 'role': 'assistant',
     'tool_calls': [{'id': 'call_123',
                     'function': {'arguments': '{"location":"love love love love love love love love love love love love love love love love love love',
                                  'name': 'get_weather'},
                     'type': 'function'}]}}

I’ve seen you responding to a lot of queries here, so I’m wondering if you have used the web search API tool and what your feedback is. I used it for a use case of extracting product prices from websites, and I found 4o not able to read the numbers right: 549 is given as 449, etc. I’m not sure if it is a vision issue or if, in the background, it looks at other sources in addition to the website URL provided to it.

You responded to me correctly the first time; the forum just doesn’t show an addressee if it’s the post right above. I’ve only used API web search enough to see that it works and that you have no control over what is done with the information, as you get:

  • The AI instructed in the tool in a generic way, instead of you being able to sculpt the quality of the usage, and
  • internal calls and a second layer of AI seemingly offered its own click-through get() and post() instructions that are impossible for “your” AI,
  • a new prompt injection to produce a summary from the results, making it
  • a slow pay-per-use Google with fewer ads and confusing-to-use links,
  • and nothing to show for what you paid for that wasn’t conflated by an AI.

Thanks everyone for the help. I have solved this specific problem. The solution was to use json_schema instead of json_object as the response_format in the API.

So first I used

    "response_format": {
        "type": "json_object"
    }

which allowed the bug to appear sometimes.

And now I use

    "response_format": {
        "type": "json_schema",
        "json_schema": my_json_schema
    }

and I never encounter the bug anymore.
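For anyone hitting the same thing, here is a sketch of what the full request body can look like (the schema and model name are simplified placeholders, not my real ones). Note that if you also set "strict": true, structured outputs require every property to be listed under required and additionalProperties to be false:

    # Sketch only: placeholder model name and a simplified schema for illustration.
    my_json_schema = {
        "name": "email_extraction",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "sender": {"type": "string"},
                "order_number": {"type": "string"},
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "sku": {"type": "string"},
                            "quantity": {"type": "integer"},
                        },
                        "required": ["sku", "quantity"],
                        "additionalProperties": False,
                    },
                },
            },
            "required": ["sender", "order_number", "items"],
            "additionalProperties": False,
        },
    }

    request_body = {
        "model": "ft:gpt-4o-2024-08-06:my-org::example",  # placeholder fine-tune name
        "messages": [
            {"role": "system", "content": "Extract the fields from the e-mail and return them as JSON."},
            {"role": "user", "content": "<e-mail text>"},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": my_json_schema,
        },
    }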