Fine-tuning and nonsensical JSON output (tons of extra keys)

When I fine-tune the latest gpt-4o and call it with response_format={"type": "json_object"}, the fine-tuned model outputs total nonsense: lots of random extra keys and values. This was reported before, but for some reason https_community.openai.com/t/inconsistent-fine-tuning-results-gpt-4o-vs-gpt-4o-mini/947573/10 is still marked as "resolved", even though it has not been resolved (sorry for not pasting the full link properly: it says I cannot do it, so remove the "https_" and it will work). In fact, the issue has been around for quite a while now, and more people have since reported the same thing.
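For what it's worth, here is the kind of stdlib-only check I use in the notebook to flag the extra keys. The expected key set ("problem"/"solution") and the sample outputs below are just illustrative, not the exact strings from my runs:

```python
import json

def unexpected_keys(response_text: str, expected: set) -> set:
    """Parse a model response and return any top-level keys outside the expected schema."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return {"<invalid JSON>"}
    if not isinstance(payload, dict):
        return {"<non-object JSON>"}
    return set(payload) - expected

# Illustrative examples of the shape of output I see from the fine-tuned model
good = '{"problem": "2+2", "solution": "4"}'
bad = '{"problem": "2+2", "solution": "4", "xq1": "", "zz_key": null, "rand": 7}'

print(unexpected_keys(good, {"problem", "solution"}))  # empty set: output matches schema
print(unexpected_keys(bad, {"problem", "solution"}))   # the random keys the model invented
```

The base (non-fine-tuned) model with the same response_format passes this check; the fine-tuned one does not.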

To make it all simpler to investigate, I have put together a little Jupyter Notebook (Google Colab) demonstrating the issue:

  • Jupyter Notebook: https_colab.research.google.com/drive/1-juShZWnbna1i3HkJW1y_HPqcmfoEvWn#scrollTo=Lm0wVS3HiYsy
  • Data: https_docs.google.com/spreadsheets/d/1A532tiaMggBMvC-4mTU1t7x51GBBzKXF2zLtulFOWDk/edit?gid=0#gid=0 (just a bunch of publicly available problems and solutions, some written by me)

To run the notebook, please make a copy of it, add your own api_key, and upload the data. It is not much data, but it is enough to see the issue (and when I run it with much more data, things only get worse).

It would be great if this could be looked at. If I missed another thread where this was already answered, please point me there.

The forum says I cannot paste links, so I had to find a workaround; sorry about the format of the "links". I feel the post is much clearer with them (and there is nothing special or non-public in either the notebook or the data).