{ "type": "json_object" } not always working

I’m running a chat completion using the option { "type": "json_object" } against "model": "gpt-4o". Additionally, I have in my "role": "system" prompt a section which says "Your responses are in JSON format. Make sure that double quotes and newline characters within JSON property string values are properly escaped".
The whole prompt is about extracting a summary and contact data from documents.

However, I frequently get back responses that are not valid JSON, because e.g. double quotes or newlines inside string values are not escaped, which corrupts the JSON.
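To make the failure concrete, here is a minimal, self-contained illustration (the summary text is made up, not from an actual response) of how one unescaped inner quote breaks parsing:

```python
import json

# An unescaped inner double quote inside a string value (as in the
# responses described above) makes the whole document unparseable.
bad = '{"summary": "He said "hello" and left."}'
good = '{"summary": "He said \\"hello\\" and left."}'

try:
    json.loads(bad)
    is_valid = True
except json.JSONDecodeError:
    is_valid = False  # this branch is taken: "hello" is a stray token

print(is_valid)
print(json.loads(good)["summary"])  # He said "hello" and left.
```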

Is this a known problem with "gpt-4o"? Am I doing something wrong?

Welcome to the community!

I typically put the instruction (and the schema) at the very end of the prompt.

One thing I noticed is that not all models behave the same. “gpt-4o” is an alias that can resolve to different snapshots over time, so I’d recommend pinning a fixed, stable version by name (e.g. “gpt-4o-2024-05-13”): https://platform.openai.com/docs/models#gpt-4o

However, I would say that in most cases it probably comes down to your prompt, which might be confusing the model. If it’s not feasible to clean up your prompt (lack of time, experience, etc.), you might be best served by using structured outputs instead? (https://platform.openai.com/docs/guides/structured-outputs) Just a thought :thinking:
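For what it’s worth, a structured-outputs request uses a "json_schema" response format with "strict": true. The schema below is only my guess at a summary/contact extraction shape for your use case — the field names are illustrative, not from your post:

```python
# Hypothetical response_format for a structured-outputs request.
# Field names ("summary", "contacts", ...) are illustrative only.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "document_extraction",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "contacts": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "email": {"type": "string"},
                        },
                        "required": ["name", "email"],
                        "additionalProperties": False,
                    },
                },
            },
            "required": ["summary", "contacts"],
            "additionalProperties": False,
        },
    },
}
# Pass this as response_format=... in chat.completions.create(...).
```

With "strict": true the model’s output is constrained to the schema, so escaping problems of the kind described above should disappear by construction.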

There’s more stuff you can do, depending on how deep you want to get down the rabbit hole. Adjusting the prompt, adjusting the schema, tweaking the logit bias (https://platform.openai.com/docs/api-reference/chat/create#chat-create-logit_bias) etc, etc. But if you just want stuff to work fast, structured output might be the way to go for you.

1 Like

Maybe try specifying the json format explicitly? like,

Your responses are in JSON format. Please follow the below format:
{"summary": "your summarized content"}

Thanks for your feedback guys.

Every part/sentence of my (lengthy) prompt is already very specific, and the prompt itself is consistent.

So far, I have not seen any value in trying structured outputs, since the structure of the requested JSON output (JSON elements, sub-elements, arrays, etc.) is already correct. It’s just that the model at times ‘forgets’ to escape double quotes and newlines in string property values.

By simply lowering the temperature, prompting the model to generate “valid JSON”, and describing the attributes and values, I have found that these issues were resolved in my experience when generating with {"type": "json_object"}.
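As a rough sketch of those adjustments (the prompt wording and field names here are illustrative, not the exact prompt used):

```python
# Request body combining the adjustments described above: low
# temperature, an explicit "valid JSON" instruction, and a description
# of the expected attributes. Field names are illustrative.
request = {
    "model": "gpt-4o",
    "temperature": 0,
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "system",
            "content": (
                "Respond only with valid JSON of the form "
                '{"summary": "<string>", "contacts": [{"name": "<string>", '
                '"email": "<string>"}]}. Escape all double quotes and '
                "newlines inside string values."
            ),
        },
    ],
}
```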

3 Likes

Thanks @all. I’ll look into the suggestions you provided…