Response has valid JSON but it's nested in broken JSON

I have a prompt similar to this with gpt-4-1106-preview and response_format set:

<some text to tell gpt to check the previous messages for errors>
Output as JSON: {
  "intro": "string",
  "correction1": "string",
  "explanation1": "string",
  "summary": "string"
}
Usually this returns valid JSON, but about 20% of the time I get something like the following, where the valid JSON output is posted inside a broken JSON schema. (Posting here as an image to avoid formatting errors.)

Has anybody seen this before?


Interesting! Can you post a whole prompt where this happens?

Hey, thank you for your response.

We are using the chat endpoint in a product where our users can chat in a foreign language. The prompt at the end should identify errors they’ve made. I don’t really have a prompt I can show you or a way to reproduce it. (Also, I would like to avoid posting the whole chat object publicly, with all our prompts and user inputs.) I just noticed this error from some tracking we are doing, but as I said, it happens quite often.

I’m just wondering why response_format isn’t catching this. It should check the whole output for validity, shouldn’t it?

I can make the output more reliable for you with better technique.

Using a system message to dictate the type of output can be stronger, but I will instead use an unconventional message format, consistent with what you describe.


You are ChatGPT


(recent post of mine)


// Instruction
Read the prior message written by an English learner. Give constructive criticism about the language use and communication quality. Your response must be broken into four parts, where you generate only a specific type of valid JSON that must be formatted for an API backend recipient.

// Output format is JSON only, no markdown permitted (no ```)
"intro": "string",
"correction1": "string",
"explanation1": "string",
"summary": "string"

assistant response:

"intro": "The message provides a comprehensive overview of the token usage and billing process related to OpenAI’s assistants and APIs. However, there are some opportunities for improving clarity and language use.",
"correction1": "The sentence structure and vocabulary are generally advanced, but some sentences are overly complex, making them difficult to follow. Simplifying complex sentences and using clearer terms could improve understanding.",
"explanation1": "While the writer shows a good command of English, the use of technical jargon and long, complex sentences might confuse readers who are not familiar with the topic. It’s important to consider the audience and use language that is accessible to both experts and non-experts alike. Clarity should be a priority in technical communication to ensure that the message is effectively conveyed.",
"summary": "The writer has a strong grasp of the subject matter but should aim to present the information more succinctly and accessibly. This involves breaking down complex ideas into simpler statements, avoiding unnecessary jargon, and ensuring that sentences are not too lengthy. Doing so will make the content more digestible for a broader audience."

If you are getting bad responses 20% of the time, you can make the output more reliable and consistent by reducing top_p from its default of 1.00 to 0.80 (or lower).
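For reference, here is a minimal sketch of the request parameters being discussed, shown as a plain dict. The model name and message content are placeholders, not a tested configuration; you would pass these to your client library's chat-completions call.

```python
# Sketch of the request parameters discussed in this thread (placeholders
# only). Pass these to your chat-completions call, e.g.
# client.chat.completions.create(**params).
params = {
    "model": "gpt-4-1106-preview",
    "response_format": {"type": "json_object"},  # enable JSON mode
    "top_p": 0.8,  # reduced from the default 1.00 for more consistent output
    "messages": [
        {"role": "user", "content": "// Instruction\n..."},
    ],
}
```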

I pressed the button a dozen times and only got JSON.

The AI had no consistent opinion of my writing, though.

gpt-3.5-turbo-1106 complained about my own text’s complaints about clarity of OpenAI nomenclature, but also produces JSON.

Dumb bot

"intro": "The prior message written by the English learner contains some areas that could be improved in terms of language use and communication quality.",
"correction1": "The usage of terms like ‘GPT’ and ‘Assistant’ is confusing in the message. It would be clearer to use consistent terminology throughout and avoid conflating different terms.",
"explanation1": "In the message, ‘context’ is used to refer to multiple things, which adds to the confusion. It would be helpful to provide clear definitions for each term to avoid ambiguity.",
"summary": "Overall, the message could benefit from better clarity and organization. Using consistent terminology and providing clear explanations would enhance the communication quality."


I’m assuming you’re using the correct model




An object specifying the format that the model must output. Compatible with gpt-4-1106-preview and gpt-3.5-turbo-1106.

Setting to { "type": "json_object" } enables JSON mode, which guarantees the message the model generates is valid JSON.

It would still be interesting if we could reproduce your issue, but using @_j’s strategies should generally help you improve reliability.

What temperature are you using?
Consistent JSON output is supposed to work better with low temperature.

@_j Thank you, I will try that.
@ramn7 I’m using temperature 0 with this prompt

But thinking about my code that encapsulates the GPT calls, I think my description of the problem is wrong. My code tries to parse the JSON from the GPT response; if it’s not valid JSON, it assumes the response is clear text and puts it into the explanation1 field. So the nesting comes from my code.

But that means the “```json” block that surrounds the JSON is still coming from the GPT response. However, I’ve never experienced this myself with the response_format field set (only when it’s not set did I see it from time to time).
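A hypothetical sketch of that fallback logic (the field names follow the schema from the original post). It also shows why a fenced reply ends up nested inside the schema: a reply wrapped in a code fence is not valid JSON as-is, so it falls through to the fallback.

```python
import json

# Empty object matching the schema from the original post.
EMPTY = {"intro": "", "correction1": "", "explanation1": "", "summary": ""}

def parse_reply(text: str) -> dict:
    """Parse the model reply as JSON; if that fails, treat the reply as
    clear text and put it into the explanation1 field, as described
    above. A fenced reply fails json.loads and is therefore nested
    inside the schema by this fallback."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {**EMPTY, "explanation1": text}
```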


So the output that I seem to get is

    "intro": "great job! Your communication is very good",
    "correction1": "",
    "explanation1": "",
    "summary": "Keep practicing with new vocabulary to enhance your Spanish skills."

…you’re not seeing the problem if json mode is on? Do you have a reason to not keep it on?

I mean I know it from previous experience (before I started using response_format). I thought this would be fixed by using response_format.
Now I always use response_format, but it still seems to happen from time to time.

Got it.
I’d try adding a strong instruction about the expected response in the system message (something like “always respond with a JSON object”, along with the schema as you did) and iterate, testing for the expected response.

BTW, I recommend Promptotype (full disclosure: my own platform :)) for testing this. It lets you define a collection of queries along with an expected response schema, so you can test different variations of the prompt on multiple use cases at once.

In my experience, response_format is not the solution. I got consistent results by providing quite detailed instructions in the prompt.

  • model = gpt-3.5-turbo-1106
  • n = 1
  • temperature = 0
.. prompt
MSG: bla bla bla
STATUS_CODE: bla bla bla
- - - - - INPUT FORMAT - - - - -
{\"msg\": {MSG}, \"status_code\": {STATUS_CODE}}
- - - - - INPUT FORMAT - - - - -

If you are getting outputs contained in backticks, which are the markdown code blocks of forums like this one and the ChatGPT web interface, it is because the preview models have been grossly overtrained on this type of output.

I wrote my example above with a prohibition in the user instruction, but left the system message blank as a playground for your behaviors. It is time to employ it if your inputs overstimulate this code-block printing.

Commands like:

  • assistant output is directly to an API backend that can only accept raw json
  • there is no live user to chat with, inputs are automatically generated
  • all responses must begin with { and continue with the contents of the json output
  • the sequence ``` is explicitly banned from output and will break code
  • there is no GUI renderer of any markdown, and it is prohibited.

Then you can give the data processing AI a strong identity that is always presented, such as:

  • You are VocabBot, an AI that writes JSON which contains advice for improving vocabulary and writing style of language learners, compliant to an exact example output format

That kind of identity and direction should reduce the weights of token production output starting with something not-JSON.


Work with the model. Don’t use JSON mode, as it limits the model’s ability to “reason” out the object first instead of blindly creating it. It’s also unknown how it actually works, which is ridiculous. We don’t officially know whether it’s a fine-tuned model or a formatter.

It likes to produce markdown as @_j mentioned. Consistency is the name of the game here. So ask it to produce the JSON in markdown and then use simple string slicing to retrieve it.
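For example, a simple slice between the fences (a hypothetical helper; it assumes a newline follows the opening fence and at most one fenced block appears in the reply):

```python
def extract_json_block(reply: str) -> str:
    """Slice out the contents of a ```json ... ``` fence, or return the
    reply unchanged if no fence is found. Assumes the opening fence is
    followed by a newline and a closing fence is present."""
    start = reply.find("```")
    if start == -1:
        return reply.strip()
    start = reply.index("\n", start) + 1   # skip the opening fence line
    end = reply.find("```", start)         # locate the closing fence
    return reply[start:end].strip()
```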

You can use a JSON validation/fixing library to handle any weirdness, but in my experience it’s not necessary. Brute-forcing is a terrible option for LLMs.

I have GPT-3.5 reading, parsing, gathering & writing articles (with DALL·E!) using roughly three steps of JSON creation, plus some other small things, and I get proper JSON 90% of the time.

If you follow typical JSON database schema rules it is very consistent.

How about adding a reason_for_output as the first key in the model’s JSON output?

You can use my AlphaWave client to dramatically improve the reliability of getting JSON from the model. AlphaWave lets you provide a JSON Schema that is used to validate not only that you’re getting JSON back, but also that you’re getting all required fields back. Here’s the Python version of AlphaWave. I’m obviously a bit biased, but I doubt you’ll find anything that results in more reliable JSON output than AlphaWave. I’m happy to get into the details of why that’s the case… Even OpenAI’s new JSON mode sucks compared to AlphaWave.

Your schema looks simple enough that with AlphaWave you should see 99%+ reliability in the model’s output, even using GPT-3.5.
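Independently of AlphaWave’s own API (which differs from this sketch), the core idea of schema-style validation with retry can be approximated in a few lines of stdlib Python. The field names here are taken from the schema in the original post:

```python
import json
from typing import Optional

REQUIRED_FIELDS = ("intro", "correction1", "explanation1", "summary")

def validate_reply(text: str) -> Optional[dict]:
    """Return the parsed object if it is a JSON object containing all
    required fields, else None (signalling the caller to retry)."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    if any(field not in obj for field in REQUIRED_FIELDS):
        return None
    return obj
```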

I’m using the jsonfixer.com API, which kind of fixes all the major problems with my responses for now. Otherwise, I just rerun the prompt and try again.