Getting response data as a fixed and consistent JSON response

I have tweaked my preferred prompt to give a better response, specifically asking for RFC8259 compliant JSON. I have had consistent results with ChatGPT as well as with Davinci, although for ChatGPT I also had to add an instruction not to provide an explanation, to consistently get only the JSON without any preamble.

system prompt:

Pretend you are an expert language translator

user prompt:

Create a list of three random source phrases and three random translations for each.
Do not include any explanations, only provide an RFC8259 compliant JSON response following this format without deviation.
[{
  "source_language": "language of original phrase",
  "source_phrase": "the phrase to be translated",
  "translations": [{
    "trans_language": "language of the translation",
    "translation": "the translated phrase"
  }]
}]
The JSON response:

This gives a consistent JSON response along the lines of…

[{
  "source_language": "English",
  "source_phrase": "I love pizza",
  "translations": [{
    "trans_language": "Spanish",
    "translation": "Me encanta la pizza"
  },{
    "trans_language": "Italian",
    "translation": "Amo la pizza"
  },{
    "trans_language": "French",
    "translation": "J'adore la pizza"
  }]
},{
  "source_language": "Mandarin",
  "source_phrase": "你好吗?",
  "translations": [{
    "trans_language": "English",
    "translation": "How are you?"
  },{
    "trans_language": "Korean",
    "translation": "어떻게 지내?"
  },{
    "trans_language": "Japanese",
    "translation": "お元気ですか?"
  }]
},{
  "source_language": "Russian",
  "source_phrase": "Я люблю мороженое",
  "translations": [{
    "trans_language": "German",
    "translation": "Ich liebe Eis"
  },{
    "trans_language": "Turkish",
    "translation": "Dondurma seviyorum"
  },{
    "trans_language": "Polish",
    "translation": "Kocham lody"
  }]
}]
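On the consuming side, the shape can be verified after parsing. A minimal Python sketch, using a fragment of the sample response above:

```python
import json

# A fragment of the model's reply, as it would arrive in the message content.
reply = '''[{
  "source_language": "English",
  "source_phrase": "I love pizza",
  "translations": [{"trans_language": "Spanish", "translation": "Me encanta la pizza"}]
}]'''

data = json.loads(reply)  # raises ValueError if the model added any preamble

# Verify every entry carries the keys the prompt demanded.
for entry in data:
    assert {"source_language", "source_phrase", "translations"} <= entry.keys()
    for t in entry["translations"]:
        assert {"trans_language", "translation"} <= t.keys()
```

If the model prepends any explanation, `json.loads` fails immediately, which is exactly the signal you want before trusting the payload.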

This worked great for me! I'm consistently receiving JSON responses.


Finally, an approach that seems to work, thank you!
Roy

I would like to hear whether anyone has experience placing instructions in the system role content with chat completions? My experiences have not been great so far.

I just figured out another prompt. My inputs are different, though, and are to be categorized as “high”, “medium”, “small”, or “extra_small”.

My prompt (relevant part in bold): “Without any comment, return the result in the following JSON format {"high":[…],"medium":[…],"small":[…],"extra_small":[…]}”

My app depends on a pre-formatted JSON structure, which contains not only the text reply but also various system commands and args.

My tips for anyone who needs this:

THE POSITION of the JSON-only instruction is the MAIN FACTOR in how consistently GPT will follow it.

– As long as this particular instruction is the VERY LAST part of the entire prompt, you are good to go.
– I place it just under the user input (as a reminder).

This works best for me.
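The tip above can be sketched as a small helper; this is hypothetical glue code assuming an OpenAI-style message list, not any particular SDK:

```python
JSON_ONLY = (
    "Return ONLY an RFC8259 compliant JSON response. "
    "Do not include any explanation or preamble."
)

def build_messages(system_prompt, user_input):
    # The JSON-only reminder goes last, right after the user input,
    # which is the position reported to work most consistently.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
        {"role": "user", "content": JSON_ONLY},
    ]

messages = build_messages("You are a translator.", "Translate 'hello' to French.")
```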

I thought this approach from Microsoft was really interesting: GitHub - microsoft/guidance: A guidance language for controlling large language models.


Hey, we were having this problem as well. The way we solved it was by adding “reply in JSON format” in every interaction we had with ChatGPT, not only in the prompt. It seems to be working.

You can also try alphawave (pip install alphawave); it solves this problem by validating the response.
If the response JSON is surrounded by other text, as is often the case, it will extract the JSON.
If there is no valid JSON, it uses the JSON validator error to provide specific failure ‘feedback’ in a retry.
It also manages the conversation history, so that once the failure is corrected, the ‘feedback’ messages are deleted from history and you don’t waste context space.

TypeScript and Python versions are available.
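This is not AlphaWave's actual API, but the extract-validate-retry loop it implements can be sketched roughly like this, with a hypothetical `llm` callable standing in for the model:

```python
import json

def ask_for_json(llm, prompt, max_retries=2):
    """Ask the model for JSON; on a parse failure, feed the error back and retry."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        reply = llm(messages)
        # Extract the JSON even if it is surrounded by other text.
        start, end = reply.find("{"), reply.rfind("}") + 1
        try:
            return json.loads(reply[start:end])
        except ValueError as err:
            # Give the model the specific parse error as feedback.
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user",
                             "content": f"That was not valid JSON ({err}). Reply with corrected JSON only."})
    return None

# Stub model: fails once (trailing comma), then succeeds on retry.
replies = iter(['Here you go: {"a": 1,}', '{"a": 1}'])
result = ask_for_json(lambda m: next(replies), "Give me JSON")
```

The real library also prunes the feedback messages from history after a successful repair; this sketch omits that.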


OpenAI recently announced updates to their API that now make it possible to get properly formatted JSON in your response.

Previously, you could do a little “prompt engineering” and get stringified JSON by simply appending “provide your response in JSON format” to the end of the prompt. However, these responses often included incorrect trailing commas or introductory text (“Here is your recipe in JSON format:”) that led to breaking errors.
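The trailing-comma failure in particular can also be repaired locally before parsing. A naive Python sketch (note it would also alter commas inside string values, so it is a last resort, not a general fix):

```python
import re

def strip_trailing_commas(s):
    # Remove a comma that directly precedes a closing brace or bracket.
    return re.sub(r",\s*([}\]])", r"\1", s)

repaired = strip_trailing_commas('{"a": 1, "b": [1, 2,],}')
# repaired is now valid JSON: {"a": 1, "b": [1, 2]}
```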

I’ve written an explanatory post where I go into detail on how you can update your old prompts with the new parameters to get a JSON response. No links allowed here, but you can search for that article on Medium.

Briefly, you are going to want to first define your JSON Schema object. Then pass this object to the new functions parameter in the ChatCompletion endpoint:

openai.createChatCompletion({
  model: "gpt-3.5-turbo-0613",
  messages: [
    { role: "system", content: "You are a helpful recipe assistant." },
    { role: "user", content: prompt }
  ],
  functions: [{ name: "set_recipe", parameters: schema }],
  function_call: { name: "set_recipe" }
});
Look up JSON Schema to make sure you define the schema correctly. It is a bit verbose.
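For reference, the `schema` passed above might look something like this; the recipe fields here are illustrative assumptions, not taken from OpenAI's docs:

```python
# A JSON Schema object describing the structure the model must fill in.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Recipe name"},
        "ingredients": {
            "type": "array",
            "items": {"type": "string"},
        },
        "steps": {
            "type": "array",
            "items": {"type": "string"},
        },
    },
    "required": ["name", "ingredients", "steps"],
}
```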


Fix found!
I have been having the same issue. I even tried triple-shot prompting with three examples and had no luck; it just wouldn’t generate JSON without text saying ‘here is your JSON format’. I actually asked GPT-4 how to get around this and it found an easy solution.
You basically just define a function that grabs the text between the ‘[’ and ‘]’ brackets and then passes that text off to wherever it’s needed; for me, I’m parsing it with json.loads.

Only the standard json import is required.

Here is the example code GPT provided me:
import json

def extract_json_from_string(s):
    start = s.find('[')
    end = s.rfind(']') + 1
    return s[start:end]

json_string = extract_json_from_string(response["choices"][0]["message"]["content"])
playlist = json.loads(json_string)

This has worked every time, and I’ll be using it going forward!
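For example, with a reply that includes the usual preamble, the extraction behaves like this (same function, reproduced here so the snippet is self-contained):

```python
import json

def extract_json_from_string(s):
    # Keep everything between the first '[' and the last ']'.
    start = s.find('[')
    end = s.rfind(']') + 1
    return s[start:end]

raw = 'Here is your JSON format: [{"song": "Imagine"}] Hope that helps!'
playlist = json.loads(extract_json_from_string(raw))
```

Note this assumes the payload is a JSON array; if the model returns a top-level object, match on '{' and '}' instead.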

OpenAI released function calling, which can directly get a JSON object as output without us having to ask the model explicitly in the prompt.

Currently, only gpt-4-0613 and gpt-3.5-turbo-0613 support it.


I would like to thank everyone and try to help others who may need something similar, related to responses and response transformations in the “string>JSON” format.

To @bruce.dambrosio, thank you because it was through your post that I discovered Promptrix and AlphaWave (Alpha Wave’s automatic response repair in action - Community - OpenAI Developer Forum).

Regarding my experience using and transforming responses (Python + JSON + DB), what kept me going in most cases were splits or regexes; only recently did I decide to change two of the main codebases I create and maintain (one of which I use for commercial purposes).

I am using the new gpt3.5 and gpt4 models (OpenAI Platform).

I haven’t been able to update with promptrix + alphawave yet, but I would love to have feedback if anyone is kind enough.

In any case, I revived the post and will soon share my results with promptrix.

@joseicarobc are you using the Python or JS versions of Promptrix+Alphawave? I’ve made a number of recent improvements to the JS versions but the Python version is a little lagging. AlphaWave is still pretty much state-of-the-art (in my opinion) with regards to improving the overall reliability of getting structured data back from the model.


Yup - just to confirm @stevenic’s comments - the python versions, promptrix-py and alphawave-py, are pretty much frozen - at least I’ve stopped updating them, although I will try to fix any bugs reported to the repository.

I’ve forked a variant of alphawave-py, based on new thoughts on how LLMs might interact with more structured computational resources, but I won’t be ready to share it publicly for a bit.


I had this issue for a while, and although I’m late to the party—here’s how I managed to solve it for a quiz module I was making (or how I’d recommend solving it in your case). I totally get your frustration; the inconsistency can be really annoying, especially when you’re expecting structured data to work with.
This should hopefully help Javascript users
(PS - OpenAI devs… show JS some love in the playground please…)

How I Fixed It

Setting Expectations with the “System” Role

I noticed that setting an initial message with the role as “system” helped. This way, you can instruct the model to stick to certain types of responses. For example, I used the following setup:

{
  role: "system",
  content: "You are a machine that only returns and replies with valid, iterable RFC8259 compliant JSON in your responses"
}

This helps set a “systemic” instruction for the model to abide by, and I’ve found that it significantly improves consistency.

Precise Prompt Formatting

Be as specific and clear as possible with your prompt. If you need a specific structure, don’t be shy to explicitly spell it out. For instance:

const prompt = `generate a 4 question advanced and very hard quiz about ${subject} - provide the question, one correct answer and 3 wrong answers in a json array format. The objects will be called question1-4, correct_answer, and wrong_answers`;

Sanity Check the Response

After I get the response, I validate and filter it to ensure that it follows the format I want. This can act as a fail-safe in case the model output deviates:

const content = data.choices[0].message.content;
let parsedContent;
try {
  parsedContent = JSON.parse(content);
} catch (err) {
  console.error("Invalid format: response is not parseable JSON.");
  return null;
}
if (Array.isArray(parsedContent)) {
  // Your logic here...
} else {
  console.error("Invalid format: expected an iterable array.");
  return null;
}

The Big Picture

Here’s a chunk from my own code that incorporates the above principles:

const generateQuizQuestions = async (apiKey, prompt) => {
  const url = 'https://api.openai.com/v1/chat/completions';
  const headers = {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  };
  const body = JSON.stringify({
    model: 'gpt-3.5-turbo',
    messages: [
      {
        role: "system",
        content: "You are a machine that only returns and replies with valid, iterable RFC8259 compliant JSON in your responses"
      },
      {
        role: 'user',
        content: prompt
      }],
    temperature: 0.7
  });

  const response = await fetch(url, { method: 'POST', headers, body });
  const data = await response.json();

  if (!data.choices || !data.choices[0] || !data.choices[0].message || !data.choices[0].message.content) {
    return null;
  }

  const content = data.choices[0].message.content;
  let parsedContent;
  try {
    parsedContent = JSON.parse(content);
  } catch (err) {
    console.error("Invalid format: response is not parseable JSON.");
    return null;
  }

  if (Array.isArray(parsedContent)) {
    return parsedContent.map(q => {
      return {
        question: q.question,
        correct_answer: q.correct_answer,
        options: [q.correct_answer, ...q.wrong_answers].sort(() => 0.5 - Math.random())
      };
    });
  } else {
    console.error("Invalid format: expected an iterable array.");
    return null;
  }
};

I hope this helps! Even though you ran into this issue last year, I hope this can be of use to you or anyone else who finds themselves in a similar situation. Cheers! :beers:


Sweet, and for me also an inspirational approach. Thank you!

I have found gpt-4-turbo-1106 super talented at repairing JSON. Separating the generation from the formatting makes the repair pretty cheap.
(self.llm.ask is just a simple wrapper insulating from LLM API details)

def repair_json(self, item):
    #
    ## this asks gpt-4 to repair text that doesn't parse as json
    #
    prompt_text = \
        """You are a JSON expert. The following TextString does not parse using the python JSON loads function.
Please repair the text string so that loads can parse it and return a valid dict.
This repair should be performed recursively, including all field values.
for example, in:
{"item": {"action":"assign", "arguments":"abc", "result":"$xyz"} }
the inner form  {"action"...} should also be parsable as valid json.
Return ONLY the repaired json.

TextString:
{{$input}}
"""
    prompt = [
        SystemMessage(prompt_text),
        AssistantMessage('')
    ]
    response = self.llm.ask(item, prompt, template=GPT4, max_tokens=150)
    if response is not None:
        answer = response
        print(f'gpt4 repair {answer}')
        return answer
    else:
        return {'gpt4': 'query failure'}

Hi everyone,
Here is what I use to fix JSON responses.

// delete text before the first ‘{’ and delete text after the last ‘}’

            let jsonMatch = responseJsonStr.match(/{[\s\S]*}/);
            if (!jsonMatch) {
              throw new BusinessError('invalid_response')
            }
            let fixedJsonStr = jsonMatch[0];

            const json = JSON.parse(fixedJsonStr)

It works fine.


I’ve found that using this before your prompt works perfectly every time:

Please provide the following information in a plain JSON format without any Markdown or code block formatting
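Even with that instruction, defensively stripping any stray Markdown fences the model adds costs little; a small Python sketch:

```python
import re

def strip_code_fences(s):
    # Remove a leading ```json (or bare ```) fence and a trailing ``` fence.
    s = s.strip()
    s = re.sub(r"^```(?:json)?\s*", "", s)
    s = re.sub(r"\s*```$", "", s)
    return s

cleaned = strip_code_fences('```json\n{"ok": true}\n```')
# cleaned is now plain JSON text, ready for json.loads
```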
