Getting response data as a fixed and consistent JSON response

Hi,
Apologies if this is a stupid question! I have been playing with the API and trying to get the output it gives me returned to my project in a consistent way, so that I can always read a given response from the “choices” part of the returned object.

However, fairly frequently OpenAI ignores my request or puts a spin on it that breaks the code my app uses to process it.

e.g.
Give me 5 news headlines from January 2020 from the UK with the date and a headline.
Return these statements as a JSON Object with the structure {"Headline":["Description":"string","Date":"date"]}. Do not return any non-json text or numbering.

7 out of 10 times the engine understands and respects the JSON I am asking for, but 30% of the time it adds text before the object or does not follow my request for a specific JSON object.

An example of it going wrong is below. Of course, not being able to retrieve data in a consistent fashion makes coding for the receipt of it next to impossible!

Is there a way or known format for forcing the API to return the data in a specific fashion with consistency?
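One defensive option on the receiving side, independent of the prompt, is to tolerate extra text around the object. The sketch below is only an illustration: `extractJson` is a hypothetical helper name, and it assumes the reply contains a single JSON value and that any text before it has no stray braces or brackets.

```javascript
// Hypothetical helper: pull the first JSON value out of a reply that may
// contain extra text before or after it.
function extractJson(text) {
  // Find the first opening brace or bracket.
  const start = text.search(/[{\[]/);
  if (start === -1) return null;
  // Try progressively shorter candidates that end in a closing brace/bracket.
  for (let end = text.length; end > start; end--) {
    const last = text[end - 1];
    if (last !== "}" && last !== "]") continue;
    try {
      return JSON.parse(text.slice(start, end));
    } catch {
      // Not valid JSON yet; keep shrinking the candidate.
    }
  }
  return null; // nothing parseable was found
}

const reply =
  'Sure! Here are your headlines:\n{"Headline":[{"Description":"Example","Date":"2020-01-01"}]}\nHope this helps.';
console.log(extractJson(reply));
```

This does not repair malformed JSON; it only strips text that surrounds an otherwise valid value.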

An example of a returned object that has gone wrong:
{
"Headline1": { "Description": "Britain’s Boris Johnson starts Brexit 2020 with risks of more strife", "Date": "January 01, 2020" },
"Headline2": { "Description": "Crowds flock to Westminster as Johnson promises to deliver Brexit", "Date": "January 05, 2020" },
"Headline3": { "Description": "Coronavirus: Cases rise after first case confirmed in the UK", "Date": "January 31, 2020" },
"Headline4": { "Description": "Boris Johnson hails start of new ‘more prosperous’ Britain as Brexit deal takes hold", "Date": "January 01, 2020" },
"Headline5": { "Description": "Boris Johnson warns Britain must ‘keep calm and carry on’ as Brexit awaits", "Date": "January 01, 2020" }
}
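When the reply is valid JSON but in the wrong shape, as in the object above, the app can normalize it instead of failing. This is a minimal sketch; `normalizeHeadlines` is a hypothetical name, and the two shapes it handles are just the ones that appear in this post.

```javascript
// Hypothetical normalizer: accept either {"Headline":[...]} or an object keyed
// Headline1, Headline2, ... and always return an array of entries.
function normalizeHeadlines(parsed) {
  if (Array.isArray(parsed)) return parsed;
  if (parsed && Array.isArray(parsed.Headline)) return parsed.Headline;
  if (parsed && typeof parsed === "object") {
    // Keep only values that look like {Description, Date} entries.
    return Object.values(parsed).filter(
      (v) => v && typeof v === "object" && "Description" in v && "Date" in v
    );
  }
  return [];
}

const wrongShape = {
  Headline1: { Description: "First story", Date: "January 01, 2020" },
  Headline2: { Description: "Second story", Date: "January 05, 2020" },
};
console.log(normalizeHeadlines(wrongShape).length); // 2
```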

As so often in coding, I have maybe partially answered my own question.
But I would still appreciate feedback from those with more experience.

I have found that not putting the String in quotes ("") gives a more consistent result, as does capitalising the S in String!

That also did not work in this case until I changed the request to “News Stories”. Only then did it give me an object with “Headline” and then an array of 5 stories.

Give me 5 news stories from January 2020 from the UK with the date and a headline.
Return these statements as a JSON Object with the structure {"Headline":["Description":String,"Date":Date]}. Do not return any non-json text or numbering.

So I guess the message is: name your object definitions differently from the text you are returning, and define the object using "Name":String, not "Name":"String"!

Here is a prompt which works fine for me:

const promptText = `Given the article below, create a JSON object which enumerates a set of 5 child objects.                       
                        Each child object has a property named "q" and a property named "a".
                        For each child object assign to the property named "q" a question which has its answer in the article 
                        and to the property named "a" a short answer to this question.
                        The resulting JSON object should be in this format: [{"q":"string","a":"string"}].\n\n
                        The article:\n
                        ${textToUse}\n\n
                        The JSON object:\n\n`;

Just replace ${textToUse} with a text.
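Since the prompt asks for a known shape, the completion can be checked against that shape before the app uses it. The following is a sketch under assumptions: `parseQaPairs` is a hypothetical helper, and `completion` stands for the raw text returned by the API for the prompt above.

```javascript
// Hypothetical validator for the [{"q":"string","a":"string"}] format.
function parseQaPairs(completion, expectedCount = 5) {
  let data;
  try {
    data = JSON.parse(completion);
  } catch (err) {
    return { ok: false, error: "not valid JSON: " + err.message };
  }
  if (!Array.isArray(data)) return { ok: false, error: "not an array" };
  if (data.length !== expectedCount) {
    return { ok: false, error: `expected ${expectedCount} items, got ${data.length}` };
  }
  // Every child object needs a string "q" and a string "a".
  const bad = data.findIndex(
    (item) => !item || typeof item.q !== "string" || typeof item.a !== "string"
  );
  if (bad !== -1) return { ok: false, error: `item ${bad} lacks a string "q" or "a"` };
  return { ok: true, pairs: data };
}

console.log(parseQaPairs('[{"q":"Who?","a":"Nobody."}]', 1));
```

Returning a result object rather than throwing makes it easy to decide whether to retry the request or show an error.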

I have the same common issue where the JSON data is not consistent.

The prompt is very important for getting consistent results. Show it exactly what you want, rather than telling it. The following approach works for me. It is just a fun example of how to get it to dynamically generate completions and consistently format them as JSON:

const prompt = `
pretend to be an expert child behavioural researcher.
create a valid JSON array of objects for translating baby speak into English following this format:

[{"baby": "sound the baby makes",
"volumeDb": "how loud is the sound, decibels as a floating-point number",
"timeMin": "how long the sound is made, minutes with 2 decimal places",
"meaning": "what the baby might be trying to communicate",
"confidencePct": "certainty of meaning, percent as an integer",
"response": "what sound the parent should reply with"}]

The JSON object:
`.trim()

This gives a consistent JSON format even with a temperature of 1.
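Even when the structure is consistent, the numeric fields described in the template may come back as numbers or as strings. A small post-processing step can even that out. This is a sketch only; `coerceRow` is a hypothetical helper and the string forms it handles ("65.5", "80%") are assumptions about what the model might return.

```javascript
// Hypothetical post-processing for one object of the baby-speak array.
function coerceRow(row) {
  // Convert numbers or numeric strings to a finite number, else null.
  const toNumber = (value) => {
    const n = typeof value === "number" ? value : parseFloat(String(value));
    return Number.isFinite(n) ? n : null;
  };
  const pct = toNumber(row.confidencePct);
  return {
    ...row,
    volumeDb: toNumber(row.volumeDb),
    timeMin: toNumber(row.timeMin),
    confidencePct: pct === null ? null : Math.round(pct), // integer percent
  };
}

console.log(coerceRow({ baby: "goo", volumeDb: "65.5", timeMin: 0.25, confidencePct: "80%" }));
```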

I still get ugly JSON using your prompt structure (I use ChatGPT in French).

I created a repo to share some regexes, but if you have a better approach, I’d appreciate hearing it: GitHub - stouch/chatgpt-json-cleaner (some regexes and string manipulation to clean the chaotic JSON produced by ChatGPT with French prompts).

Thanks.

I’m surprised it hasn’t been mentioned yet, but those news headlines weren’t actually retrieved; they were hallucinated. The model does a great job of making them look real, and sometimes, by chance, they are.

Secondly, why bother using a language model for something that can be accomplished with logic?

You can easily connect to any of these APIs and just simply return the JSON.
I feel like this is like building a bridge out of twigs, next to an actual bridge.

If NLP is a critical component you should consider fine-tuning an entity extraction model. GPT will hallucinate and return broken/incorrectly structured JSON. A lot.

In many cases, it’s useful to ask ChatGPT to output JSON, whatever the reason.

Yes, in many cases it is.

It’s always important to determine which stack will work the best for your application. In this case, GPT is not.

This has not been mentioned here, but ChatGPT often outputs some ugly JSON: missing quotes, wrong quotes (`, etc.), missing keys, etc. (I make my requests in French, but I guess you have the same issues in English?) That’s why this thread interested me in the first place, although it was not actually about that.
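For the wrong-quote case specifically, one possible fallback is to swap the offending characters only after a normal parse has failed. This is a sketch, not a robust repair: `parseWithQuoteFix` is a hypothetical name, and the replacement would corrupt any value that legitimately contains a backtick or a typographic quote, which is why it is applied only as a second attempt.

```javascript
// Hypothetical fallback: retry the parse after replacing backticks and
// curly double quotes with straight double quotes.
function parseWithQuoteFix(text) {
  try {
    return JSON.parse(text); // the normal case: nothing to fix
  } catch {
    const fixed = text.replace(/[\u201C\u201D`]/g, '"');
    return JSON.parse(fixed); // still throws if other defects remain
  }
}

// \u201C and \u201D are the curly opening and closing double quotes.
console.log(parseWithQuoteFix('[{\u201Ctitle\u201D: `Les Mis\u00E9rables`}]'));
```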

Yes, I have the same issue in English. It’s not too often: maybe 2-5% of all my objects come back broken, depending on the structure of the information it’s reading.

It can also add its own information, such as extra nested objects or arrays. Like you said, there are definitely uses for it though; I’m happy with the results and it is still well worth the cleanup.
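With a failure rate of a few percent, simply asking again is often enough. The wrapper below is a sketch under assumptions: `completeAsJson` is a hypothetical name, and `complete` stands for whatever async function in your app sends the prompt to the API and resolves to the raw completion text.

```javascript
// Hypothetical retry wrapper around an API call that should return JSON.
async function completeAsJson(complete, prompt, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const text = await complete(prompt);
    try {
      return JSON.parse(text);
    } catch (err) {
      lastError = err; // broken JSON: ask again
    }
  }
  throw new Error(
    `no valid JSON after ${maxAttempts} attempts: ${lastError?.message}`
  );
}
```

Each retry is a new billed request, so a low `maxAttempts` is probably sensible.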

In your case, have you tried the same in English? I haven’t practiced too much in other languages, but I do notice slightly different answers in Spanish.

I’ll try making the request in English to see if it gives us better results. But we would then need to translate the JSON values into French, which would lead to other issues in the translation process.

In my case, in the context of my requests, about 15% of the JSON arrays produced are totally broken. I request an array of objects containing a date, multiline text, place, subject, etc. It’s “just” a simple array of objects ({key: string}).

I just ran a simple test in English and about 1 in 4 results was ugly, broken JSON.

For example, here is my prompt:

"Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\", 
\"release_date\": \"Release date in France, formatted as YYYY-MM\", 
\"subject\": \"Book subject\", 
\"characters\": \"3 characters separated with linebreaks and tirets\"
}]
The JSON object is:"

Result:
[Image: Screenshot from 2023-02-26 23-39-36]

The result is often broken because of the “release_date” format and bad quotes.

Then I made a more complicated request:

"Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\", 
\"release_date\": \"Release date in France, formatted as YYYY-MM\", 
\"subject\": \"Book subject\", 
\"location\": \"Place where lived the author\", 
\"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"
}]

The JSON object is:"

The result becomes messy, with JSON property keys missing their quotes, etc.
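For the unquoted-key defect, a regex can add the missing quotes before parsing. This is a rough sketch: `quoteBareKeys` is a hypothetical helper, and the pattern can misfire on string values that themselves contain something like `, word:`, so it is best used only after a plain `JSON.parse` has already failed.

```javascript
// Hypothetical repair: add double quotes around bare property keys such as
// {title: "x"} so that JSON.parse accepts them.
function quoteBareKeys(text) {
  // A bare key is an identifier that follows "{" or "," and precedes ":".
  return text.replace(/([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)/g, '$1"$2"$3');
}

const messy = '[{title: "Candide", release_date: "1759-01"}]';
console.log(JSON.parse(quoteBareKeys(messy)));
```

Keys that are already quoted are left untouched, because the pattern requires the key to start with a letter or underscore.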

I’ll report my results in English. I’m not sure we are going to test it, but I’ll keep you posted.

Hi @stouch

I tested your prompt above and got a valid JSON completion:

I slightly changed your prompt @stouch and got a valid JSON completion for your more “complicated request”.

In my test prompt, I instructed the API to not escape double quotes in the output, like this: Do not escape the double quotes in the output:

Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\", 
\"release_date\": \"Release date in France, formatted as YYYY-MM\", 
\"subject\": \"Book subject\", 
\"location\": \"Place where lived the author\", 
\"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"
}]

Do not escape the double quotes in the output: The JSON object is:

Hope this helps

This was my full payload:

{
  "model": "text-davinci-003",
  "prompt": "Create a valid JSON array of objects to list famous french books :\n[{\"title\": \"Book title\", \"release_date\": \"Release date in France, formatted as YYYY-MM\", \"subject\": \"Book subject\", \"location\": \"Place where lived the author\", \"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"}]\nThe JSON object:",
  "max_tokens": 600,
  "temperature": 0.2,
  "top_p": 1,
  "frequency_penalty": 1,
  "presence_penalty": 0
}
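Hand-escaping the quotes in the prompt is easy to get wrong. One way to avoid it is to describe the wanted shape as a plain object and let `JSON.stringify` do the escaping. This is a sketch: `template` and `buildPayload` are hypothetical names, the parameter values are simply copied from the payload above, and the stringified template has no spaces after the colons, unlike the hand-written prompt.

```javascript
// The wanted shape, written as an ordinary object instead of escaped text.
const template = [{
  title: "Book title",
  release_date: "Release date in France, formatted as YYYY-MM",
  subject: "Book subject",
  location: "Place where lived the author",
  characters: "3 characters and their characteristics, separated with linebreaks and tirets",
}];

// Hypothetical helper: build the request body as a JSON string.
function buildPayload(shape) {
  const prompt =
    "Create a valid JSON array of objects to list famous french books :\n" +
    JSON.stringify(shape) +
    "\nThe JSON object:";
  // Parameter values copied from the payload above.
  return JSON.stringify({
    model: "text-davinci-003",
    prompt,
    max_tokens: 600,
    temperature: 0.2,
    top_p: 1,
    frequency_penalty: 1,
    presence_penalty: 0,
  });
}

console.log(buildPayload(template));
```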

FTR: I updated the regexes, which now fix our French output in 9 out of 10 of our cases: chatgpt-json-cleaner/index.php at main · stouch/chatgpt-json-cleaner · GitHub

That is definitely not helping you!

Indeed, this parameter negatively impacts the JSON structure of the output. We’ll try to avoid using it. Thanks.

I’d be very interested to know why it actually impacts the JSON structure, @PaulBellow.

I am facing the same issue with ChatGPT’s API. It inserts unwanted strings in the middle of the JSON response, so sometimes I am not able to process the JSON response!

Noob Question Alert!
So in the ChatGPT API, the prompt is in this format: "messages": [{"role": "user", "content": "Hello!"}]. Within the “content”, how can I give multiple valid JSON key-value pairs, like in your example above? This solution might work well for the regular completion API, but for the ChatGPT 3.5 model, how do I show the model a JSON example so that it consistently outputs valid JSON?

[
   {"role": "user", "content": "Hello!"},
   {"role": "user", "content": "Hello Again!"},
   {"role": "user", "content": "Hello Three Times!"},
   {"role": "user", "content": "Hello Forever!"}
]
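On the question of putting a JSON example inside “content”: the example is just text within that string, so one approach is to build it with `JSON.stringify` and let a second `JSON.stringify` on the whole body handle the escaping. This is a sketch under assumptions: `buildChatBody` is a hypothetical name, the model name "gpt-3.5-turbo" and the system message wording are illustrative choices, not something taken from this thread.

```javascript
// Hypothetical helper: build a chat request body whose user message carries
// a JSON format example.
function buildChatBody(question) {
  const example = [{ q: "string", a: "string" }];
  return JSON.stringify({
    model: "gpt-3.5-turbo", // assumed model name
    messages: [
      { role: "system", content: "Reply with a JSON array only, no other text." },
      {
        role: "user",
        // The example is embedded as plain text inside the content string.
        content: `${question}\nUse exactly this format: ${JSON.stringify(example)}`,
      },
    ],
  });
}

console.log(buildChatBody("Give me 2 questions and answers about the Moon."));
```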

HTH
