const promptText = `Given the article below, create a JSON object which enumerates a set of 5 child objects.
Each child object has a property named "q" and a property named "a".
For each child object assign to the property named "q" a question which has its answer in the article
and to the property named "a" a short answer to this question.
The resulting JSON object should be in this format: [{"q":"string","a":"string"}].\n\n
The article:\n
${textToUse}\n\n
The JSON object:\n\n`;
The prompt is very important for getting consistent results. Show the model exactly what you want rather than telling it. The following approach works for me; it’s just a fun example of how to get it to dynamically generate completions and consistently format them as JSON:
const prompt = `
pretend to be an expert child behavioural researcher.
create a valid JSON array of objects for translating baby speak into English following this format:
[{"baby": "sound the baby makes",
"volumeDb": "how loud is the sound, decibels as a floating-point number",
"timeMin": "how long the sound is made, minutes with 2 decimal places",
"meaning": "what the baby might be trying to communicate",
"confidencePct": "certainty of meaning, percent as an integer",
"response": "what sound the parent should reply with"}]
The JSON object:
`.trim()
This gives a consistent JSON format even with a temperature of 1.
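Even with a consistent format, it seems worth checking each completion before using it. Here is a minimal sketch, assuming the completion text is already available as a string; `parseBabySpeak` and `EXPECTED_KEYS` are hypothetical names, and the key list is simply copied from the prompt above.

```javascript
// Keys the prompt asks for; copied from the format shown in the prompt.
const EXPECTED_KEYS = ["baby", "volumeDb", "timeMin", "meaning", "confidencePct", "response"];

// Hypothetical helper: returns the parsed array, or null if the completion
// is not a JSON array of objects that each carry every expected key.
function parseBabySpeak(completionText) {
  let parsed;
  try {
    parsed = JSON.parse(completionText);
  } catch (err) {
    return null; // not valid JSON at all
  }
  if (!Array.isArray(parsed)) return null;
  const allComplete = parsed.every(
    (item) =>
      item !== null &&
      typeof item === "object" &&
      EXPECTED_KEYS.every((key) => key in item)
  );
  return allComplete ? parsed : null;
}
```

A `null` result can then be used as the signal to retry the request or to discard that completion.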
I’m surprised it hasn’t been mentioned yet, but those news headlines weren’t actually retrieved; they were hallucinated. The model does a great job of making them look real, and sometimes, by chance, they are.
Secondly, why bother using a language model for something that can be accomplished with logic?
You can easily connect to any of these APIs and just simply return the JSON.
I feel like this is like building a bridge out of twigs, next to an actual bridge.
If NLP is a critical component you should consider fine-tuning an entity extraction model. GPT will hallucinate and return broken/incorrectly structured JSON. A lot.
This has not been mentioned here, but ChatGPT often outputs ugly JSON: missing quotes, wrong quote characters (`, etc.), missing keys, and so on. (I make my requests in French, but I guess you have the same issues in English?) That’s why this thread interested me in the first place, but it turned out not to be about that.
Yes, I have the same issue in English. Not too often: maybe 2-5% of all my objects come back broken, depending on the structure of the information it’s reading.
It can also add its own information, such as extra nested objects or arrays. Like you said, there are definitely uses for it though; I’m happy with the results, and it’s still well worth the cleanup.
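For the wrong-quote case specifically, part of the cleanup can be automated. This is a crude sketch, not a general JSON repair tool: it assumes the only damage is curly quotes or backticks used in place of straight double quotes, and `parseWithQuoteRepair` is a hypothetical name.

```javascript
// Hypothetical helper: try a strict parse first, then retry after swapping
// curly double quotes and backticks for straight double quotes.
// Limitation: this also rewrites such characters inside string values,
// so it can corrupt text that legitimately contains them.
function parseWithQuoteRepair(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    // fall through to the repair attempt
  }
  const repaired = text.replace(/[\u201C\u201D`]/g, '"');
  try {
    return JSON.parse(repaired);
  } catch (err) {
    return null; // still broken; needs manual cleanup or a retry
  }
}
```

Missing keys or unquoted property names are not handled here; those would still come back as `null`.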
In your case, have you tried the same in English? I haven’t practiced too much in other languages, but I do notice slightly different answers in Spanish.
I’ll try making the request in English to see if it gives us better results. But we would then need to translate the JSON values into French anyway, so this will lead to other issues in the translation process.
In my case, in the context of my requests, about 15% of the JSON arrays produced are totally broken. I request an array of objects containing a date, multiline text, place, subject, etc. It’s “just” a simple array of objects ({key: string}).
Extract and categorize the information below using the following JSON structure:
<|ENDSEG|>
{
"data": [ {"nom": "", "âge": 0} , {"nom": "", "âge": 0} ]
}
<|ENDSEG|>
Info: [INFO]
<|ENDSEG|>
{
"data": [{ "nom": [END]
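As I read this template, the prompt deliberately ends in the middle of the JSON so that the model continues it. If so, the opening fragment has to be glued back onto the completion before parsing. A minimal sketch under that assumption; `JSON_PREFIX`, `buildExtractionPrompt`, and `parsePrefixedCompletion` are hypothetical names, and [INFO] is treated as the placeholder for the input text.

```javascript
// The JSON fragment that the prompt ends with (everything before [END]).
const JSON_PREFIX = '{\n"data": [{ "nom":';

// Hypothetical helper: fills the template from the post with the given info.
function buildExtractionPrompt(info) {
  return [
    "Extract and categorize the information below using the following JSON structure:",
    "<|ENDSEG|>",
    '{\n"data": [ {"nom": "", "âge": 0} , {"nom": "", "âge": 0} ]\n}',
    "<|ENDSEG|>",
    `Info: ${info}`,
    "<|ENDSEG|>",
    JSON_PREFIX,
  ].join("\n");
}

// Hypothetical helper: the model only returns the continuation, so the
// prefix is re-attached before parsing. Returns null if parsing fails.
function parsePrefixedCompletion(completionText) {
  try {
    return JSON.parse(JSON_PREFIX + completionText);
  } catch (err) {
    return null;
  }
}
```

Starting the answer for the model this way leaves it less room to add an introduction or pick a different top-level structure.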
I didn’t mean actually converting the results from French to English, just making the request in English. I really don’t think it’ll help much, but it’s worth a shot. It may return some interesting results that I’d love to know about myself.
I just ran a simple test in English, and 1 in 4 responses was ugly, broken JSON.
For example, here is my prompt:
"Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\",
\"release_date\": \"Release date in France, formatted as YYYY-MM\",
\"subject\": \"Book subject\",
\"characters\": \"3 characters separated with linebreaks and tirets\"
}]
The JSON object is:"
Result:
The result is often broken because of the “release_date” format, bad quotes, etc.
Then I made a more complicated request:
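A format problem like this one can at least be detected after parsing. A minimal sketch, assuming the completion has already been parsed into an array of book objects; `RELEASE_DATE_PATTERN` and `findBadReleaseDates` are hypothetical names, and the sample titles are placeholders.

```javascript
// YYYY-MM with a month between 01 and 12, as requested in the prompt.
const RELEASE_DATE_PATTERN = /^\d{4}-(0[1-9]|1[0-2])$/;

// Hypothetical helper: returns the titles of books whose release_date is
// missing, not a string, or not in YYYY-MM form.
function findBadReleaseDates(books) {
  return books
    .filter(
      (book) =>
        typeof book.release_date !== "string" ||
        !RELEASE_DATE_PATTERN.test(book.release_date)
    )
    .map((book) => book.title);
}

// Usage with placeholder data:
const sampleBooks = [
  { title: "Book A", release_date: "1862-04" },
  { title: "Book B", release_date: 1942 },
  { title: "Book C", release_date: "January 1759" },
];
console.log(findBadReleaseDates(sampleBooks)); // logs the titles that failed the check
```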
"Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\",
\"release_date\": \"Release date in France, formatted as YYYY-MM\",
\"subject\": \"Book subject\",
\"location\": \"Place where lived the author\",
\"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"
}]
The JSON object is:"
The result becomes messy, with JSON property keys missing their quotes, etc.
I’ll share my results in English. I’m not sure we’re going to test it, but I’ll keep you posted.
Thanks for trying. Are your escape characters used for the prompting? Where did they come from?
Also, have you tried without setting defaults such as | Title: Book Title | ? In my experience it’s better to only express the datatype and nothing else.
But yes, the issue that you’re running into (something being formatted incorrectly, like a datetime being formatted as an integer) is something I’ve noticed as well.
I think it’s fair to always accept a less than perfect result from a language model as it’s not based on simple logic.
So the question is: is it better to use it and clean the result, or to find another, more reliable method? Which comes back to your point. It depends on the situation.
I slightly changed your prompt, @stouch, and got a valid JSON completion for your more “complicated request”.
In my test prompt, I instructed the API to not escape double quotes in the output, like this: Do not escape the double quotes in the output:
Create a valid JSON array of objects to list famous french books :
[{
\"title\": \"Book title\",
\"release_date\": \"Release date in France, formatted as YYYY-MM\",
\"subject\": \"Book subject\",
\"location\": \"Place where lived the author\",
\"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"
}]
Do not escape the double quotes in the output: The JSON object is:
{
"model": "text-davinci-003",
"prompt": "Create a valid JSON array of objects to list famous french books :\n[{\"title\": \"Book title\", \"release_date\": \"Release date in France, formatted as YYYY-MM\", \"subject\": \"Book subject\", \"location\": \"Place where lived the author\", \"characters\": \"3 characters and their characteristics, separated with linebreaks and tirets\"}]\nThe JSON object:",
"max_tokens": 600,
"temperature": 0.2,
"top_p": 1,
"frequency_penalty": 1,
"presence_penalty": 0
}
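On the earlier question about where the escape characters come from: the backslashes only need to exist in the serialized request body, not in the prompt the model sees. A minimal sketch of one way to avoid hand-escaping, by writing the prompt as a template literal and letting `JSON.stringify` add the escapes; `bookPrompt` and `requestBody` are hypothetical names, and the parameter values are copied from the request above.

```javascript
// The prompt as plain text: no backslashes needed inside a template literal.
const bookPrompt = `Create a valid JSON array of objects to list famous french books :
[{"title": "Book title", "release_date": "Release date in France, formatted as YYYY-MM", "subject": "Book subject", "location": "Place where lived the author", "characters": "3 characters and their characteristics, separated with linebreaks and tirets"}]
The JSON object:`;

// JSON.stringify escapes the double quotes and line breaks for transport.
const requestBody = JSON.stringify({
  model: "text-davinci-003",
  prompt: bookPrompt,
  max_tokens: 600,
  temperature: 0.2,
  top_p: 1,
  frequency_penalty: 1,
  presence_penalty: 0,
});
```

If escaped quotes were pasted into the prompt text itself, the model would see literal backslashes and might imitate them in its output, which may be one source of the broken results described above.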
I’m facing the same issue with ChatGPT’s API. It sometimes inserts unwanted strings into the JSON response, so I can’t always process it.
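When the unwanted text sits before or after the JSON (an introduction or a closing remark), one possible workaround is to cut out the span between the first opening bracket and the last matching closing bracket. A minimal sketch; `extractJson` is a hypothetical name, and this will not help if the stray text is inside the JSON itself.

```javascript
// Hypothetical helper: parses the text between the first "[" or "{" and the
// last matching "]" or "}". Returns null if nothing parseable is found.
function extractJson(text) {
  const starts = [text.indexOf("["), text.indexOf("{")].filter((i) => i !== -1);
  if (starts.length === 0) return null;
  const start = Math.min(...starts);
  const closer = text[start] === "[" ? "]" : "}";
  const end = text.lastIndexOf(closer);
  if (end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch (err) {
    return null;
  }
}
```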
Noob Question Alert!
In the ChatGPT API, the prompt is in this format: "messages": [{"role": "user", "content": "Hello!"}]. Within “content”, how can I give multiple valid JSON key-value pairs, as in your example above? This solution might work well for the regular completion API, but with the ChatGPT 3.5 model, how do I show the model a JSON example so that it consistently outputs valid JSON?
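One possible approach, since "content" is just a string: put the JSON example inside that string and let `JSON.stringify` handle the escaping when the request is serialized. This is a sketch under assumptions, not a confirmed recipe; `formatExample` and `buildChatBody` are hypothetical names, the model name "gpt-3.5-turbo" and the system message wording are my own choices, and whether this gives consistently valid JSON would need testing.

```javascript
// The format example from the first prompt in this thread, as a string.
const formatExample = JSON.stringify([{ q: "string", a: "string" }]);

// Hypothetical helper: builds a chat-style request body whose user message
// contains both the JSON example and the article text.
function buildChatBody(articleText) {
  return {
    model: "gpt-3.5-turbo", // assumed model name
    messages: [
      { role: "system", content: "Reply with a valid JSON array only, no extra text." },
      {
        role: "user",
        content:
          "Create 5 question/answer pairs from the article below.\n" +
          `Use exactly this format: ${formatExample}\n\n` +
          `The article:\n${articleText}`,
      },
    ],
  };
}

// Serializing the whole body escapes the quotes inside "content".
const chatRequestBody = JSON.stringify(buildChatBody("Cats sleep a lot."));
```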