When we instruct ChatGPT (gpt-3.5-turbo) in the prompt to generate HTML, JSON, or Markdown output, the output is often badly broken. HTML tables come back with broken structure, and JSON results fail to decode with PHP json_decode() because of null, unquoted numbers, etc.
Is anyone having similar issues? We have to do a ton of post-processing to make the JSON or HTML work, and even then our efforts often fail because the structure is too broken.
I wish there was an output content-type in the API call so we could set things like:
When wanting JSON response: Content-Type: application/json
For HTML: Content-Type: text/html
Default: Content-Type: text/plain
If the output is not easily usable, it kind of makes the API integration much more work than it needs to be.
What do you all think?
3.5 is not very good at coding. Even 4 needs some quality control. And the output is a function of how good the prompt and/or the system message is.
Did you try to include few-shot examples?
This can help a lot.
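For anyone unsure what few-shot examples look like in the chat format, here is a minimal sketch: earlier user/assistant turns act as worked examples before the real input. The `buildFewShotMessages` helper, the example cities, and the instruction text are all made up for illustration, and whether to put the instruction in a `system` message is a choice you may want to test.

```javascript
// Worked examples shown to the model as earlier turns of the conversation.
// These pairs are invented for illustration only.
const fewShotExamples = [
  { input: "Lisbon", output: { city: "Lisbon", country: "PT" } },
  { input: "Berlin", output: { city: "Berlin", country: "DE" } },
];

// Returns a messages array: instruction, then example pairs, then the new input.
function buildFewShotMessages(instruction, examples, newInput) {
  const messages = [{ role: "system", content: instruction }];
  for (const ex of examples) {
    messages.push({ role: "user", content: ex.input });
    // The assistant turn shows the exact output shape we want back.
    messages.push({ role: "assistant", content: JSON.stringify(ex.output) });
  }
  messages.push({ role: "user", content: newInput });
  return messages;
}

const messages = buildFewShotMessages(
  "Reply with one JSON object with the keys city and country, and nothing else.",
  fewShotExamples,
  "Madrid"
);
console.log(JSON.stringify(messages, null, 2));
```

The resulting array is what would go into the `messages` field of a chat request.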
IDK about your process but
gpt-3.5-turbo works fine for JSON:
Yes, for JSON we always tell it how to provide the results. But it still messes up badly from time to time and we have to do a lot of “fixing” using other tools.
We are staying with gpt-3.5-turbo because of cost. GPT-4 is way too expensive to offer free-tier services. We are hoping OpenAI will change GPT-4 pricing so that we can offer its superior results to users.
I’ve put the data models into the system message of GPT-4 and told it to generate JSON that can be posted to that model, and it creates perfect JSON every time. But doing the same thing with GPT-3.5 does not work.
With GPT-3.5, do not use the ‘system’ role. In my experience, it is not the best option.
I haven’t had any issues with GPT-3.5 producing JSON. Even does Cypher queries well with a little guidance.
But, I saw yesterday an interesting json parser that may help?
Faced the same issue today. JSON output with ‘gpt-3.5-turbo-0301’ is inconsistent despite giving clear instructions on how to process the input. Maybe my prompt is too complicated?
// Build the extraction prompt; jobObj is the input job record.
var prompt = "From the job in this JSON… " + JSON.stringify(jobObj) + " … extract the proper noun or software platform from the attribute 'job_title' and give a new output JSON object with attribute 'software' to contain the software platform found in the 'job_title'.";
prompt += " From the input JSON parse attribute 'job_years_of_experience' give me another numeric attribute called 'years' … intern : 0, junior : 1, senior : 3, etc.";
prompt += " From the input JSON parse attribute 'job_title' give me the job role as output attribute 'role' … if role is developer, tester etc. ";
prompt += "If the city is specified in attribute 'job_in_city' give me the 2 digit country code as an attribute called 'country'. E.g. Lisbon is 'PT'. ";
prompt += "Your output will only be a JSON object no code needed. With the 4 attributes mentioned (software, years, role, country) and nothing else. Do not output the Input JSON. Do not output the explanation.";
It seems to work, but sometimes it creates an output that includes the input JSON and an explanation. So I had to write a parser to handle that.
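A parser of that kind could look roughly like the sketch below. It is not the poster's actual code: `extractJsonObjects` and `pickResult` are hypothetical helpers that scan the reply for balanced `{...}` spans, keep the ones that parse, and return the first one carrying the four expected attributes. It assumes the reply contains plain JSON objects; stray braces or quotes in the surrounding prose can confuse it.

```javascript
// Collect every top-level {...} span in the text that parses as JSON.
function extractJsonObjects(text) {
  const found = [];
  let start = -1;      // index of the opening brace of the current candidate
  let depth = 0;       // brace nesting depth inside the candidate
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (start === -1) {
      if (ch === "{") { start = i; depth = 1; inString = false; escaped = false; }
      continue;
    }
    if (inString) {
      // Braces inside string values must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      try {
        found.push(JSON.parse(text.slice(start, i + 1)));
      } catch (e) {
        // Balanced braces but not valid JSON: skip this span.
      }
      start = -1;
    }
  }
  return found;
}

// Return the first extracted object that has all required keys, or null.
function pickResult(text, requiredKeys) {
  const match = extractJsonObjects(text).find((obj) =>
    requiredKeys.every((key) => key in obj)
  );
  return match || null;
}

// Made-up reply that echoes the input and adds an explanation.
const reply =
  'Input: {"job_title": "Senior SAP developer", "job_in_city": "Lisbon"}\n' +
  "Here is the result:\n" +
  '{"software": "SAP", "years": 3, "role": "developer", "country": "PT"}\n' +
  "Hope this helps!";
const result = pickResult(reply, ["software", "years", "role", "country"]);
console.log(result);
```

Checking for the expected keys is what lets it skip the echoed input object rather than returning the first JSON it sees.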
The trick is “show don’t tell” - if you want a JSON object, show it the object rather than describe it in the prompt. Same goes for markdown, tables, etc.
Consider this JSON object
<<THE JSON OBJECT>>
Return a new RFC compliant JSON object without deviation or explanation in the following format.
{
"software": "the software platform derived from the 'job_title' field",
"years": "the number of years experience required derived from 'job_years_of_experience' where 0 is 'intern', less than 3 is 'junior', and 3 or more is 'senior'",
"role": "the role of the developer determined from the 'job_title' field",
"country": "the ISO country code derived from the 'job_in_city' field"
}
The JSON response:
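In code, the same prompt could be assembled from a template object, so the shape shown to the model is always valid JSON. This is only a sketch: `buildShowPrompt` is a hypothetical helper and the sample job record is invented.

```javascript
// Build a "show, don't tell" prompt: the desired output is shown as a JSON
// template whose values describe what each field should contain.
function buildShowPrompt(inputObj, template) {
  return [
    "Consider this JSON object",
    JSON.stringify(inputObj),
    "Return a new RFC compliant JSON object without deviation or explanation in the following format.",
    JSON.stringify(template, null, 2),
    "The JSON response:",
  ].join("\n");
}

const template = {
  software: "the software platform derived from the 'job_title' field",
  years: "the number of years experience required derived from 'job_years_of_experience'",
  role: "the role of the developer determined from the 'job_title' field",
  country: "the ISO country code derived from the 'job_in_city' field",
};

// Hypothetical input record, invented for illustration.
const jobObj = {
  job_title: "Senior SAP developer",
  job_years_of_experience: "senior",
  job_in_city: "Lisbon",
};

const prompt = buildShowPrompt(jobObj, template);
console.log(prompt);
```

Because the template goes through `JSON.stringify`, quoting mistakes and trailing commas cannot creep into the example the model is asked to imitate.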
JSON errors by GPT are more tolerable than its HTML output. We decided not to ask it to generate HTML output due to the token cost and instead we ask it to generate Markdown and then run a Markdown to HTML conversion on our end.
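To illustrate why converting on your own side gives reliable structure, here is a deliberately tiny sketch that turns a Markdown pipe table into HTML. It is not the converter the poster used; in practice a full Markdown library would be the usual choice. `markdownTableToHtml` and its helpers are hypothetical names, and it handles only a simple pipe table with a header and a separator row.

```javascript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// "| a | b |" -> ["a", "b"]
function splitRow(line) {
  return line
    .trim()
    .replace(/^\|/, "")
    .replace(/\|$/, "")
    .split("|")
    .map((cell) => cell.trim());
}

function markdownTableToHtml(md) {
  const lines = md.trim().split("\n").filter((l) => l.trim() !== "");
  const separator = /^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$/;
  if (lines.length < 2 || !separator.test(lines[1])) {
    throw new Error("not a pipe table");
  }
  const header = splitRow(lines[0]);
  // Pad or truncate each row to the header width so the HTML is always regular.
  const fit = (row) => header.map((_, i) => (row[i] !== undefined ? row[i] : ""));
  const cells = (tag, row) =>
    row.map((c) => `<${tag}>${escapeHtml(c)}</${tag}>`).join("");
  const body = lines
    .slice(2)
    .map((line) => `<tr>${cells("td", fit(splitRow(line)))}</tr>`)
    .join("");
  return `<table><thead><tr>${cells("th", header)}</tr></thead><tbody>${body}</tbody></table>`;
}

const html = markdownTableToHtml("| a | b |\n| --- | --- |\n| 1 | <2> |\n| 3 |");
console.log(html);
```

Every tag is opened and closed by the converter, so a short or ragged row from the model still yields a well-formed table.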
Excellent way to describe the solution! “Show don’t tell” Love it! Thank you for the insight. Very helpful.
Yes, figured that out. Passing a JSON sample in and asking it to update these properties was the best way to do this. Forgot to update here. Thanks Matt.
Of course, today they announced Function Calling with the Chat Completions API, which passes JSON in and out, so I will probably rewrite the code to use that instead.
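For reference, a request using function calling might be shaped roughly like this. It is a sketch based on the API as announced (`functions` and `function_call` fields, with arguments returned as a JSON string), so check the current documentation before relying on it. The `save_job` function name, the schema, and the sample response are made up for illustration, and no network call is made here.

```javascript
// Request body for the chat endpoint, describing the output as a JSON Schema.
const requestBody = {
  model: "gpt-3.5-turbo-0613",
  messages: [{ role: "user", content: "Senior SAP developer in Lisbon" }],
  functions: [
    {
      name: "save_job", // hypothetical function name
      description: "Store the structured fields extracted from a job posting",
      parameters: {
        type: "object",
        properties: {
          software: { type: "string" },
          years: { type: "integer" },
          role: { type: "string" },
          country: { type: "string", description: "two-letter ISO country code" },
        },
        required: ["software", "years", "role", "country"],
      },
    },
  ],
  // Ask the model to call this specific function instead of replying in prose.
  function_call: { name: "save_job" },
};

// The arguments arrive as a JSON string, so they still need to be parsed.
function readFunctionArguments(responseJson) {
  const call = responseJson.choices[0].message.function_call;
  if (!call) return null; // the model answered with plain text instead
  return JSON.parse(call.arguments);
}

// Hand-written sample response, not real API output.
const sampleResponse = {
  choices: [
    {
      message: {
        role: "assistant",
        content: null,
        function_call: {
          name: "save_job",
          arguments: '{"software":"SAP","years":3,"role":"developer","country":"PT"}',
        },
      },
    },
  ],
};

const args = readFunctionArguments(sampleResponse);
console.log(args);
```

Since the arguments are still model-generated text, wrapping `JSON.parse` in a try/catch and validating the fields is probably still worth doing.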