HTML / JSON / Markdown Output Generation is Very Clunky or Outright Broken

evoknow · May 20, 2023, 6:22am

When we instruct ChatGPT (gpt-3.5-turbo) in our prompts to generate HTML, JSON, or Markdown output, the output is often badly broken. HTML tables have broken structure, and JSON results fail to decode with PHP json_decode() because of null, non-quoted numbers, etc.

Is anyone having similar issues? We have to do a ton of post-processing to make the JSON or HTML work, and even then our efforts often fail because the structure is too broken.

I wish there was an output content-type in the API call so we could set things like:

When wanting JSON response: Content-Type: application/json
For HTML: Content-Type: text/html
Default: Content-Type: text/plain

If the output is not easily usable, it makes the API integration much more work than it needs to be.

What do you all think?

patrick.g.olsen · May 20, 2023, 6:46am

GPT-3.5 is not very good at coding. Even GPT-4 needs some quality control. And the output quality is a function of how well the prompt and/or the system message is written.

PriNova · May 20, 2023, 8:58am

Did you try to include few-shot examples?
This can help a lot.
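
For anyone unsure what few-shot examples look like in a chat request, here is a minimal sketch. It assumes the usual `messages` array of `{ role, content }` objects; the helper name `buildFewShotMessages` and the sample data are made up for illustration, and nothing is sent to the API.

```javascript
// Hypothetical helper: builds a chat `messages` array with few-shot examples.
// Each example is an { input, output } pair. Outputs are serialized with
// JSON.stringify so the model only ever sees valid JSON as the pattern to copy.
function buildFewShotMessages(instruction, examples, newInput) {
  const messages = [];
  examples.forEach((example, index) => {
    // Put the instruction in front of the first example only.
    const prefix = index === 0 ? instruction + "\n\n" : "";
    messages.push({ role: "user", content: prefix + example.input });
    messages.push({ role: "assistant", content: JSON.stringify(example.output) });
  });
  // With no examples, the instruction goes in front of the real input instead.
  const lastPrefix = examples.length === 0 ? instruction + "\n\n" : "";
  messages.push({ role: "user", content: lastPrefix + newInput });
  return messages;
}

// Sample data is invented for illustration.
const fewShotMessages = buildFewShotMessages(
  "Extract the city and country code as a JSON object.",
  [
    { input: "Office in Lisbon", output: { city: "Lisbon", country: "PT" } },
    { input: "Based in Berlin", output: { city: "Berlin", country: "DE" } },
  ],
  "Remote from Madrid"
);
console.log(JSON.stringify(fewShotMessages, null, 2));
```

The final user message carries only the new input, so the model's most natural continuation is another assistant turn in the same JSON shape.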

sps · May 20, 2023, 10:57am

Hi @evoknow

IDK about your process but gpt-3.5-turbo works fine for JSON:

evoknow · May 20, 2023, 7:02pm

Yes, for JSON we always tell it how to provide the results. But it still messes up badly from time to time, and we have to do a lot of “fixing” using other tools.

evoknow · May 20, 2023, 7:03pm

We are staying with gpt-3.5-turbo because of cost. GPT-4 is way too expensive to offer in free-tier services. We are hoping OpenAI will change GPT-4 pricing so that we can offer its superior results to users.

patrick.g.olsen · May 20, 2023, 7:19pm

I’ve put the data models into the system message of GPT-4 and told it to generate JSON that can be posted to that model; it creates perfect JSON every time. But doing the same thing with GPT-3.5 does not work.

PriNova · May 20, 2023, 7:35pm

With GPT-3.5, do not use the ‘system’ role. In my experience, it is not the best option.

anon10827405 · May 20, 2023, 7:40pm

I haven’t had any issues with GPT-3.5 producing JSON. Even does Cypher queries well with a little guidance.

But I saw an interesting JSON parser yesterday that may help?

shawn1 · May 20, 2023, 9:47pm

Faced the same issue today. JSON output with ‘gpt-3.5-turbo-0301’ is variable despite giving clear instructions on how to process. Maybe my prompt is too complicated?

// jobObj is the input job object; it is serialized and embedded in the prompt.
var prompt = "From the job in this JSON… " + JSON.stringify(jobObj) + " … extract the proper noun or software platform from the attribute 'job_title' and give a new output JSON object with attribute 'software' to contain the software platform found in the 'job_title'. ";
prompt += "From the input JSON parse attribute 'job_years_of_experience' give me another numeric attribute called 'years' … intern : 0, junior : 1, senior : 3, etc. ";
prompt += "From the input JSON parse attribute 'job_title' give me the job role as output attribute 'role' … if role is developer, tester etc. ";
prompt += "If the city is specified in attribute 'job_in_city' give me the 2 digit country code as an attribute called 'country'. E.g. Lisbon is 'PT'. ";
prompt += "Your output will only be a JSON object no code needed. With the 4 attributes mentioned (software, years, role, country) and nothing else. Do not output the Input JSON. Do not output the explanation.";

It seems to work, but sometimes it creates output that includes the input JSON and an explanation, so I had to write a parser to handle that.
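
The parser itself was not posted, so the following is only a rough sketch of one way to do it, not the author's code. It scans the reply for balanced `{ … }` spans, tries `JSON.parse` on each, and returns the first object that has all the expected keys, which skips an echoed input object. The helper names and the sample reply are made up.

```javascript
// Find the index of the "}" that closes the "{" at `start`, ignoring braces
// that appear inside JSON string literals. Returns -1 if there is none.
function findMatchingBrace(text, start) {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return i;
  }
  return -1;
}

// Hypothetical helper: return the first parseable JSON object in `text` that
// contains every key in `requiredKeys`, or null if no such object exists.
function extractJsonObject(text, requiredKeys = []) {
  for (let start = text.indexOf("{"); start !== -1; start = text.indexOf("{", start + 1)) {
    const end = findMatchingBrace(text, start);
    if (end === -1) continue;
    try {
      const candidate = JSON.parse(text.slice(start, end + 1));
      if (requiredKeys.every((key) => key in candidate)) return candidate;
    } catch (err) {
      // Balanced braces but not valid JSON; keep scanning.
    }
  }
  return null;
}

// Hand-written sample reply with echoed input and an explanation around the result.
const messyReply =
  'Input: {"job_title": "Senior SAP Developer"}\n' +
  'Here is the result:\n' +
  '{"software": "SAP", "years": 3, "role": "developer", "country": "PT"}\n' +
  'Hope that helps!';
const extracted = extractJsonObject(messyReply, ["software", "years", "role", "country"]);
console.log(extracted);
```

This only rescues replies where a valid object is buried in extra text; it does not repair JSON that is itself malformed.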

mattgscox1 · May 22, 2023, 6:19pm

The trick is “show don’t tell” - if you want a JSON object, show it the object rather than describe it in the prompt. Same goes for markdown, tables, etc.

Consider this JSON object
<<THE JSON OBJECT>>

Return a new RFC-compliant JSON object without deviation or explanation in the following format.
{
"software": "the software platform derived from the 'job_title' field",
"years": "the number of years experience required derived from 'job_years_of_experience' where 0 is 'intern', less than 3 is 'junior', and 3 or more is 'senior'",
"role": "the role of the developer determined from the job_title field",
"country": "the ISO country code derived from the field job_in_city"
}

The JSON response:
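
A prompt like this can also be assembled in code, so the example object shown to the model is always valid JSON. This is just a sketch: `buildShowDontTellPrompt`, `outputTemplate`, and the sample input are hypothetical names and data, and the wording follows the template above.

```javascript
// Hypothetical helper: embeds a template object in the prompt. JSON.stringify
// guarantees the example is valid JSON (no trailing commas or stray quotes).
function buildShowDontTellPrompt(inputObject, template) {
  return [
    "Consider this JSON object",
    JSON.stringify(inputObject),
    "",
    "Return a new RFC-compliant JSON object without deviation or explanation in the following format.",
    JSON.stringify(template, null, 2),
    "",
    "The JSON response:",
  ].join("\n");
}

// Field descriptions stand in for the values, as in the template above.
const outputTemplate = {
  software: "the software platform derived from the 'job_title' field",
  years: "the number of years experience required derived from 'job_years_of_experience' where 0 is 'intern', less than 3 is 'junior', and 3 or more is 'senior'",
  role: "the role of the developer determined from the job_title field",
  country: "the ISO country code derived from the field job_in_city",
};

// Sample input is invented for illustration.
const samplePrompt = buildShowDontTellPrompt(
  { job_title: "Senior SAP Developer", job_years_of_experience: "senior", job_in_city: "Lisbon" },
  outputTemplate
);
console.log(samplePrompt);
```

Ending the prompt with "The JSON response:" leaves the model to continue with the object itself instead of an introduction.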

evoknow · May 22, 2023, 6:34pm

JSON errors by GPT are more tolerable than its HTML output. We decided not to ask it to generate HTML output due to the token cost and instead we ask it to generate Markdown and then run a Markdown to HTML conversion on our end.

ishpradhan · June 9, 2023, 4:39am

Excellent way to describe the solution! “Show don’t tell” Love it! Thank you for the insight. Very helpful.

shawn1 · June 14, 2023, 9:16am

Yes, figured that out. Passing a JSON sample in and asking it to update these properties was the best way to do this. Forgot to update here. Thanks Matt.
Of course, today they announced Function Calling with the Chat Completions API, which passes JSON in and out, so I will probably rewrite the code to use that instead.
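
For reference, a request using function calling looked roughly like the sketch below. It assumes the `functions` and `function_call` request fields from the June 2023 announcement (later superseded by `tools`) and the `gpt-3.5-turbo-0613` model. The function name `record_job_fields`, its schema, and the sample response are hypothetical, and nothing is sent over the network.

```javascript
// Build a Chat Completions request body that asks the model to "call" a
// function whose parameters describe the JSON we want back.
function buildFunctionCallRequest(jobObj) {
  return {
    model: "gpt-3.5-turbo-0613",
    messages: [
      { role: "user", content: "Extract the job fields from: " + JSON.stringify(jobObj) },
    ],
    functions: [
      {
        name: "record_job_fields", // hypothetical function name
        description: "Record the fields extracted from a job posting",
        parameters: {
          type: "object",
          properties: {
            software: { type: "string" },
            years: { type: "number" },
            role: { type: "string" },
            country: { type: "string", description: "Two-letter country code" },
          },
          required: ["software", "years", "role", "country"],
        },
      },
    ],
    // Force the model to call this function instead of replying in prose.
    function_call: { name: "record_job_fields" },
  };
}

// The arguments arrive as a JSON string, which may still be invalid,
// so parsing is wrapped in try/catch.
function readFunctionArguments(response) {
  const call = response.choices[0].message.function_call;
  if (!call) return null;
  try {
    return JSON.parse(call.arguments);
  } catch (err) {
    return null;
  }
}

// Hand-written stand-in for an API response, not real model output.
const sampleFunctionResponse = {
  choices: [
    {
      message: {
        role: "assistant",
        content: null,
        function_call: {
          name: "record_job_fields",
          arguments: '{"software": "SAP", "years": 3, "role": "developer", "country": "PT"}',
        },
      },
    },
  ],
};
const functionRequest = buildFunctionCallRequest({ job_title: "Senior SAP Developer" });
console.log(readFunctionArguments(sampleFunctionResponse));
```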

shawn1 · December 1, 2023, 11:54am

Quick update … issue seems to be resolved with JSON feature … Thx OpenAI team

response_format: { type: "json_object" }
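
In a full request body that setting sits next to `model` and `messages`. The sketch below assumes the `gpt-3.5-turbo-1106` model, which is one that accepted this parameter when JSON mode launched; the helper names and sample response are hypothetical. As far as I know, JSON mode also needs the word "JSON" to appear somewhere in the messages, and it guarantees valid syntax, not any particular set of keys.

```javascript
// Build a Chat Completions request body with JSON mode switched on.
function buildJsonModeRequest(userContent) {
  return {
    model: "gpt-3.5-turbo-1106",
    response_format: { type: "json_object" },
    messages: [
      // JSON mode expects the messages to mention JSON explicitly.
      { role: "system", content: "Reply with a single JSON object." },
      { role: "user", content: userContent },
    ],
  };
}

// Hypothetical helper: parse the reply, refusing output cut off by the token limit.
function readJsonModeReply(response) {
  const choice = response.choices[0];
  if (choice.finish_reason === "length") {
    throw new Error("Reply was truncated; the JSON is probably incomplete");
  }
  return JSON.parse(choice.message.content);
}

// Hand-written stand-in for an API response, not real model output.
const sampleJsonModeResponse = {
  choices: [
    {
      finish_reason: "stop",
      message: { role: "assistant", content: '{"software": "SAP", "country": "PT"}' },
    },
  ],
};
const jsonModeRequest = buildJsonModeRequest("Which software and country is this job for? Senior SAP Developer, Lisbon");
console.log(readJsonModeReply(sampleJsonModeResponse));
```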
