Valid json every time?

benjaminrigby · January 2, 2023, 10:28pm

Hi - I’m wondering if it’s possible to tell GPT to output valid JSON every time. I’ve tried a number of prompts but still occasionally get errors where it inserts an invalid element into the JSON. My prompt is like:

Give this item a name and then elaborate on the description. Put results into a valid json object with keys: "item_name", "item_description"

benjaminrigby · January 3, 2023, 9:18pm

I just ended up writing a snippet to extract the JSON (where “output” is a param that I send in):

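  if (output === 'json') {
    // Take everything from the first '{' to the last '}' and parse it.
    const start = text.indexOf('{');
    const end = text.lastIndexOf('}') + 1;
    if (start === -1 || end <= start) {
      throw new Error('No JSON object found in model output');
    }
    // JSON.parse still throws if the extracted text is not valid JSON.
    myresult = JSON.parse(text.substring(start, end));
  }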

PaulBellow · January 3, 2023, 9:27pm

Nice. Thanks for coming back to share.

davidtorres · January 25, 2023, 6:07am

I’m doing something similar, except I’m adding to the prompt ‘respond in JSON format, surrounded by “```”’, then extracting the text between the triple-ticks and running JSON.parse on it. I wrapped the parse in a try/catch and I retry the completion if parsing fails. I still occasionally get invalid JSON, but since the task is so narrow I figure it might be a good candidate for fine-tuning once I build up enough data.
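The extract-then-retry approach described above could look roughly like the sketch below. It is not davidtorres’s actual code: `extractFenced`, `completeAsJson`, and the `complete` callback (an async function that takes a prompt and returns the model’s text) are hypothetical names introduced here for illustration.

```javascript
// Hypothetical helper: return the text between the first pair of triple
// backticks (an optional "json" tag after the opening fence is skipped).
function extractFenced(text) {
  const match = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  return match ? match[1].trim() : null;
}

// `complete` is an ASSUMED async function (prompt) => string that wraps
// whatever completion call you use; it is not a real library API.
async function completeAsJson(complete, prompt, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const text = await complete(prompt);
    const body = extractFenced(text);
    try {
      if (body === null) throw new Error('no fenced block in response');
      return JSON.parse(body);
    } catch (err) {
      lastError = err; // remember why this attempt failed, then retry
    }
  }
  throw new Error(`no valid JSON after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Capping the number of attempts keeps a persistently malformed response from looping forever; the failed outputs could also be logged as fine-tuning data.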

k0in · January 25, 2023, 9:44pm

Something else you can try, if feasible: use the edit endpoint. Give it the JSON with your structure as input and prompt the edit with something like “fill in this json”. I use this as a universal code parser, for example, where I extract function signatures and parameters.
In my tests it seems really reliable.

Good luck

awaken · January 25, 2023, 10:43pm

As an alternative, I found that asking for CSV gives me the most consistent results, but it definitely needs some basic post-processing to remove occasional unwanted characters. Also, CSV formatting is minimal compared to JSON, so it could save a few tokens on every request, which could add up over time.
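As a rough sketch of that kind of post-processing, assuming the model was asked for a single line of the form `name, description` (the function name `parseCsvPair` and the two-field layout are assumptions for this thread’s example, not awaken’s code):

```javascript
// Assumed format: one line with two fields, e.g.  Name, Description text
// Splits on the FIRST comma only, so the description may contain commas.
function parseCsvPair(text) {
  const line = text
    .split('\n')
    .map((l) => l.trim())
    .find((l) => l.includes(','));
  if (!line) return null; // nothing that looks like a CSV row

  const comma = line.indexOf(',');
  // Strip stray quotes or backticks the model sometimes wraps fields in.
  const clean = (s) => s.trim().replace(/^["'`]+|["'`]+$/g, '');
  return {
    item_name: clean(line.slice(0, comma)),
    item_description: clean(line.slice(comma + 1)),
  };
}
```

This only works if names never contain commas; a full CSV parser would be needed for quoted fields.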

ruby_coder · January 26, 2023, 1:16pm

You might try changing your prompt to something like this, or something similar:

Create key, value pairs from the following data and format the key, value pairs using JSON notation with keys: "item_name", "item_description"

If you provide the exact data you need formatted, I can work on engineering a prompt for you.

Qlaut · January 31, 2023, 1:35pm

Hi david,

Would you be so kind as to show what your exact prompt looks like? I’m also battling to get JSON output and I don’t seem to be able to succeed at this.

Cheers!

christian.ribe · February 8, 2023, 4:45pm

Having the same issue too.
OpenAI chatbot suggested I use this in my prompt.
“Prompt(encoding=True, validation=True, sanitation=True)”

The results are better but I still get invalid json from time to time.

mkumar · March 7, 2023, 8:40pm

Will setting the temperature to 0 give stable responses? From time to time the model is inconsistent in generating output in the desired format, in this case JSON. I only get valid JSON in 1 out of 5 tries. What can improve the reliability and consistency of the output?

stevenic · March 8, 2023, 12:24am

This is the basic technique I’m using as well…

ruby_coder · March 8, 2023, 12:29am

See:

stevenic · March 8, 2023, 3:13am

So I always show the model an example of the JSON I want it to generate and I have yet to see it generate bogus JSON when doing that. I have gotten bogus JSON in the past but it was before I started including an example of the JSON I wanted back in the prompt…

I should also mention that I’m running my prompts with a temperature as high as 0.9 and I can’t recall a single bad JSON output over the last several hundred model calls.
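One way to include such an example is to build the prompt from a sample object, as in the sketch below. The prompt wording and the `buildPrompt` helper are assumptions for illustration, not stevenic’s actual prompt; the keys come from the original question.

```javascript
// Hypothetical prompt builder: embeds a sample of the desired JSON so the
// model sees the exact shape it should return.
function buildPrompt(item) {
  const example = {
    item_name: 'Example name',
    item_description: 'Example description.',
  };
  return [
    'Give this item a name and then elaborate on the description.',
    'Respond only with a JSON object shaped like this example:',
    JSON.stringify(example, null, 2),
    `Item: ${item}`,
  ].join('\n\n');
}
```

Generating the example with JSON.stringify, rather than typing it by hand, guarantees the sample shown to the model is itself valid JSON.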

christian.ribe · March 9, 2023, 8:16pm

I need to try this out, since my outputs have been a bit inconsistent.
Thanks

stevenic · March 9, 2023, 8:23pm

One more addition to this… Should the model not return JSON, I convert its response to JSON and then write that to the conversation history. I’ve found that if the model ever sees a response in the conversation history that violated one of the rules you gave it, it becomes more likely to a) always violate that rule, and b) potentially violate other rules…
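A minimal sketch of that repair step, assuming the history is an array of `{ role, content }` messages and that a non-JSON reply can reasonably be wrapped into the description field (both are assumptions; `recordReply` is a hypothetical name, and the post does not say how the conversion is done):

```javascript
// Store the assistant turn in history, but only ever as a JSON object.
function recordReply(history, reply) {
  let content;
  try {
    const parsed = JSON.parse(reply);
    if (typeof parsed !== 'object' || parsed === null) {
      throw new Error('not a JSON object');
    }
    content = JSON.stringify(parsed); // already valid: store it normalized
  } catch {
    // Not a JSON object: wrap the raw text so the stored turn still follows
    // the format rule. The field choice here is an assumption.
    content = JSON.stringify({ item_name: '', item_description: reply.trim() });
  }
  history.push({ role: 'assistant', content });
  return history;
}
```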

RonaldGRuckus · March 9, 2023, 8:24pm

Although it’s very appealing to use a language model for object creation, I would really try to avoid doing it unless the model is completely controlled by you and each item can be vetted and reviewed. If it’s part of a pipeline, or its output is immediately deposited to a server, it could completely break your system.

The reason I say this is that it can inherently make adjustments, and depending on how long your object is, it can start to make mistakes. This isn’t even considering what happens when the expected data is different. This is all assuming you are the sole user. Depending on how automated your pipeline is, it can completely ruin the process and possibly corrupt your database.

I was doing this for a while, with a very small error rate (1% - 5%). However, I have found better success with token classification and using logic to create my object.
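For anyone who does keep model output in an automated pipeline, a shape check before anything is written to a server or database limits the damage described above. The sketch below assumes the two keys from the original question; `validateItem` is a hypothetical helper, not something from this thread.

```javascript
// Return a list of problems; an empty list means the object is safe to store.
function validateItem(value) {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return ['response is not a JSON object'];
  }
  const errors = [];
  const expected = ['item_name', 'item_description'];
  for (const key of expected) {
    if (typeof value[key] !== 'string' || value[key].trim() === '') {
      errors.push(`missing or empty string field: ${key}`);
    }
  }
  // Reject keys the model invented, rather than silently storing them.
  for (const key of Object.keys(value)) {
    if (!expected.includes(key)) errors.push(`unexpected field: ${key}`);
  }
  return errors;
}
```

Valid JSON is not the same as correct data, so a check like this complements parsing rather than replacing review.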

benjaminrigby · March 27, 2023, 3:53pm

Solved problem in GPT4:
