API vs non-API results are wildly inconsistent when creating JSON objects

Hi,

I am having a tough time getting the API to return consistent JSON when I provide it with some key/value pairs and a JSON template.

These are the models and settings I’ve tried:

    const config = {   // options object passed to the API call
        model: "gpt-4",
        //model: "gpt-4-0125-preview",
        //model: "gpt-3.5-turbo",
        //model: "gpt-3.5-turbo-0125",
        temperature: 0.1,
        max_tokens: 512,
        top_p: 1,
        frequency_penalty: 0.75,
        presence_penalty: 0,
    };

My instructions to GPT are straightforward, and the values do conform to the JSON. However, the JSON the API generates has issues most of the time, whereas the web version of GPT returns accurate, valid JSON every time.

I typically request something like this:
Using this data: ${data}, make it conform to this JSON template: ${templateString}. Do not add any extra notes or comments to the JSON. Wrap all non-null values in standard double quotes.

For example:
A subset of the data may look like this:
fname: john
lname: doe
dob: 01/01/2000

The template would look like this:
{
"firstname": null,
"lastname": null,
"dob": null
}

The results are all over the place…

  • sometimes it mixes straight quotes with curly quotes
  • sometimes it mixes single quotes and double quotes
  • some values it leaves unquoted
  • braces or brackets can sometimes get surrounded by quotes
  • extra commas
  • extra quotes (mixed ones too)
  • ID numbers or ZIP codes starting with 0 are left unquoted
    etc…

So even if the JSON is bad, I take that bad JSON and pass it back into the API and tell it to correct any JSON formatting issues and return valid JSON.
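A cheaper alternative to that repair round-trip is to validate locally first and only re-prompt when parsing actually fails. A minimal sketch, assuming Python on the calling side (the `try_parse` name is mine, not from the thread); it also normalizes the curly quotes mentioned above, since that failure is trivial to fix without another API call:

```python
import json

def try_parse(raw: str):
    """Return the parsed object, or None if the text is not valid JSON."""
    # Curly ("smart") quotes are a common failure mode; normalize them first.
    normalized = (raw.replace("\u201c", '"').replace("\u201d", '"')
                     .replace("\u2018", "'").replace("\u2019", "'"))
    try:
        return json.loads(normalized)
    except json.JSONDecodeError:
        return None  # fall back to a repair prompt only when this happens

print(try_parse('{“firstname”: “john”}'))  # → {'firstname': 'john'}
```

Only when `try_parse` returns `None` would you spend a second API call on repair.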

The web GPT does it perfectly fine.
The API mangles it even more.

I’m confused and stuck here. I’ve tried a few different models and played with the temperature value to make it less ‘creative’. It’s still not consistent enough that I can depend on it to format some data into simple JSON. I’ve even made all the properties root-level to avoid any complex nesting.

Any help or guidance appreciated.

Thanks
A

Hi and welcome to the Forum!

I’d consider a couple of options here:

  1. Use a few-shot approach and provide the model with a couple of examples of the desired JSON in addition to the template you are providing.

  2. Create a fine-tuned model.

In addition to these two options, I’d also suggest some refinements to the prompt itself, e.g. reinforcing the wording that the output must strictly adhere to the defined JSON template, and placing the actual data and the JSON template at the end or the beginning of the prompt instead of mingling them with the instructions.
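Those two suggestions can be combined into one message layout: instructions first, one worked example as a user/assistant pair, and the real data last. A hedged sketch in Python; the `build_messages` helper and the example values are illustrative, not from the thread:

```python
def build_messages(data: str, template: str) -> list[dict]:
    """Few-shot layout: instructions, one worked example, then the real data."""
    example_in = "fname: jane\nlname: smith\ndob: 02/02/1990"
    example_out = '{"firstname": "jane", "lastname": "smith", "dob": "02/02/1990"}'
    return [
        {"role": "system",
         "content": "Convert the data to JSON that strictly matches the template. "
                    "Output only the JSON."},
        # One worked example shown as a prior exchange (the few-shot part).
        {"role": "user", "content": f"Data:\n{example_in}\nTemplate:\n{template}"},
        {"role": "assistant", "content": example_out},
        # The actual request goes last, after the instructions and example.
        {"role": "user", "content": f"Data:\n{data}\nTemplate:\n{template}"},
    ]
```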

Finally, based on my own experience, I can say that different models have different quirks when it comes to responding in JSON format. So you may need to differentiate the issues you are experiencing by model and then tailor your prompt and/or the examples you are providing to the model in question.

Hi,

Thank you. I will try those suggestions.

A

This is a prompting problem. You are mixing data in with instructions, and likely not using the system message in a way that aligns the AI to perform the task in a standardized manner.

Just to give you a start of writing a chat AI to do a task:

system: You are JSONbot, a json data extractor and rewriter, that performs the same task that a simple python script could do on data (if the data format was always the same).

AI task: extract entities from listed data elements, and reproduce them in a valid JSON using these keys: ["firstname", "lastname", "dob"]. If entity data of an item is incomplete or missing, use null JSON data type.

Response: Output will be only the JSON, with no other chit-chat.

user: Extract JSON data from this text:

{my data}
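The system message above points out that a fixed-format extraction is really a job for a simple script. For comparison, here is roughly what that deterministic Python fallback could look like (the `KEY_MAP` names come from the example data earlier in the thread; the `extract` helper is mine):

```python
import json

# Maps source field names to the template's key names.
KEY_MAP = {"fname": "firstname", "lname": "lastname", "dob": "dob"}

def extract(text: str) -> str:
    """Parse 'key: value' lines and emit JSON, with null for missing fields."""
    found = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            found[key.strip()] = value.strip()
    out = {target: found.get(src) for src, target in KEY_MAP.items()}
    return json.dumps(out)

print(extract("fname: john\nlname: doe\ndob: 01/01/2000"))
# {"firstname": "john", "lastname": "doe", "dob": "01/01/2000"}
```

If the input format were truly always the same, this script would replace the API call entirely; the model only earns its keep on messier inputs.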

I also just tried to briefly experiment myself. The following prompt worked for me. You’d still have to incorporate some of the other elements.

{
    "model": "gpt-4",
    "messages": [
        {
            "role": "user",
            "content": "You are provided with a data set. Your task is to convert the data set into a defined JSON format. Your output must consist strictly of only the JSON in the specified JSON format. JSON format: [{firstname: value for fname},{lastname: value for lname},{dob: date of birth in date format}; Data: Placeholder for data"
        }
    ]
}

Result

 [{"firstname": "john"}, {"lastname": "doe"}, {"dob": "2000-01-01"}]

When I run the exact same prompt using gpt-4-0125-preview, I instead get the following output:

```json
[
    {"firstname": "john"},
    {"lastname": "doe"},
    {"dob": "01/01/2000"}
]
```

As you can see, it is fairly similar but has a slightly different response format. You need to adjust your prompt to accommodate these differences. This is where examples may come in handy or, if absolutely needed, fine-tuning. Personally, I have been experiencing fewer errors when using GPT-4 for JSON output, but that doesn’t mean you can’t get it to work with the other models.
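Since some models wrap the JSON in a ```json fence as shown above, a small local post-processing step can normalize both response shapes before parsing. A sketch, assuming Python; the `strip_fences` name is mine:

```python
def strip_fences(raw: str) -> str:
    """Remove a surrounding ```json ... ``` fence, if present."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (e.g. "```json").
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence, tolerating a missing one.
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    return text.strip()
```

With this in front of `json.loads`, the bare and fenced outputs from the two models parse identically.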


Interesting, because I thought it would be as straightforward as the bot is, and therefore I treated it as such with my instructions.

Didn’t think (or know) to include the role and content instructions. Will have to try some variations of that.

I didn’t include some instructions, but I commingled the data first, then the instructions, followed by the template. I guess that sequence may not be ideal.

thank you!

How did you go? Any luck with this?

Also, I thought I would add that, generally speaking, the model assumes that the first part of a prompt is the instructions or an overview of the necessary context, though this can also be at the end of the prompt. Pro Tip: you can ask the model how it would respond to or interpret things, which can point you in the right direction (most of the time).


You have to tell the assistant exactly what it needs to do, for example:

role: 'system',
content: 'You assist with converting unstructured data to structured key/value JSON format from provided text. Strictly return the response as JSON without any extra text at the beginning or end.'

In the next object, with role user, you can give your prompt:

role: 'user',
content: `list all the data of users. Your whole response should be in structured key/value format with the keys 'fname', 'lname', 'email'. Here is the text: \n ${text}`

Here you will not face a ```json prefix at the beginning, and you will get only parsable JSON data.
If you are still not getting the correct format, then try changing your role: user content.


try this:

## Provide result in JSON:
{{
"first_name": <first name extracted or fname>,
"last_name": <last name extracted or lname>,
"dob": <date of birth or dob value>
}}
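If the doubled braces above are Python `str.format` escaping (an assumption; the post doesn’t say), the template renders to single braces like this:

```python
# Doubled braces escape literal { and } when the string goes through .format().
template = """## Provide result in JSON:
{{
"first_name": <first name extracted or fname>,
"last_name": <last name extracted or lname>,
"dob": <date of birth or dob value>
}}"""
print(template.format())  # {{ and }} render as single { and }
```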


For more recent models (e.g. the gpt-4-turbo family), you can also pass the response_format parameter in your API call:
response_format={"type": "json_object"}
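For illustration, a request using JSON mode might look like the sketch below (the actual client call is commented out so the payload can be inspected without network access). Per the API docs, JSON mode requires the word "JSON" to appear somewhere in the messages, or the call returns an error; the model name and message text here are illustrative:

```python
payload = {
    "model": "gpt-4-turbo",
    "response_format": {"type": "json_object"},  # forces syntactically valid JSON
    "messages": [
        {"role": "system",
         # JSON mode requires the word "JSON" to appear in the prompt.
         "content": "Return the data as a JSON object with keys firstname, lastname, dob."},
        {"role": "user", "content": "fname: john\nlname: doe\ndob: 01/01/2000"},
    ],
}
# from openai import OpenAI
# response = OpenAI().chat.completions.create(**payload)
```

Note that JSON mode guarantees syntactically valid JSON, not adherence to any particular schema; the keys still come from your prompt.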

Additionally, I’ve found it helpful to use Pydantic. You can use the following parameter to show an example of what your response format might look like:

model_config = ConfigDict(
    json_schema_extra={"my_example_format": "foo"}
)

When sending system/user prompts, you can then include your Pydantic model’s JSON schema as an example to the model using:

your_model.model_json_schema()

This will include your example in the output.
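A stdlib-only sketch of the same idea: hand-writing the kind of JSON Schema that `model_json_schema()` would roughly produce for a three-field model, and embedding it in the system prompt as the format specification (field names taken from this thread’s example; the schema and prompt wording are mine):

```python
import json

# Hand-written schema approximating what a Pydantic model with three
# optional string fields would generate via model_json_schema().
schema = {
    "type": "object",
    "properties": {
        "firstname": {"type": ["string", "null"]},
        "lastname": {"type": ["string", "null"]},
        "dob": {"type": ["string", "null"]},
    },
    "required": ["firstname", "lastname", "dob"],
}

system_prompt = (
    "Extract the fields and answer with JSON matching this schema:\n"
    + json.dumps(schema, indent=2)
)
```

On the response side, the same Pydantic model can then validate the parsed output, so malformed or mis-keyed JSON fails loudly instead of propagating.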


Thanks everyone. I’ll give these a go when I get the chance. Since a lot of the work is POC, I shuffle between this and other tasks so I haven’t gotten around to playing with it much.
