Generate JSON array with chat completions

raphael-d · March 19, 2024, 11:14pm

I’m using "response_format": { type: "json_object" } in chat/completions API.
I explicitly prompt to generate an array of objects and provide an array as example. The output is always a single JSON object.

Could you confirm that json_object forces the output to be a single object?
Are ‘json_document’ or ‘json_array’ options you are planning to support?
Any immediate solution to generate a valid JSON array?

Note that in recommendation use cases, we often need arrays with several recommended outputs.
I can generate JSON array using “text” format, but then I have no guarantee that the output is a valid JSON document.
Thanks.

sps · March 20, 2024, 4:01am

Welcome to the dev forum @raphael-d

Intead of prompting for an array of objects, ask for jsonlist with a ‘name’.

e,g, prompt:

"Respond with a description of phrases provided by the user, with valid jsonlist named 'phrases' where every json object has 'description', 'region' (where the phrase is used often) and 'language' attributes."

Response I received:

{
    "phrases": [
        {
            "description": "A casual greeting often used in Australia or the UK, similar to saying 'Hello, friend!'",
            "region": "Australia, United Kingdom",
            "language": "English"
        },
        {
            "description": "A measurement of fuel economy that indicates how many miles a vehicle can travel on one gallon of fuel. Commonly used in the United States and United Kingdom.",
            "region": "United States, United Kingdom",
            "language": "English"
        },
        {
            "description": "A common greeting in Northern Germany, especially in the morning, equivalent to saying 'Good morning!'",
            "region": "Northern Germany",
            "language": "German"
        }
    ]
}

kjordan · March 20, 2024, 9:41am

@raphael-d I’ve also made a lot effort generating json arrays and the conclusion is: DON’T.

The json array can be easily messed up in different cases.

My recommendation is generating a json map instead of a json array. Example:

If you expect ['a', 'b', 'c'], just let the model generate { "0": "a", "1": "b", "2": "c"}.

This is much more stable and reliable based on my experience. I have used this trick in several production apps without any issue.

raphael-d · March 20, 2024, 2:40pm

Thanks for the “trick” @kjordan. I’ll try this approach.
Generating a list and having the result as an array instead of a map sounds like a desired feature.

sps · March 20, 2024, 5:37pm

Yes. This is using the "response_format": { "type": "json_object" }

The prompt is equally important and is mentioned in docs:

When using JSON mode, always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don’t include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don’t forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.

Topic		Replies	Views
`json_mode` returns no JSON arrays API json-mode	14	8204	January 9, 2025
Ensure JSON response format API	23	45391	February 19, 2024
Guarantee parsable JSON from returned choices API	1	1029	March 3, 2023
Fine tuning models to generate JSON response Prompting codex , chatgpt , fine-tuning , api	6	6096	November 9, 2023
How to get 100% valid JSON answers? Prompting gpt-4 , gpt-35-turbo , chatgpt , api	16	8718	June 11, 2024

Generate JSON array with chat completions

Related topics