Generate JSON array with chat completions

I’m using "response_format": { type: "json_object" } in chat/completions API.
I explicitly prompt to generate an array of objects and provide an array as example. The output is always a single JSON object.

Could you confirm that json_object forces the output to be a single object?
Are ‘json_document’ or ‘json_array’ options you are planning to support?
Any immediate solution to generate a valid JSON array?

Note that in recommendation use cases, we often need arrays with several recommended outputs.
I can generate JSON array using “text” format, but then I have no guarantee that the output is a valid JSON document.
Thanks.

2 Likes

Welcome to the dev forum @raphael-d

Intead of prompting for an array of objects, ask for jsonlist with a ‘name’.

e,g, prompt:

"Respond with a description of phrases provided by the user, with valid jsonlist named 'phrases' where every json object has 'description', 'region' (where the phrase is used often) and 'language' attributes."

Response I received:

{
    "phrases": [
        {
            "description": "A casual greeting often used in Australia or the UK, similar to saying 'Hello, friend!'",
            "region": "Australia, United Kingdom",
            "language": "English"
        },
        {
            "description": "A measurement of fuel economy that indicates how many miles a vehicle can travel on one gallon of fuel. Commonly used in the United States and United Kingdom.",
            "region": "United States, United Kingdom",
            "language": "English"
        },
        {
            "description": "A common greeting in Northern Germany, especially in the morning, equivalent to saying 'Good morning!'",
            "region": "Northern Germany",
            "language": "German"
        }
    ]
}

@raphael-d I’ve also made a lot effort generating json arrays and the conclusion is: DON’T.

The json array can be easily messed up in different cases.

My recommendation is generating a json map instead of a json array. Example:

If you expect ['a', 'b', 'c'], just let the model generate { "0": "a", "1": "b", "2": "c"}.

This is much more stable and reliable based on my experience. I have used this trick in several production apps without any issue.

1 Like

Thanks for the “trick” @kjordan. I’ll try this approach.
Generating a list and having the result as an array instead of a map sounds like a desired feature.

Yes. This is using the "response_format": { "type": "json_object" }

The prompt is equally important and is mentioned in docs:

  • When using JSON mode, always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don’t include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don’t forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
1 Like