Setting response_format to { type: "json_object" } seems to return only a singular JSON object like {"key1": "value1", "key2": "value2", ...} rather than an array like [{"key1": "value1", "key2": "value2"}, {"key1": "value3", "key2": "value4"}, ...], unless the array is encapsulated as a value within a key-value pair. This persists even when the prompt explicitly requests the output as a JSON array. It can be worked around by parsing the JSON object and extracting the array, but it seemed worth noting for clarification. It would be great if a future version of the API supported direct JSON array responses to save the extra parsing.
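In the meantime, the extraction workaround is only a couple of lines (a sketch; "items" stands in for whatever key your prompt asks the model to wrap the array under, and content is the model's response text):

const parsed = JSON.parse(content); // content: the json_object response string
const items = Array.isArray(parsed.items) ? parsed.items : []; // unwrap the array, fall back to []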
I’m also seeing this issue. I need it to return [{"key1": "value1"}, {"key1": "value1"}, ...], but instead it just returns a single item, not contained within an array, e.g.
{"key1": "value1"}
It also seems to max out the token limit.
Turning off json_object makes it work fine, but then we’re back to the model returning text before and after the object.
The issue happens on both the gpt-3.5 and gpt-4 models.
Same here, I cannot force the model to return an array as the response. It always returns an object with at least one key.
I’m running into a similar problem.
I’ve analysed tens of thousands of items using GPT-4, mapping text to JSON.
I like to analyse multiple items at once to save on prompt input tokens (analysing a list at a time).
However, when I tried to do the same with the new json_mode, it doesn’t like to return an array of JSON objects.
Instead it returns a million whitespace characters, or just the first item in the list.
It works for performing a single analysis, so I guess it’s constrained to return a dictionary format, not a list.
Function calling has never returned arrays at the outer layer, so it’s the same with the new json_mode.
You can wrap it like this:
{"array": [...]}
But I prefer:
{"1": {}, "2": {}, ...}
This tends to be more reliable (prompts outputting arrays can randomly miss elements and have index errors), but it’s a bit more trouble to set it up with n_elements instead of a fixed amount.
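For what it’s worth, turning the numbered-keys object back into an ordered array is straightforward (a sketch, assuming sequential numeric string keys as above):

const obj = JSON.parse(content);
const items = Object.keys(obj)
  .sort((a, b) => Number(a) - Number(b)) // numeric sort, so "10" comes after "2"
  .map((key) => obj[key]);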
From my experience, even if I force the model to return a JSON array via prompts, it’s very unstable and the array can easily get messed up.
Like @msp26 mentioned, an object with string indices is far more stable and usable in production.
I would definitely love to see OpenAI resolve this.
I have an app that needs to return an array of objects, but as pointed out, response_format: {type: "json_object"} only returns the first object of the array. It’s making my app a little tricky to deal with, since I sometimes have to handle random text before the array.
Hi and welcome to the Developer Forum!
You just need to strip the Markdown code fences from the response:
response.content = response.content.replace(/```json\n?|```/g, '');
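After stripping, the content should parse cleanly:

const data = JSON.parse(response.content); // throws a SyntaxError if the JSON is still malformed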
I found a solution!
If you want to return a JSON array, you have to make sure the top-level item of the JSON response is an object.
For example here is my system prompt:
Provide JSON format as follows, along with the definition of each field:
{
"offers": [
{
"description": "...", # This is the product name, e.g. Coronita.
"age": "...", # Age of the spirit or wine, e.g. 7
"edition": "...",
"vintage": "...",
"release_year": "..."
}
]
}
By putting the array under the offers key at the top level of the object, my responses are coming back as an array every time!
Here is my chat request with the response_format set:
const completion = await openai.chat.completions.create({
  messages: message.messages,
  model: OPENAI_MODEL,
  response_format: { type: "json_object" },
});
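From there the array is one parse away, using the standard chat completions response shape (offers being the top-level key from the prompt above):

const { offers } = JSON.parse(completion.choices[0].message.content);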
Look, I have something that works
{
"response_format": {"type": "json_object"},
"messages": [
{
"role": "system",
"content": "Arrange the rooms from most pleasant to least pleasant to work in. Always provide your result in JSON format."
},
{
"role": "user",
"content": "I'm looking for the best place to work, I have a choice between 4 rooms:\n - Room Studio: {\"humidity\":38.54,\"temperature\":26.43,\"pressure\":998.67}, Capacity: 6- Room Cockpit: {\"humidity\":40.86,\"temperature\":26.37,\"pressure\":1000.09}, Capacity: 16- Room Loft: {\"humidity\":34.45,\"temperature\":28.23,\"pressure\":1001.61}, Capacity: 8- Room Hall: {\"humidity\":44.3,\"temperature\":24.92,\"pressure\":999.69}, Capacity: 40. Calculate the wellness value and provide a justification."
}
],
"functions": [
{
"name": "calculate_wellness_value",
"description": "Calculate the wellness value and provide a justification",
"parameters": {
"type": "object",
"properties": {
"rooms": {
"type": "array",
"items": {
"type": "object",
"properties": {
"Name": {
"type": "string",
"description": "The name of the room"
},
"temperature": {
"type": "number",
"description": "The room's temperature in degrees Celsius"
},
"wellnessvalue": {
"type": "string",
"description": "The well-being score calculated for this room from 0 to 100"
},
"justification": {
"type": "string",
"description": "The justification of the well-being score calculation in one sentence"
}
},
"required": ["name", "temperature", "wellnessvalue", "justification"]
}
}
},
"required": ["rooms"]
}
}
],
"function_call": "auto",
"temperature": 0.7
}
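For completeness, when the model goes down the functions route the arguments come back as a JSON-encoded string on the message, so the rooms array is read roughly like this (assuming the model actually called calculate_wellness_value; function_call is absent otherwise):

const message = response.choices[0].message;
if (message.function_call) {
  const args = JSON.parse(message.function_call.arguments); // arguments is a JSON string, not an object
  const rooms = args.rooms; // the array declared in the schema above
}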
Amazing
I’m surprised this hasn’t been accepted as the solution.
U genius
This is the actual solution, and it makes total sense why it works. FYI, GPT-4 outputted arrays just fine (fabricating the name of the outer element, but otherwise working fine), and this made it work for 3.5 as well.
#geniuuus
I also added a JSON schema to my prompt and it has been 100% reliable ever since.
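In case it’s useful, here is roughly what that looks like (the schema and the items key are illustrative, not from any library):

const systemPrompt = "Reply with JSON matching this schema:\n" + JSON.stringify({
  type: "object",
  properties: { items: { type: "array", items: { type: "object" } } },
  required: ["items"],
});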