We are using structured outputs in production with GPT-4o-2024-08-06 and seem to be encountering an issue where often one of the objects within the structured output will just return an infinite number of \n newline characters until the max token limit is hit.
Found that this seemed to be an issue in the JSON_Mode, wondering if anyone else was having the same issue and had found a solution?
The difference is with the latter it uses constrained decoding of tokens so it adheres to your supplied schema, and the probability of getting these weird repeated tokens should be very lowâŚ
Thatâs a problem thatâs been seen since before day 1 with the introduction of a JSON-mode model, activated by response_format: json_object.
If you donât tell exactly what and how to produce anyway, with as much effort as it would take ANY AI to understand what your API takes as JSON input, it will go nuts.
Thatâs why OpenAI makes you include the word âJSONâ, but you only need to say ânever produce JSONâ to get a repeating loop of tabs or newlines.
The actual âstructured outputâ of higher level enforcement requires a JSON schema be sent with the different setting for that: json_schema
The schema is placed in AI context, in a format similar to a function specification, where the AI should be able to follow it just like it would follow your own instruction.
@_j interesting stuff. Kind of a black box if you ask me.
We pass in our JSON Schema with Descriptions and when we added the instruction âplease return the output in the JSON Schema providedâ it caused the errors to mostly go away. However it still happens from time to time.
Maybe I just need to more explicit in my instruction telling it to return JSON in the requested schema?
This is the format the newest gpt-4o has injected at the end of the first system message to inform of its expected response. You will note that it is not exactly the schema that is sent as API parameter.
You can try out the understanding and compliance on models not specifically trained for it.
The schema itself is the demonstraton format for my preset for an AI that makesâŚschemas.
# Response Formats
## response_reproducing_context
{"type":"object","properties":{"schemas":{"type":"array","items":{"type":"object","properties":{"schema_text":{"type":"string","description":"Any schemas requested, each in the form desired, such as the original text the AI received."},"schema_format":{"enum":["json","python","pydantic"],"type":"string","description":"The format of the schema, indicating how the schema is represented."},"destination_type":{"enum":["tool_function","response_format"],"type":"string","description":"Specifies whether the schema is intended for a tool function or a response format."}}}},"plain_text_response":{"type":"string","description":"A response to the user from the AI, providing a typical verbose response fulfilling the input."}}}
To really reinfore the JSON output by an algorithm actually affecting the tokens that can be produced, you would use json_schema, use "strict": true in the top level of the object, and set all the keys into the required field, placing ârequiredâ at each nesting level object. Only on âgpt-2024-08-06+â or mini. That then finally turns on structured outputs.
Increasing the repetition penalty can break up repetitive strings after a while of repeating and already being useless.
Iâve had a lot of success with getting structured output and response_format to work properly⌠but that doesnât mean that the content (within your structure) itself is going to be properly structured.
In fact, Iâve had issues with \n myself, come to think of itâthough nothing on the order of creating an infinite series.
Without jumping to fine-tuning, you can include âdescriptionâ fields and include more information about that field should, or should not, include. You should also include examples in your instructions of what the final output should look like.
{
"name": "structure_monster_info_response",
"description": "Structure the monster's info and output it as a JSON.",
"strict": true,
"schema": {
"type": "object",
"properties": {
"monster_unique_id": {
"type": "string",
"description": "Unique ID for the monster in our database, e.g., 'dndgpt_mon_0001'. This is a unique field."
},
"monster_parent": {
"type": "string",
"description": "The general category or type of monster, e.g., 'Dragon'. This is a unique field in our database."
},
"monster_name": {
"type": "string",
"description": "The name of the monster."
},