I’m testing JSON mode in our app, and it reliably stops generating after 1,050 generated tokens. Also, before stopping, the model generates a long run of whitespace at the end.
As I read on the forum, the output is capped at 4,095 tokens. Why is it stopping after 1,050 tokens in JSON mode?
Also, are there any timelines for when the new turbo models will be production-ready? We have a use case where the larger input context is extremely important!
Yes, the prompt includes a description of the JSON “schema” (not a valid JSON Schema, though) that I want the model to generate. All completions contain the JSON in the format I want. However, the model just stops generating a proper response after about ~1k tokens (sometimes mid-sentence, as you can see in the screenshot), adds those whitespace characters, and stops completely after 1,050 tokens.
The prompt is already being tested by hundreds of users on a daily basis, so this error happens regularly and always ends after 1,050 tokens.
Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly “stuck” request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
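For reference, a minimal sketch of what that documentation passage describes, using the Python openai client (the model name and max_tokens value are illustrative, not prescribed by the docs):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",            # illustrative; any JSON-mode-capable model
    response_format={"type": "json_object"},
    max_tokens=1024,                       # leave enough room for the full object
    messages=[
        # JSON mode requires that some message explicitly asks for JSON output
        {"role": "system", "content": "You are a helpful assistant. Reply only in JSON."},
        {"role": "user", "content": "Summarize the weather in Paris as JSON."},
    ],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # Output was cut off by max_tokens or the context limit; the JSON may be incomplete
    print("Warning: truncated output")
print(choice.message.content)
```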
OpenAI should have released a classifier model to prevent people from using JSON mode before reading the instructions.
Even if you are asking for JSON, you need to make the model want to write JSON.
Here you are. It may make sense to try the model without JSON mode first in development. I believe it’s an external validator. If the model doesn’t want to write JSON, it results in an infinite loop.
It’s definitely not an external validator, as the answers to the exact same questions with and without JSON mode are extremely different in quality (instruction following) and size.
You’ll understand what I mean if you check out the conversation I linked above.
Overall, it seems like JSON mode takes up so much of the model’s attention that very little is left for the rest of the instructions, which it starts to follow pretty poorly.
You always reply in the json format with the fields:
- ai_message: your message to the Human
- status: assessment status
- status_comment (very detailed comment to the status, not to be shown to the Human)
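For what it’s worth, here is a rough sketch of how a reply with those fields might be parsed and sanity-checked client-side (the field names come from the prompt above; the function itself is just an assumption about how you’d consume the output):

```python
import json

def parse_assessment(content: str) -> dict:
    """Parse the model's JSON reply and check for the three expected fields.

    `content` would be choice.message.content from the chat completions response.
    """
    data = json.loads(content)  # raises json.JSONDecodeError if the output was truncated mid-object
    expected = {"ai_message", "status", "status_comment"}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"Model reply is missing fields: {missing}")
    return data
```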
It takes a lot of effort to make GPT-4 spam whitespace until it hits max_tokens.
Here is a direct link to response_format in the API documentation with that text.
For the GPT-4 vision model, OpenAI has set an unreasonably low max_tokens default, almost as if to force you to specify it. They may have done the same with JSON mode. You can include a max_tokens value of your own to avoid truncation. max_tokens is both a context-length reservation for forming the output and the hard limit on what you will get.
(Why the forced short default? To make you specify it, of course, and to cut off their processing when the AI goes loopy. The rate limiter can’t measure and count unknown outputs against you …)
The rest of the blurb you read is that JSON is not actually “guaranteed”; otherwise you wouldn’t get strings of nonsense. They put a check for the word “json” into the endpoint just to avoid the worst misapplication, but for ideal results you should always specify what kind of JSON you wish to receive, with lists of keys, examples, or a schema.
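Putting those two points together, an explicit max_tokens plus a concrete description of the keys you expect, here is a hedged sketch (the model name, token budget, and key names are all placeholders, not anything the API mandates):

```python
from openai import OpenAI

client = OpenAI()

# Spell out the exact keys you want instead of just saying "reply in JSON"
SYSTEM = (
    "Reply only with a JSON object using exactly these keys:\n"
    '  "title": short string,\n'
    '  "tags": array of strings,\n'
    '  "summary": one-paragraph string.'
)

response = client.chat.completions.create(
    model="gpt-4-1106-preview",            # placeholder model name
    response_format={"type": "json_object"},
    max_tokens=800,                        # explicit cap: reserves output room and bounds any runaway generation
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Describe this forum thread about JSON mode."},
    ],
)
print(response.choices[0].message.content)
```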
I got to play with the leaked version of the AI model outside of the chat endpoint.
Consider it like this: the AI can write functions. It has been trained on function-calling and has a preference for it even without any language talking about functions. But it is not going to emit a properly formed “weather function” call without that function specification being done right.
JSON mode just gives you higher confidence of good output.
100%. I have an “Idea Generator” that has been writing an array of shallow objects for months and hasn’t failed once (90% of it is done by me, though, LOL). No JSON mode needed.
Agreed!
Are you saying that JSON mode uses a different model? Not disagreeing; genuinely interested in this thought.