JSON Mode with GPT-4 Turbo stops after 1050 tokens

Yes, that is the logical conclusion from the little trickles of evidence: it behaves like a different model, at least as far as its fine-tuning on generating output goes.

The same is seen when you use gpt-3.5-turbo with or without a function included in the API call: the AI has no understanding of a function definition when it is injected into the base model, but an emulated definition works fine when you also include a different function to trigger the right model.


That is a very cool find! Have you written these findings up somewhere, or could you show them here?

Here’s an idea of what’s going on internally:

The language the AI emits can’t be emulated as AI input, as you can’t wedge yourself into that part of the ChatML token container. The “exploit” would just be calling a function, which the AI already does. The AI essentially emits a to=function.function_name header, i.e. a message addressed to a “recipient”, which recognizes and accepts the AI output (typically JSON, unless you make your own uncharted rules) and gives you the function_call language you receive via the API.
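For reference, what the API surfaces from that internally addressed message is a function_call object on the assistant message. The field names below are the real Chat Completions response shape; the get_weather function and its argument are a made-up example:

```json
{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "get_weather",
    "arguments": "{\"location\": \"Berlin\"}"
  }
}
```

Note that arguments arrives as a JSON-encoded string, not a parsed object, so your code still has to parse (and validate) it.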

I assume it’s something like using a JSON grammar during token sampling, discarding invalid tokens from the vocabulary and only sampling from the valid ones.
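A toy sketch of that hypothesis (this is speculation about what OpenAI does internally, not their actual implementation): before sampling, mask out every vocabulary token that could not extend the current output into valid JSON. The `could_extend` check below is a crude stand-in for a real grammar automaton, good enough for the tiny vocabulary shown:

```python
import json

def could_extend(prefix: str, token: str) -> bool:
    """Crude check: can prefix+token still grow into a valid JSON document?

    A real constrained decoder would track a pushdown-automaton state over
    the JSON grammar; here we just try a handful of plausible "closers" and
    see if any completes the string. Toy-grade only.
    """
    s = prefix + token
    for closer in ["", "}", '"}', '""}', "1}", "]"]:
        try:
            json.loads(s + closer)
            return True
        except json.JSONDecodeError:
            continue
    return False

# Toy vocabulary standing in for model logits.
vocab = ["}", " ", "\n", ',"b":', "hello", ")", "2"]

# Mid-object: several structural tokens survive the mask.
print([t for t in vocab if could_extend('{"a": 1', t)])
# After the object is already closed, JSON only permits trailing
# whitespace -- so the mask leaves nothing but whitespace tokens,
# which is exactly the "whitespace spam" failure mode.
print([t for t in vocab if could_extend('{"a": 1}', t)])
```

The second print illustrates why a constrained sampler that isn’t allowed to emit a stop token can get stuck looping on whitespace.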


I’m under the same impression. It’s why I believe it can easily lead to “whitespace spam” if there are no suitable tokens, and why instructions are also needed to push the model towards writing in a JSON format.


Yes, totally, but the reason I started this thread seems to be a different one: completion often stops mid-sentence inside a JSON string value, and always after 1050 tokens. Mid-sentence, nearly every token is valid JSON.

I just had an idea. I’m using topP = temperature = 0. From my understanding, this reduces the space of possible next tokens to the single most likely one. That may not play well with JSON mode, as I’m leaving the model with just a single option to choose from, which increases the likelihood of whitespace loops.
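That intuition can be sketched with a toy nucleus sampler (how these parameters are commonly implemented in open-source decoders; OpenAI’s internals may differ). With temperature = 0 or top_p = 0, sampling degenerates to plain argmax, so if the single most likely token is whitespace, whitespace is all you get:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float, top_p: float, rng=random) -> str:
    """Toy temperature + nucleus (top-p) sampler over a token->logit dict."""
    if temperature == 0 or top_p == 0:
        # Degenerate case: greedy decoding, exactly one candidate remains.
        return max(logits, key=logits.get)
    # Softmax with temperature scaling.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exps.values())
    ranked = sorted(((p / z, t) for t, p in exps.items()), reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, t in ranked:
        kept.append((p, t))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for p, _ in kept)
    return rng.choices([t for _, t in kept], weights=[p / total for p, _ in kept])[0]

# If whitespace happens to be the top logit, greedy decoding locks onto it.
logits = {" ": 2.0, "}": 1.5, '"done"': 1.0}
print(repr(sample(logits, temperature=0, top_p=0)))  # always ' '
```

With a nonzero temperature and top_p, the runner-up tokens stay in play, which is one plausible reason relaxing those settings sometimes escapes the loop.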

Anyway, it would be great if OpenAI could be a little more open about how their tech works :wink: Then we could figure this out much more easily.


I’m experiencing the same bug. I’ve narrowed it down to the fact that at the 1024th generated token, GPT-4 begins to generate whitespace characters or malformed JSON.

I’ve tried spamming more “Respond in JSON” reminders in the system prompt, as well as modifying the topP and temperature values, but to no real avail (or maybe I haven’t tried hard enough). Any recommendations if you were able to fix it since?
