JSON Mode with GPT-4 Turbo stops after 1050 tokens

Yes, that is the logical conclusion from the little trickles of evidence: it behaves like a different model, at least as far as its fine-tuning on generating output goes.

The same is seen when you use gpt-3.5-turbo with or without a function included in the API call: the AI has no understanding of a function definition when it is injected into the base model, but an emulated definition works fine when you also include a different function to trigger the right model.


That is a very cool find! Have you written these findings up somewhere, or could you show them here?

Here’s an idea of what’s going on internally:

The language the AI emits can’t be emulated as AI input, as you can’t wedge yourself into that part of the ChatML token container. The “exploit” would just be calling a function, which the AI already does. The AI essentially emits a to=function.function_name header, i.e. a message addressed to a “recipient”, which recognizes and accepts the AI output (typically JSON, unless you make your own uncharted rules) and gives you the function_call language you receive via the API.
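For reference, what the API surfaces from that internally addressed message is a function_call object on the assistant message. The field names below are the real Chat Completions response shape; the get_weather function and its argument are a made-up example:

```json
{
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "get_weather",
    "arguments": "{\"location\": \"Berlin\"}"
  }
}
```

Note that arguments arrives as a JSON-encoded string, not a parsed object, so your code still has to parse (and validate) it.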

I assume it’s something like using a JSON grammar during token sampling, discarding invalid tokens from the vocabulary and only sampling from the valid ones.
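A toy sketch of that hypothesis (this is speculation about what OpenAI does internally, not their actual implementation): before sampling, mask out every vocabulary token that could not extend the current output into valid JSON. The `could_extend` check below is a crude stand-in for a real grammar automaton, good enough for the tiny vocabulary shown:

```python
import json

def could_extend(prefix: str, token: str) -> bool:
    """Crude check: can prefix+token still grow into a valid JSON document?

    A real constrained decoder would track a pushdown-automaton state over
    the JSON grammar; here we just try a handful of plausible "closers" and
    see if any completes the string. Toy-grade only.
    """
    s = prefix + token
    for closer in ["", "}", '"}', '""}', "1}", "]"]:
        try:
            json.loads(s + closer)
            return True
        except json.JSONDecodeError:
            continue
    return False

# Toy vocabulary standing in for model logits.
vocab = ["}", " ", "\n", ',"b":', "hello", ")", "2"]

# Mid-object: several structural tokens survive the mask.
print([t for t in vocab if could_extend('{"a": 1', t)])
# After the object is already closed, JSON only permits trailing
# whitespace -- so the mask leaves nothing but whitespace tokens,
# which is exactly the "whitespace spam" failure mode.
print([t for t in vocab if could_extend('{"a": 1}', t)])
```

The second print illustrates why a constrained sampler that isn’t allowed to emit a stop token can get stuck looping on whitespace.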


I’m under the same impression. It’s why I believe it can easily lead to “whitespace spam” if there are no suitable tokens, and why instructions are also needed to push the model towards writing in a JSON format.


Yes, totally, but the reason I started this thread seems to be a different one: completion often stops mid-sentence inside a JSON string value, and always after 1050 tokens. Mid-sentence, nearly every token is valid JSON.

I just had an idea. I’m using topP = temperature = 0. From my understanding, this reduces the space of possible next tokens to the single most likely one. That may not play well with JSON mode, as I’m leaving the model with just a single option to choose from, which increases the likelihood of whitespace loops.
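That intuition can be sketched with a toy nucleus sampler (how these parameters are commonly implemented in open-source decoders; OpenAI’s internals may differ). With temperature = 0 or top_p = 0, sampling degenerates to plain argmax, so if the single most likely token is whitespace, whitespace is all you get:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float, top_p: float, rng=random) -> str:
    """Toy temperature + nucleus (top-p) sampler over a token->logit dict."""
    if temperature == 0 or top_p == 0:
        # Degenerate case: greedy decoding, exactly one candidate remains.
        return max(logits, key=logits.get)
    # Softmax with temperature scaling.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    exps = {t: math.exp(v - m) for t, v in scaled.items()}
    z = sum(exps.values())
    ranked = sorted(((p / z, t) for t, p in exps.items()), reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, t in ranked:
        kept.append((p, t))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for p, _ in kept)
    return rng.choices([t for _, t in kept], weights=[p / total for p, _ in kept])[0]

# If whitespace happens to be the top logit, greedy decoding locks onto it.
logits = {" ": 2.0, "}": 1.5, '"done"': 1.0}
print(repr(sample(logits, temperature=0, top_p=0)))  # always ' '
```

With a nonzero temperature and top_p, the runner-up tokens stay in play, which is one plausible reason relaxing those settings sometimes escapes the loop.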

Anyway, it would be great if OpenAI could be a little more open about how their tech works :wink: Then we could figure this out much more easily.


I’m experiencing the same bug. I’ve narrowed it down to the fact that at the 1024th generated token, GPT-4 begins to generate whitespace characters or malformed JSON.

I’ve tried spamming more “Respond in JSON” reminders in the system prompt, as well as modifying the topP and temperature values, but to no real avail (or maybe I haven’t tried hard enough). Any recommendations if you were able to fix it since?
