I’m using GPT-4o with the json response_format and occasionally running into large responses that exceed the 4k max. I’m using pagination already with a “more” flag, but the JSON response ends abruptly, the finish_reason is set to “length”, but I’m unable to have it pick up where it left off.
I currently detect the finish_reason, append the assistant’s response + a new user message of something like “JSON cut off due to token length. Continue exactly where you left off”, but it’s never able to. It either starts a new JSON object, or picks up where it left off but fails to close the original object properly.
I’ve noticed the ChatGPT UI seems to handle this with a “Continue generating” button, and it works perfectly.
It’s quite probable that OpenAI has access to things we developers don’t, such as pre-empting a response. (anthropic allows that on the api, for example)
Personally, I’m parsing JSON responses in real time, so it doesn’t really bother me if I get an incomplete one - I could just back off and ask for the next object. But generally I try to use LLMs to reduce and compact information, not to generate.
Can I ask what you’re trying to achieve? If you’re using the LLM to copy and paste a lot of content, you could instead consider instructing the model to only give you the first and last couple of words in a chunk (or a chunk descriptor (e.g. line numbers, titles, etc)), and then augment the generation programmatically.
This is something that a non-terrible AI model can do successfully. Give a message like “The last output was incomplete. Resume your response, repeating the last incomplete line.”
Every new API call has message containers and an unseen prompt for where the assistant should write. Therefore, you cannot continue where you left off seamlessly by providing all the input up to then.
Anthropic allows you to complete assistants messages, which allows you to continue. Perhaps OpenAI is ashamed at what their models would produce when led down a path, or wants to ensure developers don’t get the same feature set as their own chatbot.
edit: if the context length allows, you could switch to that good AI model (GPT-4) to receive the completion.
I have the same issue. In my case, the JSON is complex and not just a list of objects, so recovery is not really feasible. Weirdly, continuation works fine in some circumstances but not in others. E.g., when requesting response format “text” it seems to work better. Seems pretty inconsistent to me… would be nice if it was better documented. Actually, would be nice if - as brin said - it just worked the same as in the UI.