Tips for handling finish_reason: length with JSON

brin · June 7, 2024, 2:04pm

I’m using GPT-4o with the json response_format and occasionally running into large responses that exceed the 4k max. I’m using pagination already with a “more” flag, but the JSON response ends abruptly, the finish_reason is set to “length”, but I’m unable to have it pick up where it left off.

I currently detect the finish_reason, append the assistant’s response + a new user message of something like “JSON cut off due to token length. Continue exactly where you left off”, but it’s never able to. It either starts a new JSON object, or picks up where it left off but fails to close the original object properly.

I’ve noticed the ChatGPT UI seems to handle this with a “Continue generating” button, and it works perfectly.

Diet · June 7, 2024, 2:13pm

It’s quite probable that OpenAI has access to things we developers don’t, such as pre-empting a response. (anthropic allows that on the api, for example)

Personally, I’m parsing JSON responses in real time, so it doesn’t really bother me if I get an incomplete one - I could just back off and ask for the next object. But generally I try to use LLMs to reduce and compact information, not to generate.

Can I ask what you’re trying to achieve? If you’re using the LLM to copy and paste a lot of content, you could instead consider instructing the model to only give you the first and last couple of words in a chunk (or a chunk descriptor (e.g. line numbers, titles, etc)), and then augment the generation programmatically.

_j · June 7, 2024, 2:23pm

This is something that a non-terrible AI model can do successfully. Give a message like “The last output was incomplete. Resume your response, repeating the last incomplete line.”

Every new API call has message containers and an unseen prompt for where the assistant should write. Therefore, you cannot continue where you left off seamlessly by providing all the input up to then.

Anthropic allows you to complete assistants messages, which allows you to continue. Perhaps OpenAI is ashamed at what their models would produce when led down a path, or wants to ensure developers don’t get the same feature set as their own chatbot.

edit: if the context length allows, you could switch to that good AI model (GPT-4) to receive the completion.

john.brush · June 18, 2024, 12:24pm

I have the same issue. In my case, the JSON is complex and not just a list of objects, so recovery is not really feasible. Weirdly, continuation works fine in some circumstances but not in others. E.g., when requesting response format “text” it seems to work better. Seems pretty inconsistent to me… would be nice if it was better documented. Actually, would be nice if - as brin said - it just worked the same as in the UI.

jack25 · July 3, 2024, 9:48pm

It’s odd that openAI can’t fix the JSON string. Even applying a package from github would be an improvement.

hello121 · August 24, 2024, 9:39am

Hello,
Same problem here

Topic		Replies	Views
Continuing content after output token limit? API	3	2021	May 23, 2024
How to complete Long API responses? API gpt-35-turbo , chatgpt	6	4887	December 19, 2023
Cannot continue JSON response past max_tokens? Anyone figure this out? API json , json-mode , gpt-4o	3	1471	June 20, 2024
How to continue generation through api implementation API api	5	6327	June 30, 2023
Large JSON Responses from Assistant API are truncated API json , assistants-api	5	1464	June 20, 2024

Tips for handling finish_reason: length with JSON

Related topics