Cannot continue JSON response past max_tokens? Anyone figure this out?

This is using the GPT-4o model.

I have been trying for days to get my OpenAI Python SDK script to assemble a JSON response that exceeds max_tokens.
For example, when a call stops with the length finish reason, I've tried calling again with the partial result in an assistant message plus a user prompt of "please continue the json"; that fails, and the new API call starts the JSON over from the beginning. I've also tried placing the partial JSON in a new user message ("please continue this JSON: {partial response}"), and it starts over too.
If I call multiple times, eventually it returns some random short JSON-esque snippet and then I get a stop code.
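Here's a minimal sketch of what I've been trying, assuming the v1 openai Python SDK (the prompt and model name are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "...big instructions that produce a long JSON answer..."

# First call: comes back truncated, with finish_reason == "length".
first = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=4096,
)
partial = first.choices[0].message.content

# Retry: hand the partial back as an assistant turn and ask for more.
# This is the call that restarts the JSON from the beginning for me.
retry = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": partial},
        {"role": "user", "content": "please continue the json"},
    ],
    max_tokens=4096,
)
```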

Surely there is a trick here I'm missing. Is there really no way to get it to continue the JSON response properly?

Help me, Obi-Wan Kenobi, you're my only hope!

No, there is not: OpenAI's models are capped at 4,096 output tokens. You can get higher limits with the Azure-hosted ones.

There's no way to make a service bypass its own limits; that would be a bug on OpenAI's side. This stands until they decide to support a higher max_tokens limit.


That being said, can you explain more about what you're trying to do? There are a lot of prompt engineering techniques, such as symbol tuning, that you can apply to maximize the utility of your completions.

The output limit is 4,096 tokens; the input limit is much higher. The goal here is to have the model continue, in a fully new request (with a fresh 4,096-token output window), a half-completed JSON block that is now part of the larger input context. It should work; it just doesn't.

Oh, now I understand what you're saying. You don't need to ask it to "continue the JSON". Remember that models are, at their core, just predicting the most likely next token in the sequence, so you can present the model with unterminated JSON like so:

```
blah blah blah instructions

Answer in JSON:

{
  "prevKey1": "prevValue",
```

and then letting it fill in the rest from there will work. You can see a working example here.

If you actually terminate the JSON you provide, I would guess it's less obvious to the model what should come next: "please continue the JSON" is not an instruction that makes much sense to me, as a human, without all the context of this forum thread.

Unfortunately, I don’t know how well this technique will work with JSON mode.
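To make that concrete, here's a rough sketch of the continuation loop using the v1 openai Python SDK. It's just the technique above, not an official API feature: each round appends the accumulated output, unterminated, to the end of the prompt and lets the model predict what comes next. The model name and round limit are placeholders, and I've left JSON mode off per the caveat above:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def complete_long_json(instructions: str, model: str = "gpt-4o",
                       max_rounds: int = 5) -> str:
    """Accumulate a long JSON answer across several calls by feeding the
    unterminated partial output back in and letting the model continue it."""
    partial = ""
    for _ in range(max_rounds):
        # The prompt ends with whatever JSON we have so far, left open,
        # so the most likely next tokens are simply the rest of the JSON.
        content = f"{instructions}\n\nAnswer in JSON:\n{partial}"
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": content}],
            max_tokens=4096,
        )
        choice = resp.choices[0]
        partial += choice.message.content or ""
        if choice.finish_reason != "length":
            break  # the model closed the JSON on its own
    return partial
```

You'll still want to validate the stitched-together string with json.loads, since the seams between calls are where duplicated or dropped tokens are most likely to show up.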

(You could also try gpt-4-32k, which it looks like is still available; I didn't suggest it earlier because I thought it had been taken down.)