Cannot continue JSON response past max_tokens? Anyone figure this out?

This is using the GPT-4o model.

I have been trying for days to get my OpenAI SDK Python script to retrieve a JSON response that exceeds max_tokens.
For example, when I get a length stop code, I've tried calling again with the partial result placed in an assistant message and a user prompt of "please continue the json". That fails: on the new API call it starts the JSON from the beginning. I've also tried placing the partial JSON in a new user message ("please continue this JSON: {partial response}"); it too starts over.
If I call multiple times, eventually it returns some random short JSON-esque snippet and then I get a stop code.
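For reference, here's a minimal sketch of the retry pattern I described above. The prompt text and message contents are illustrative, not my exact code, and the actual chat.completions.create call is elided; this just shows how the follow-up message list gets assembled before resubmitting.

```python
# Sketch of the failing continuation attempt described above.
def build_continuation(messages, partial_json):
    """Feed the partial result back as an assistant turn, then ask
    the model to pick up where it left off."""
    return messages + [
        {"role": "assistant", "content": partial_json},
        {"role": "user", "content": "please continue the json"},
    ]

msgs = build_continuation(
    [{"role": "user", "content": "Return the data as JSON."}],
    '{"prevKey1": "prevValue",',  # truncated output from the first call
)
# Resubmitting msgs is what restarts the JSON from the beginning for me.
```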

Surely there is a trick here I'm missing. Is there really no way to get it to continue the JSON response properly?

Help me, Obi-Wan Kenobi, you're my only hope!


No, there is not: OpenAI's models are capped at 4096 output tokens. You can get higher limits with the Azure-hosted ones.

There's no way to make a service bypass its own limits; that would be a bug on OpenAI's side. It will stay that way until they decide to support a higher max_tokens limit.

That being said, can you explain more about what you're trying to do? There are a lot of prompt engineering techniques, such as symbol tuning, that you can apply to maximize the utility of your completions.

The output limit is 4096 tokens; the input limit is much higher. The goal here is to have it continue, in a fully new request with a fresh 4096-token output window, a half-completed JSON block that is now part of the larger input context. It should work; it just doesn't.

Oh, now I understand what you’re saying. You don’t need to ask it to “continue the JSON” - remember that models are, at their core, just predicting the most likely next token in the sequence, so presenting it with unterminated JSON like so:

blah blah blah instructions

Answer in JSON:

{
  "prevKey1": "prevValue",

and then letting it fill in the rest from there will work. You can see a working example here.
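To make the mechanics concrete, here's a toy sketch of the accumulate-and-resubmit loop. A fake stand-in replaces the model so the loop is runnable; in practice `complete` would be your actual API request, and `fake_model`, `PROMPT`, and `FULL` are all hypothetical names, not anything from the OpenAI SDK.

```python
import json

def continue_until_complete(complete, prompt, max_rounds=8):
    """Keep re-presenting the prompt plus the unterminated JSON so the
    model just predicts the next tokens; concatenate until it stops.
    complete(text) stands in for an API call and returns
    (next_chunk, finished) -- a hypothetical signature."""
    partial = ""
    for _ in range(max_rounds):
        chunk, finished = complete(prompt + partial)
        partial += chunk
        if finished:
            return partial
    raise RuntimeError("still unterminated after max_rounds calls")

# Toy stand-in for the model: emits the target JSON 10 characters per
# call, continuing from wherever the supplied text leaves off.
PROMPT = "Answer in JSON:\n"
FULL = '{"prevKey1": "prevValue", "key2": [1, 2, 3]}'

def fake_model(text):
    done_so_far = text[len(PROMPT):]
    chunk = FULL[len(done_so_far):len(done_so_far) + 10]
    return chunk, len(done_so_far) + 10 >= len(FULL)

result = continue_until_complete(fake_model, PROMPT)
assert json.loads(result) == json.loads(FULL)
```

The key point is that nothing in the request ever says "continue": the dangling, unterminated JSON itself is the cue to keep generating.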

If you actually terminate the JSON you provide, it's less obvious (I would guess) to the model what should come next. "Please continue the JSON" is not an instruction that makes a lot of sense to me, as a human, without all the context of this forum thread.

Unfortunately, I don’t know how well this technique will work with JSON mode.

(You could also try gpt-4-32k, which it looks like is still available; I didn't suggest it earlier because I thought it had been taken down.)