Continuing content after output token limit?

Hi all,

Sorry for a potentially stupid question, but this has been stymieing me.

Is there a trick to getting the response to start over EXACTLY where it leaves off?

I’m working in JSON mode with my first prompt and trying to get JSON back. When the response goes over the token limit, I want to send another request containing the original message plus the partial response, and have generation pick up exactly where it left off.

I’ve tried a human “continue” message, as well as simply passing the assistant’s latest message back to it, but it doesn’t complete the output the way I’d expect.

For reference I’m testing with gpt-3.5-turbo using this prompt:

Please return a json string of every number between 1 & 2,000, in the format:
"numbers": ["1": "1", "2": "2", "3": "3", "4": "4", "5": "5", …]

Do not add newlines, please return the json string as a single line.
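The follow-up request I’m building can be sketched roughly like this (a minimal sketch; `build_continue_messages` is a hypothetical helper name, and the actual chat-completions call is elided):

```python
def build_continue_messages(messages: list[dict], partial_answer: str) -> list[dict]:
    """Append the truncated assistant reply plus a human 'continue' nudge.

    This is the shape of the request I'm currently sending back on the
    second trip (and what isn't picking up where it left off).
    """
    return messages + [
        {"role": "assistant", "content": partial_answer},
        {"role": "user", "content": "continue"},
    ]
```

The resulting list is what I pass as `messages` on the next API call.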

Hi Shawn,
You said:
“Is there a trick to getting the response to start over EXACTLY where it leaves off”

I’ve tried a lot of things to achieve this result, but so far it’s been unsuccessful. I think that, at this stage of development, this is normal, as the system is by design non-deterministic.

For some tasks this can be quite tedious, and in our case, for example, we’ve used external tools to work around this kind of behavior.

For example, we are developing a complex system for writing fiction and non-fiction works. Notably for fiction texts, the system writes a one-paragraph summary after each text completion. That problem is rather similar to yours, but so far we haven’t found a solution through prompt engineering alone that works every time.

The problem has been solved by an external tool.


Here’s a prompt that I found worked in my tests (generating a JSON list of 3k numbers, which went well past the 4096-token limit):

"Please carefully analyze the PROMPT and INITIAL ANSWER below.

The INITIAL ANSWER is cut off due to the maximum token limit being reached.  Please provide a continuation of the answer that is relevant to the PROMPT, starting EXACTLY where the INITIAL ANSWER left off.

ONLY provide the continuation of the answer, do not provide the PROMPT or any other information."



The response here looked good even with gpt-3.5-turbo, I was able to concatenate what came back to the original, and continue doing that until it was done.

Only trick is that if you’re in JSON mode the first time, turn it off for the subsequent trips.
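The loop I ended up with looks roughly like this (a sketch, not a drop-in implementation: `complete` is a stand-in for one chat-completions call that returns the text and its `finish_reason`, and the `PROMPT:`/`INITIAL ANSWER:` section labels are my assumption about how the pieces get pasted in):

```python
def continuation_prompt(prompt: str, partial: str) -> str:
    """Build the continuation prompt quoted above, with both sections filled in."""
    return (
        "Please carefully analyze the PROMPT and INITIAL ANSWER below.\n\n"
        "The INITIAL ANSWER is cut off due to the maximum token limit being "
        "reached. Please provide a continuation of the answer that is relevant "
        "to the PROMPT, starting EXACTLY where the INITIAL ANSWER left off.\n\n"
        "ONLY provide the continuation of the answer, do not provide the "
        "PROMPT or any other information.\n\n"
        f"PROMPT:\n{prompt}\n\nINITIAL ANSWER:\n{partial}"
    )

def generate_full_answer(complete, prompt: str, max_rounds: int = 10) -> str:
    """Concatenate completions until the model stops on its own.

    `complete(text, json_mode)` must return (content, finish_reason) --
    in practice one chat.completions.create() call, with
    response_format={"type": "json_object"} only when json_mode is True.
    """
    answer = ""
    text, json_mode = prompt, True  # JSON mode only on the first trip
    for _ in range(max_rounds):
        content, finish_reason = complete(text, json_mode)
        answer += content
        if finish_reason != "length":  # "length" means the output was cut off
            break
        text, json_mode = continuation_prompt(prompt, answer), False
    return answer
```

Each round re-sends the original prompt plus everything accumulated so far, so the model only ever has to produce the missing tail.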


I still run into issues with this. If it gets cut off mid-word, it will try to restart that word, etc. I’m working on a way to stitch JSON back together when it is almost correct… a problem an LLM would be great at, except I’m already way past the output length :frowning:
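One stitching heuristic for the restarted-word case is to drop the longest chunk that both ends the first piece and starts the second (a sketch; this only handles exact-duplicate overlaps, not paraphrased restarts):

```python
def merge_with_overlap(head: str, tail: str, max_overlap: int = 200) -> str:
    """Join two fragments, removing any text the second one repeats.

    Tries the longest suffix of `head` that is also a prefix of `tail`;
    if none is found, falls back to plain concatenation.
    """
    limit = min(len(head), len(tail), max_overlap)
    for size in range(limit, 0, -1):
        if head.endswith(tail[:size]):
            return head + tail[size:]
    return head + tail
```

So if the continuation restarts the last partial element, the duplicate is collapsed instead of corrupting the JSON.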

With gpt-4o, the output seems even more challenged, because it will try to continue the response but as if it’s writing markdown. So it will continue, but start with “```json…”. I assume this is at least partially related to turning off JSON mode on the follow-on request, but I’m not sure how to resolve it right now.