Can the API continue generation exactly after the last assistant message (assistant‑last continuation)?

Kevin_Dong · August 26, 2025, 10:20pm

I want to double‑check expected behavior regarding assistant‑last continuation in API call: sending a new request where the final item in messages is an assistant message, and asking the model to continue right after that text rather than restarting an answer to the earlier user turn.

I am doing token‑level steering for research. The concrete case is: when the model outputs the token sequence for Chat, I want to insert "GPT" immediately, then let the model keep going as if it had produced ChatGPT itself. I can detect Chat during streaming, stop the first call, and start a second call whose last message is the assistant prefix with my injected "GPT" appended. I am not trying to remove or bypass the chat template; I only want to condition the next tokens on the assistant‑last prefix.

Is assistant‑last continuation an officially supported and stable pattern for Chat Completions and/or Responses? In other words, if my new request ends with an assistant message that contains all prior assistant text (plus my small injected string), should the next token be sampled as a continuation of that text, rather than a restart from the last user turn?

Another way to do it is to have a user message with “Continue” as content. However, this is not consistent and the response includes generation from the very beginning, and not ending with what I have in the previous assistant message.

_j · August 27, 2025, 9:09am

OpenAI will not allow continuation on an Assistant message (like Anthropic does).

You always get a new internal “prompt” for an assistant to start a message.

Depending on the model and the context, it might continue text, or it might repeat the same thing again. Definitely more “happenstance” than stable. GPT-4, the original, will usually give a continuation.

Kevin_Dong · August 27, 2025, 5:08pm

Thanks! I don’t know Anthropic’s doc explicitly says they support this. link

However, what I want is a constraint decoding, generating certain tokens in the middle of a generation, which requires both logit bias and continuing generation. It seems Claude does not have the logit_bias parameters in their API….

Topic		Replies	Views
Continuation of model response with hardcoded previous session API chatgpt , api , assistants-api	0	103	September 2, 2025
How to Implement 'Continue' Function in Chatbots with Assistant API API gpt-4 , chatgpt , api , assistants-api	1	205	November 4, 2024
How to force to continue a truncated completion? API	2	5747	December 24, 2023
Restarting partially completed chat completion API calls API	12	1905	February 23, 2024
Continuing content after output token limit? API	4	2586	December 26, 2025

Can the API continue generation exactly after the last assistant message (assistant‑last continuation)?

Related topics