Can the API continue generation exactly after the last assistant message (assistant‑last continuation)?

I want to double‑check expected behavior regarding assistant‑last continuation in API call: sending a new request where the final item in messages is an assistant message, and asking the model to continue right after that text rather than restarting an answer to the earlier user turn.

I am doing token‑level steering for research. The concrete case is: when the model outputs the token sequence for Chat, I want to insert "GPT" immediately, then let the model keep going as if it had produced ChatGPT itself. I can detect Chat during streaming, stop the first call, and start a second call whose last message is the assistant prefix with my injected "GPT" appended. I am not trying to remove or bypass the chat template; I only want to condition the next tokens on the assistant‑last prefix.

Is assistant‑last continuation an officially supported and stable pattern for Chat Completions and/or Responses? In other words, if my new request ends with an assistant message that contains all prior assistant text (plus my small injected string), should the next token be sampled as a continuation of that text, rather than a restart from the last user turn?

Another way to do it is to have a user message with “Continue” as content. However, this is not consistent and the response includes generation from the very beginning, and not ending with what I have in the previous assistant message.

OpenAI will not allow continuation on an Assistant message (like Anthropic does).

You always get a new internal “prompt” for an assistant to start a message.

Depending on the model and the context, it might continue text, or it might repeat the same thing again. Definitely more “happenstance” than stable. GPT-4, the original, will usually give a continuation.

Thanks! I don’t know Anthropic’s doc explicitly says they support this. link

However, what I want is a constraint decoding, generating certain tokens in the middle of a generation, which requires both logit bias and continuing generation. It seems Claude does not have the logit_bias parameters in their API….