I want to double‑check expected behavior regarding assistant‑last continuation in API call: sending a new request where the final item in messages is an assistant message, and asking the model to continue right after that text rather than restarting an answer to the earlier user turn.
I am doing token‑level steering for research. The concrete case is: when the model outputs the token sequence for Chat, I want to insert "GPT" immediately, then let the model keep going as if it had produced ChatGPT itself. I can detect Chat during streaming, stop the first call, and start a second call whose last message is the assistant prefix with my injected "GPT" appended. I am not trying to remove or bypass the chat template; I only want to condition the next tokens on the assistant‑last prefix.
Is assistant‑last continuation an officially supported and stable pattern for Chat Completions and/or Responses? In other words, if my new request ends with an assistant message that contains all prior assistant text (plus my small injected string), should the next token be sampled as a continuation of that text, rather than a restart from the last user turn?
Another way to do it is to have a user message with “Continue” as content. However, this is not consistent and the response includes generation from the very beginning, and not ending with what I have in the previous assistant message.