Previously, when using the Chat Completions API to support multi-turn conversations, we would maintain the conversation state in our app and replay the ‘assistant’ messages.
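Concretely, the pattern we use today looks roughly like this (a simplified sketch; the model name and the in-memory history stand in for our real setup):

```python
# Simplified sketch of our current multi-turn pattern with the
# Chat Completions API: keep every user/assistant turn in our own
# store and resend the full list with each request.
from openai import OpenAI

client = OpenAI()
history = []  # persisted by the app between turns

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = response.choices[0].message.content
    # Replay this assistant message on the next turn.
    history.append({"role": "assistant", "content": answer})
    return answer
```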
With o1-preview, given that we don’t get access to the reasoning tokens, what is the best practice for maintaining conversation state? Do we replay just the assistant messages without the reasoning tokens? Would this reduce the accuracy of o1-preview, or lead to unnecessary tokens being generated in the second turn of the conversation, as o1-preview might have to ‘rethink’ the problem again?
You just have to accept that you can only pass back the final AI response. Unlike ChatGPT today, the stateless Chat Completions model will never see past reasoning context until you are also able to view it, or until you can chain past completion IDs together as context, neither of which seems likely.
The o1 models introduce reasoning tokens. The models use these reasoning tokens to “think”, breaking down their understanding of the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens, and discards the reasoning tokens from its context.
Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.
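A minimal sketch of that flow, assuming the openai Python SDK and o1-preview (the `completion_tokens_details` usage field is based on recent SDK versions and should be treated as an assumption):

```python
# Sketch of a two-step conversation with a reasoning model. Only the
# visible assistant text is carried forward in `messages`; the reasoning
# tokens are billed (see usage) but never re-enter the context.
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "How many primes are there below 50?"}]

first = client.chat.completions.create(model="o1-preview", messages=messages)
details = first.usage.completion_tokens_details  # assumed present on recent SDKs
if details is not None:
    print("reasoning tokens (billed, then discarded):", details.reasoning_tokens)

# Step 2: carry over only the visible answer, then ask a follow-up.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "List them."})
second = client.chat.completions.create(model="o1-preview", messages=messages)
print(second.choices[0].message.content)
```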
I don’t see a difference in how multi-turn conversations would work between o1-preview and gpt-4o. For example, I save all the user and assistant messages in a database and pass them all along in new requests. I wouldn’t really want to store or pass any reasoning tokens anyway, even if we had access to them, but regardless, they are not needed for multi-turn conversations.
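A rough sketch of what I mean, with an illustrative SQLite schema (the table layout and model name are placeholders, not a recommendation):

```python
# Persist every user/assistant turn, then replay the whole list on the
# next request. The exact same code path works for gpt-4o and o1-preview;
# no reasoning tokens are stored because none are returned.
import sqlite3
from openai import OpenAI

client = OpenAI()
db = sqlite3.connect("chat.db")
db.execute("CREATE TABLE IF NOT EXISTS messages (conversation_id TEXT, role TEXT, content TEXT)")

def send(conversation_id: str, user_text: str, model: str = "o1-preview") -> str:
    db.execute("INSERT INTO messages VALUES (?, ?, ?)", (conversation_id, "user", user_text))
    rows = db.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY rowid",
        (conversation_id,),
    ).fetchall()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": role, "content": content} for role, content in rows],
    )
    answer = response.choices[0].message.content
    db.execute("INSERT INTO messages VALUES (?, ?, ?)", (conversation_id, "assistant", answer))
    db.commit()
    return answer
```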