Is it possible to stream the chain of thought happening during o1 API calls? Calls to this model can sometimes take a while and it would be a great user experience if I could show what was happening while the end user waits the extra time for a final response.
I would like to create an experience similar to the “thinking” loader in the ChatGPT UI.
My call to the o1 model is using a structured output, so I would still like that to be the final response, but being able to show the chain of thought during the loading state would be amazing!
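Since the API doesn’t expose the chain of thought, the closest you can get today is a purely client-side “thinking” animation while the blocking call runs. Here’s a minimal sketch: the slow call runs in a background thread while the main thread animates a spinner. `slow_structured_call` is a stand-in placeholder, not a real SDK function — swap in your actual `client.chat.completions.create(...)` call with your structured-output settings.

```python
import itertools
import sys
import threading
import time


def slow_structured_call():
    # Placeholder for the blocking o1 API call; replace with your
    # real client.chat.completions.create(...) invocation.
    time.sleep(0.5)
    return {"answer": "final structured output"}


def call_with_spinner(fn):
    """Run fn in a background thread and animate a 'thinking'
    indicator on the main thread until it returns."""
    result = {}
    done = threading.Event()

    def worker():
        result["value"] = fn()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    for frame in itertools.cycle("|/-\\"):
        if done.is_set():
            break
        sys.stdout.write(f"\rThinking… {frame}")
        sys.stdout.flush()
        time.sleep(0.1)
    sys.stdout.write("\r" + " " * 20 + "\r")  # clear the spinner line
    return result["value"]
```

This only tells the user the app is working; it shows nothing of the model’s actual reasoning, which the API does not return.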
No such feature has been announced, and streaming is not currently available for either o1-preview or the newly announced o1 (plain).
Streaming is “coming sometime” to o1; part of the delay may be implementing such progress events, on top of ensuring that nothing ever leaks from the internal reasoning generation.
Streaming and the “thinking” display remain fully realized only in ChatGPT, broadly available to anyone willing to pay the $200 subscription for a non-nerfed o1, and unavailable to API developers. o1 itself is unavailable to all but a slim selection of tier-5 API users.
This forum does have a Feedback category (Feedback - OpenAI Developer Forum), but it is basically idea banter with other users, not the kind of “paste bot-written ideas and get ignored” dumping ground that ChatGPT “feature requests” amount to.
Streaming, “thinking” events, and keep-alive progress (to guard against network timeouts) are fairly obvious requests, but there is no straightforward way to deliver them through Chat Completions, because that endpoint was not built for out-of-band metadata. You already have to opt in to usage data as an additional chunk by sending an API parameter; a new delta chunk field for reasoning events could break everybody’s parsing code.
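To illustrate the opt-in pattern mentioned above: in the Chat Completions API, usage data only arrives as a final extra chunk when you set `stream_options.include_usage`. A sketch of such a request body, assuming a streaming-capable model (o1 does not accept streaming today; the model name here is just an example):

```python
# Example request body showing the existing opt-in for extra stream
# metadata. Any hypothetical "reasoning events" field would likely need
# a similar opt-in so existing chunk parsers don't break.
request = {
    "model": "gpt-4o",  # example streaming-capable model, not o1
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    # Without this, streaming responses omit token usage entirely;
    # with it, usage arrives in one additional final chunk.
    "stream_options": {"include_usage": True},
}
```

Clients that iterate over chunks assuming every chunk carries a `choices[0].delta` already have to special-case the usage chunk (which has empty `choices`), which is exactly the kind of breakage a new event field would risk.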