What is different about o1?

How do the o1 series models differ from plain transformers?

A transformer can generate a series of thoughts, separating them, say, by “###”. How is o1 “forced” to do something different from what a plain transformer would do? How is it forced to add the next thought, rather than just continue the previous one, as a plain GPT would?
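For context, here is a minimal sketch of the baseline behavior the question contrasts against: prompting a plain (non-o1) chat model to emit “###”-separated intermediate thoughts. It assumes the standard OpenAI Python SDK; the model name, prompt, and delimiter handling are illustrative only, and this says nothing about how o1 reasons internally.

```python
# Sketch: a plain chat model prompted to emit "###"-separated thoughts.
# Assumes the standard OpenAI Python SDK (v1.x); model and prompt are
# illustrative. This is NOT a description of o1's internal mechanism.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # a plain (non-o1) model, used here as the baseline
    messages=[
        {
            "role": "system",
            "content": (
                "Think step by step. Write each intermediate thought "
                "separated by the delimiter '###', then give a final answer."
            ),
        },
        {"role": "user", "content": "How many prime numbers are there below 30?"},
    ],
)

# Split the single generated string on the delimiter to recover the "thoughts".
text = response.choices[0].message.content
thoughts = [t.strip() for t in text.split("###") if t.strip()]
for i, thought in enumerate(thoughts, 1):
    print(f"Thought {i}: {thought}")
```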


The “how it works” material would be a good starting point for what OpenAI is willing to portray and reveal:

https://openai.com/index/introducing-openai-o1-preview/

https://openai.com/index/learning-to-reason-with-llms/