So apparently OpenAI is testing the integration of o1-capabilities into GPT-4o. Many will know that, from time to time, GPT-4o provides two answers, one of which the user can then choose as the preferred one. This serves to improve answers in the long run. Interestingly, today I encountered a first: in my GPT-4o chat, one of the answer options was generated using the o1-capabilities.
Welcome to the forum!
Thanks for sharing.
Is “o1-capabilities” something that you saw on the screen, or was it something you conjectured? I ask because o1-<xyz> is a naming scheme, and that can easily confuse many if you conjectured it.
You could say it is something I conjectured, based on observations for which no other explanation would make sense. Let me elaborate:
1.) The answer generation took unusually long, which I initially blamed on a possible connection issue.
2.) However, when I looked at the generated answers, one of them had a header that said “Thought for 19 seconds”. I expanded the header and saw the thought process it used, much like o1 does. I had never encountered this before, and I use GPT-4o extensively.
I did add a screenshot of it in my original post.
I would love to share the link to the chat; unfortunately, I confronted it with the fact that it used o1 reasoning and posted a screenshot in the chat, and sharing of chats that contain images is apparently not supported as of now.
Edit: spelling.
Yes, I am aware of the sharing feature. The chat was in GPT-4o, not o1. When I try to share it, it tells me sharing of chats with images is not supported.
Thanks.
My bad.
I realized after re-reading your post that I had read that part wrong, and I deleted my post.
Yes, I have had the same experience. In my case, it was with proofreading short texts. I always preferred the o1-style option that only corrected mistakes, as it produced more straightforward results without additional commentary. It follows instructions to simply correct errors rather than refining the text, and it doesn’t add any quotation marks.
Interesting! Did this also happen recently, or is it something they have been testing for a longer time that eluded me for some reason?
Apparently they are trying to gather data on how 4o and o1 responses compare and how they are received by users, possibly to improve the responses of future models.
Edit: Or to train the models to decide when to use reflective reasoning, and in what cases a “normal” answer might be sufficient, which would allow for more tailored, efficient response generation?
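Purely to illustrate that speculation: deciding when to use reflective reasoning could amount to a small router in front of the two models. Everything in this sketch (the heuristic, names, and labels) is made up; a real router would presumably be learned from exactly the comparison data discussed above, and nothing here reflects how OpenAI actually does it:

```python
# Toy routing sketch; the heuristic and names are invented for illustration.
REASONING_HINTS = ("prove", "step by step", "derive", "debug", "why")

def needs_reflective_reasoning(prompt: str) -> bool:
    """Stand-in for a learned router: flag prompts that look like they
    need multi-step reasoning rather than a direct answer."""
    lowered = prompt.lower()
    return any(hint in lowered for hint in REASONING_HINTS)

def route(prompt: str) -> str:
    return "reasoning-model" if needs_reflective_reasoning(prompt) else "fast-model"

print(route("What's the capital of France?"))      # fast-model
print(route("Prove that sqrt(2) is irrational."))  # reasoning-model
```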
I too have seen more than my share of these comparisons.
This has been going on with ChatGPT for as long as I can remember. I started using ChatGPT about two months after it debuted, and it seemed much more prevalent back then. I even remember a time when it was the norm rather than the exception.
The technology is RLHF (reinforcement learning from human feedback), for those interested.
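For anyone curious what that looks like mechanically: each “pick the better answer” click yields one preference pair, and a reward model is typically trained on such pairs with a pairwise (Bradley-Terry) loss. Here is a minimal sketch of that standard formulation; the function names and numbers are invented for illustration and say nothing about OpenAI’s actual pipeline:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this trains a reward model to score the answer the user
    picked higher than the one they passed over.
    """
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Each comparison yields (prompt, chosen_answer, rejected_answer); the
# reward model scores both answers, and the loss separates the scores.
pairs = [
    (1.3, 0.4),  # model already agrees with the user -> small loss
    (0.2, 0.9),  # model disagrees with the user -> large loss
]
for r_chosen, r_rejected in pairs:
    print(f"loss = {preference_loss(r_chosen, r_rejected):.3f}")
```

The trained reward model is then used to fine-tune the chat model, which is how those side-by-side choices improve answers in the long run.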
It’s less the comparisons themselves that stand out, though, and more the fact that they provide two answers in a GPT-4o chat, one of which was generated with GPT-4o and the other with o1. That’s the novelty I noticed, not the offering of two answers to choose from.
I must have first seen it about a month ago. However, with ChatGPT, there’s a lot more going on. For example, this data could be used to fine-tune the 4o model’s response style; it doesn’t necessarily suggest that o1 will replace 4o anytime soon. I think that’s also what Eric is hinting at.