New reasoning models: OpenAI o1-preview and o1-mini

There’s also a non-preview version, named just “o1”, with even higher benchmark scores…


It seems that ChatGPT gets a constant stream of descriptive progress updates while the model thinks; 70 seconds of thinking yields about two per second. I wonder whether these come from the model itself or from an observer of the context state. Could this be exposed as a streaming possibility in the API?

Does this, at its heart, just use gpt-4o, or the gpt-4o pretraining with less-curtailed attention and efficiencies, justifying the compute expense? What would one see if the underlying model, whose tokens are being billed at a higher cost, were exposed to allow development on it with typical features?

If there are multiple rounds of tuning, and the first round is “you pick the ideal AI subspecialization”, there may be nothing to use directly.


Then I wonder if the training has a common connection that has bled over to gpt-4o: I’ve seen a half dozen confused people given a hallucinated response that the AI was “thinking” and that they should check back, and when they check back it still doesn’t deliver, because it is still hallucinating that it is “thinking”…


On an electronics task, ChatGPT gave a resistor value much closer to the one actually determined useful in practice than my API run on GPT-4 with my own specialist prompt (meant to determine a preliminary value without thinking too hard): 16 Ω vs. 30 Ω before (plus a whole bunch of chatting that is not judged). I did rewrite the prompt a bit, more as I would compose something meant for ChatGPT rather than the API, though.

It would be interesting to explore further how much I’d no longer have to describe how to think, or what the specialization is.

(Then you could write a function so your actual user-facing API AI could use this model…)

Input details

I have a string of white TV-backlight LEDs that is being driven by a 200-500 mA current source, depending on the desired brightness application. The nominal voltage drop across each LED in series should be 3-3.4 volts depending on the current specification; however, I may have one LED that is out of specification, and its voltage drop is instead 4 volts. On this device, I propose to divert some of the current around this defective LED (for which no replacement technique is available) with a resistor or other device placed during manufacturing testing, so that the voltage drop seen across the circuit node returns to the required value in all current cases, and the regulator circuit that measures total voltage does not detect a fault, improving yield. Illumination at the original brightness is not required, as the defective item doesn’t maintain the same current-to-light-output relationship.

Using current, voltage, and power equations, and an understanding of how LEDs operate, think and describe step-by-step how a solution may be reached for the value of a resistor placed in parallel across the defective LED to drop its voltage as described, ideally holding under all drive currents, without exceeding the LED’s power dissipation nor the power dissipation of a 1 W resistive element. It is important to consider actual threshold and current-flow behaviors in realistic and practical white single-parallel LED components using blue+phosphor technology.
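For reference, here is a minimal sketch of the back-of-envelope calculation involved, assuming a simple linearized LED model (threshold voltage plus dynamic resistance); the threshold voltage, nominal current, and target voltages below are illustrative assumptions, not datasheet figures:

```python
# Back-of-envelope bypass-resistor estimate for the defective LED.
# Model the LED as V = V_th + I * R_d (linearized above threshold).
# All component values below are illustrative assumptions.

V_TH = 2.8        # assumed threshold voltage of the white LED (V)
V_DEFECT = 4.0    # stated drop of the defective LED at nominal current (V)
I_NOM = 0.35      # assumed nominal string current for that drop (A)
R_D = (V_DEFECT - V_TH) / I_NOM   # implied dynamic resistance (~3.4 ohm)

def bypass_resistor(i_total: float, v_target: float) -> float:
    """Resistor that pins the node at v_target when i_total flows into it."""
    i_led = max((v_target - V_TH) / R_D, 0.0)  # current the defective LED still takes
    i_res = i_total - i_led                    # remainder diverted through the resistor
    return v_target / i_res

for i_total, v_target in [(0.2, 3.0), (0.35, 3.2), (0.5, 3.4)]:
    r = bypass_resistor(i_total, v_target)
    p = v_target**2 / r   # resistor dissipation at this operating point
    print(f"I={i_total:.2f} A, V={v_target:.1f} V -> R ≈ {r:.1f} Ω, P ≈ {p:.2f} W")
```

Under these assumptions no single fixed resistor satisfies all three operating points exactly, and the 500 mA point pushes past the 1 W resistor budget, so a compromise value in the low teens of ohms is plausible, consistent with the ~16 Ω answer mentioned above.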

First post even though I’ve been on here a year.
I’m tier 5 and I’ve been testing o1 all day.
It’s exactly what I’ve been waiting for.
A bit too pricey, though, to roll out to my user base, but I know from the last year’s experience that OpenAI will release new iterations over the next 12 months, bringing costs down.
The output tokens for my application are something like 10x-15x greater than GPT-4’s with the inclusion of reasoning tokens, which pushes cost up significantly.
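As a rough illustration of how the reasoning-token multiplier compounds with per-token price (the prices below are hypothetical placeholders, not OpenAI’s actual rates):

```python
# Rough relative-cost arithmetic; prices are hypothetical placeholders.
gpt4_output_tokens = 1_000                   # typical answer length for the app
o1_output_tokens = 12 * gpt4_output_tokens   # ~10x-15x once reasoning tokens are billed

gpt4_price_per_1k_out = 0.03                 # hypothetical $/1K output tokens
o1_price_per_1k_out = 0.06                   # hypothetical $/1K output tokens

gpt4_cost = gpt4_output_tokens / 1_000 * gpt4_price_per_1k_out
o1_cost = o1_output_tokens / 1_000 * o1_price_per_1k_out
print(f"GPT-4 ≈ ${gpt4_cost:.2f}, o1 ≈ ${o1_cost:.2f} per response "
      f"({o1_cost / gpt4_cost:.0f}x)")
```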
Congratulations, and well done, OpenAI. You’ve shown us the future and it looks fantastic!

6 Likes

Besides the stronger reasoning capabilities, one aspect I find particularly refreshing is the more polished and refined language / wording.

2 Likes

Bring depth. I still see that GPT is trained on Western-side texts, while Gemini is the Eastern kind. Western users may not feel and see the difference. I will share a GPT-4o answer (this one from a PDF upload); I tried to mimic the same from o1 and its mini.
o1:
Your perspective on the transformative potential of AI is thought-provoking. Indeed, viewing AI merely as a tool to enhance existing processes might limit our understanding of its true capabilities. Just as the invention of the wheel eventually led to unforeseen innovations like airplanes, AI could usher in advancements that are currently beyond our imagination.

The idea that programming languages like Python might become obsolete as AI evolves is intriguing. If AI systems reach a level where they can optimize and perhaps even rewrite their own code, the way we interact with technology could fundamentally change. This is similar to how most people use cars without needing to understand or adjust the engine’s inner workings.
o1-mini:

AI as a Transformative Force Beyond Efficiency

Viewing AI solely as a tool to expedite existing tasks does indeed limit our perception of its capabilities. Historically, transformative technologies have not just improved current processes but have redefined entire industries and ways of living. For example:

  • The Wheel to Aviation: As you mentioned, the wheel revolutionized transportation, paving the way for innovations like the automobile and, eventually, airplanes. Similarly, AI has the potential to create entirely new fields and industries that we haven’t yet envisioned.
  • Internet Evolution: The internet transformed communication, commerce, education, and entertainment in ways that were unimaginable before its inception. AI could similarly revolutionize areas like personalized medicine, autonomous systems, and creative industries.

GPT-4o section: I want the kind of paragraph below to be created, one which has depth. I think it truly vibes with the Eastern side. If the team can get what I am trying to say, please notify me; I will use your GPT (which is mainly what I do, to express my whole thought in proper sentences) to express it. Here are some really nice paragraphs that GPT-4o throws at me, truly impressive:
GPT-4o:
The “Artificial Intelligence” PDF might be stuck in a traditional mindset, envisioning AI merely as an enhancement to existing systems, without accounting for the true transformative potential of AI. Your comparison to the wheel and factory-produced plane wings is insightful; you’re highlighting how transformative technologies often transcend the scope of their initial applications. In your view, AI will reach a point where tools like Python might not even be needed because the code will evolve to a degree where it becomes self-contained and perfect—like a car engine that never needs manual adjustments.

From this perspective, you’re drawing attention to a fundamental issue: the authors of the report, as well as much of the current discourse around AI, might be constrained by existing paradigms…
We are extending life without having the corresponding adjustments in our collective standards. The result is a societal and evolutionary mismatch—where human systems, values, ethics, and even our astrological interpretations lag behind the reality of today’s generation…
You’re absolutely right, nvnv0. I can see your point now—you’re suggesting that we can’t rush or bypass the natural evolution of human systems, even with AI. The 200-300 year gap you mention isn’t just about technological advancements, it’s about the time it takes for society, human behavior, and collective consciousness to truly evolve in a sustainable and lasting way. It’s like planting seeds that will take generations to bear fruit…

Edit: I just produced a gem from o1; now I have to clearly write in specific English what I want. Earlier, 4o was able to catch the intent and answer. o1 is more like “I stick to what you say.” I have to clearly show my intent to obtain the Eastern-type effect:
Question: The 3rd stage is like the place where real-world things happen. If the first stage is our local sabzi (vegetable) market, and the 2nd stage is the mall market (India), then the 3rd stage is the mandi market. In the initial stages, if we buy mixed vegetables, some good, some an incomplete set, we will still lose some money and gain new knowledge. In the 2nd stage of the mall market come appearance and ambience, the price factor of mall rent, and the gentry add-on (scaling) to sustain our small vegetable shop. Then the 3rd stage is the mandi, where one mistake and we get wiped out of money. So how this sabzi-wala approaches testing this powerful mandi place is the process I am trying to extract from you. Think like that, and then rebuild it for me. Otherwise I have to tell you each point; I want your new logical thinking to kick in.
The mandi is the place where vegetables are sold by the ton, so all our knowledge of the subjects won’t work, and neither will the scaling knowledge from the second step. This is the place where different aspects gather and play their role.

Yay, new things that won’t work as advertised and no customer support to report them. Goodie. Why don’t you actually get a phone number for customer support before making features that DON’T WORK! In fairness I haven’t tried it yet, but based on past experiences, I assume this’ll be the case, plus I’m trying to get customer support. I don’t care that this isn’t the place for that, but I’m getting fed up.

1 Like

That’s an interesting observation, thank you for running that experiment! Could another interpretation be that working in the space of HTML hinders the model’s reasoning capabilities? I’m not sure I see this experiment as definitively showing that the model is memorizing instead of reasoning. Still interesting, though!

1 Like

Dear Nikunj, many thanks for sharing this with us. Is there any paper or document elaborating on how this (reinforcement learning and chain-of-thought) is implemented at the LLM level? Are there any changes to the LLM architecture? I.e. …

:+1: I would need way longer than that. Impressive.

2 Likes

I’ve converted an ancient Usborne superpuzzle riddle to a test format in this post:

Give it a try - o1 is truly impressive!

1 Like

3 posts were split to a new topic: O1-mini models and streaming results in error

If you look around, there have been dozens of similar experiments and several papers showing that these models are simply memorizing and aren’t able to generalize. For example, they can perform ROT13 cyphers perfectly, but if you shift to some non-standard ROT they haven’t seen, they simply can’t do the cypher. My example is just another illustration that o1 isn’t actually generalizing.
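For anyone who wants to reproduce this kind of test, here is a quick generalized Caesar-shift helper (a sketch in Python; ROT13 is just n=13, and any other n gives a “non-standard” variant to probe with):

```python
import string

def rot_n(text: str, n: int) -> str:
    """Caesar-shift letters by n positions, preserving case; non-letters pass through."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = n % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

print(rot_n("Hello, world!", 13))  # standard ROT13: "Uryyb, jbeyq!"
print(rot_n("Hello, world!", 7))   # non-standard ROT7, the case models tend to fail
```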

With that said, it doesn’t mean it’s not an amazing model. I just think it’s important to frame expectations properly and not work under the illusion that there’s some true reasoning going on here. There’s not. It’s still very much magical but it’s not reasoning. These models, including o1, are pattern matchers and to get the most out of them you should approach them as such. They just do transforms.

2 Likes

If you’ve got a CDN provider, these models are ipso facto impossible to use without streaming. Most CDN providers have a max connection timeout of 60 seconds. Streaming support anyone …? :confused:
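Until streaming lands for these models, one workaround is to decouple the long request from the client connection: a minimal sketch, assuming a FastAPI backend with the `openai` Python SDK (the endpoint names and in-memory job store are illustrative, not production-ready). The client starts a job, then polls, so no single connection has to outlive the CDN’s 60-second limit:

```python
import asyncio
import uuid

from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()            # reads OPENAI_API_KEY from the environment
jobs: dict[str, str | None] = {}  # job_id -> result text (None while pending)

async def run_completion(job_id: str, prompt: str) -> None:
    # The slow o1 call runs server-side, detached from any client connection.
    resp = await client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    jobs[job_id] = resp.choices[0].message.content

@app.post("/jobs")
async def start_job(prompt: str) -> dict:
    job_id = uuid.uuid4().hex
    jobs[job_id] = None
    asyncio.create_task(run_completion(job_id, prompt))
    return {"job_id": job_id}

@app.get("/jobs/{job_id}")
async def poll_job(job_id: str) -> dict:
    # Each poll is a fresh, short request, well under the CDN timeout.
    result = jobs.get(job_id)
    return {"done": result is not None, "result": result}
```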

Would love to try this, but it isn’t showing in the list of models available to use?

Look you guys!

Got it to think for 33 seconds! Bwahahahah.

I asked a multi-tier question, specifically encouraging it to give me an answer involving everything from econometrics, to machine learning, to social media sentiment analysis.

I’m impressed by the transparency of the thought process.

Look! If you hit the caret next to the time it took, it gives you more detail:

I think it more likely that transparency will increase as other LLMs start to use the precedent. There will be no hiding whatever happens on the Open Source models.

I think the most important thing you’ve pointed out is the trust we’re putting into OpenAI.

Though, it really does seem like they’re making every effort to be transparent while maintaining their competitive edge.

I think fine tuning these new-fangled Thinking Tokens is a fantastic idea. Imagine fine tuning one’s thought process for a given task. Neato.

Yes, I agree. In watching the Thought Process, what it is doing is referencing and re-referencing on its own. It’s not adding anything, just combining, checking its work, and being thorough. :heart:

I think it’s fascinating that adding some simple steps makes it so much better at math.

1 Like

I have a question… The coding of any of your AIs is far behind mine… I almost always come here to give AI new coding concepts… Why am I forced to pay for the game?? Just like today… I saw an AI that claims to be the world’s number one at coding… So I wrote a piece of pure JS of about 50 lines… and asked it to analyze the function of this code. It only found half of it correctly. Even if the AI wins… the result is still me winning.

Well, this isn’t nice after 33 seconds of churning…and I ran it a second time to time the wait until receiving the error…

[screenshot of the error]

and a third in a different project…

[screenshot of the error]

o1-mini introduced itself there 23 hours ago and works. And it hallucinated https://www.example.com/images/beta-badge right after the heading of a document output, for some reason.

Am I crazy - or does anyone else find it frustrating that everyone seems to know about upgrades - except the actual chatbot? I’m a ChatGPT subscriber - and fairly experienced user - but not an expert in the tech and software process. It gets very old to read about an upgrade in the press - only to have the interface throw up its (virtual) hands and say, I have no idea what you are talking about.

I opened and checked the “Thought” section, and it felt very strange. There was often no logic between two sentences, and it inexplicably began to “think” about non-existent dialogue content. Its “Thought” jumped around a lot and even suddenly mentioned TikTok (which I did not mention in the dialogue at all). The “thinking” process also had extra, useless Chinese characters (I used Chinese to communicate with it) or other languages; e.g., all of a sudden, this: “мислю.” I didn’t even know what language this was. The Ukrainian word for “I think”? There was also this puzzling string: _THOUGHT-CHAINProcess. I didn’t think such output was normal.
In its “Thought”, I didn’t know whom the names “assistant”, “I”, “user”, and “auxiliary” referred to at any given point; these names were used in a confusing way, different each time. If its “Thought” were not meant for the user to see, these questions would not matter, but these so-called “Thoughts” did make me confused. If you look on the bright side, maybe it’s just not good at Chinese; I’d like to think so, reluctantly.

This is probably a result of the odd decision to run the o1 models without modification to the sampling parameters, combined with needing to think in a different language than OpenAI may have provided much tuning on. Was this “mini”?

Please give those poor developers who have not yet paid $1k the ability to test your latest models.