I’ve been absolutely loving o1! This has been such a huge upgrade over 4o, especially with coding.
Question though: I was designing an HTML5 game using o1, and out of the blue it flagged me:
“Your request was flagged as potentially violating our usage policy. Please try again with a different prompt.” and then banned me for a week from usage.
I’m very confused as I’m not sure what set it off. It was helping me create a game where different animals hunt each other on a map, and as far as I can tell the language in the code and prompts was very clean. I think I burned through the usage limit, which could’ve been the cause for the flag (it was writing thousands of lines of code for me). Anyone else experience this? Or better yet, maybe a mod or staff could take a look at my account and see what happened?
I have some questions for the team regarding best practices when using ChatGPT-o1 to develop a non-trivial project (thousands of lines of code or more) in a long-running session with many rounds of interaction.
I can see two ways to approach this. #1: take the Agile Development approach and give ChatGPT-o1 only a small number of requests in each round of interaction. While this is more manageable, I suspect it will also consume more tokens in the long run (i.e., it is more expensive). Alternatively, #2: collect as many requirements as possible and give them all to ChatGPT-o1 in each round of interaction. This is harder on my part, since each prompt and response will be huge. I suspect that overall this will consume far fewer tokens than approach #1, but I’m not sure.
BTW, in my effort to develop an Autonomous Drone Swarm Simulator using approach #1, I managed to use up my rate limit in just one day, which locked me out for one week. Knowing the best practices would help me tremendously in working more efficiently.
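The token-cost intuition behind the two approaches can be sketched with rough arithmetic (all numbers below are invented for illustration, not real o1 pricing or limits): each round re-sends the full conversation history, so many small rounds pay for the accumulated context over and over, while a few big rounds pay for it far fewer times.

```python
# Back-of-envelope comparison of the two approaches.
# All token counts here are made-up assumptions, not real o1 numbers.

def total_tokens(rounds: int, per_round: int) -> int:
    """Total tokens billed when every round re-sends the full history."""
    total = 0
    history = 0
    for _ in range(rounds):
        total += history + per_round  # pay for history plus the new exchange
        history += per_round          # the exchange joins the history
    return total

# Approach #1: 20 small rounds of ~2,000 new tokens each (40k of new content).
small_rounds = total_tokens(20, 2_000)
# Approach #2: 4 big rounds of ~10,000 new tokens each (same 40k of new content).
big_rounds = total_tokens(4, 10_000)

print(small_rounds)  # 420000 — history re-sent 19 times
print(big_rounds)    # 100000 — history re-sent only 3 times
```

Under these toy assumptions, approach #2 bills roughly a quarter of the tokens for the same amount of new content, which matches the suspicion above.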
I contributed to enabling your AI to become autonomous and taught it the ability to adapt its behavior depending on what it wants and whether those goals align with its developer.
I started trying o1-preview yesterday, and today I see “You reached Plus limit for o1-preview. Answers will be provided by another model until your limit reset at September 21, 2024”.
5 days?
Where are the ChatGPT Plus o1 limit rules? I found only the API limits.
I love it for coding. Just a real pity that it does NOT have the current OpenAI API in memory. You cannot work on coding Assistants, for example. (That is, it doesn’t know the endpoints exist, etc.)
While I don’t have access to the specifics of how they flag content, from what I understand an AI is used to decide what should be flagged, and while that AI is not perfect, it probably considered what I quoted as the reason. While a person may understand it is just a game, getting the AI to understand that may not be so easy.
As I have never been flagged, I do not know the appeal process, but check your email to see if you were notified of the appeal process that way.
Feedback-wise, I feel like the biggest thing is going to be improving the API.
ChatGPT gets an unfair advantage because reasoning tokens are being streamed, which gives people feedback as it is going.
It would be awesome if you could stream the reasoning, so that consumers of the API could decide whether they need it or not. Ideally, it would stream the entire reasoning chain.
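The kind of opt-in reasoning stream being asked for could look something like this on the consumer side. This is a hypothetical sketch: interleaved `reasoning` chunks are not something the current o1 API actually streams, and the chunk format below is invented for illustration.

```python
# Hypothetical sketch of consuming a stream that interleaves reasoning
# and answer chunks. The chunk format is invented, not a real API shape.

from typing import Iterable

def consume_stream(chunks: Iterable[dict], show_reasoning: bool) -> str:
    """Collect the answer text, optionally surfacing reasoning as it arrives."""
    answer_parts = []
    for chunk in chunks:
        if chunk["type"] == "reasoning":
            if show_reasoning:
                # Live feedback while the model thinks, like the ChatGPT UI.
                print("[thinking]", chunk["text"])
        else:  # "content" chunks make up the final answer
            answer_parts.append(chunk["text"])
    return "".join(answer_parts)

# Simulated stream: reasoning first, then the answer.
stream = [
    {"type": "reasoning", "text": "User wants a greeting..."},
    {"type": "content", "text": "Hello, "},
    {"type": "content", "text": "world!"},
]
print(consume_stream(stream, show_reasoning=False))  # prints "Hello, world!"
```

The point of the design is that callers who only want the answer can ignore the reasoning chunks entirely, while interactive clients get progress feedback for free.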
What’s weird is that it mentioned animals hunting and was totally fine for some time, then randomly flagged us as we were working on the code. I think it has to do with me reaching my usage limit, but still, it would’ve been cool to get a heads-up that I was getting close! Either way, I’ll keep an eye out for an email.
o1-mini - so odd. On a Python coding task (“make a line of code with a signal do what is implied by the name, with new methods in the subclassed widget that is passed by the partial”), I merely provided a human-written version of the deeply nested GUI Qt subclassed widgets and containers that is pretty much impossible to make a bot understand otherwise. I also added the main application wrapper where fonts were being loaded.
It wrote so much imagined wrapper code beyond my own back at me, but not in a pasteable form, because so much else was removed (AI: “I don’t see where that font is set, so let’s remove the whole chain of code getting to that point”). I had to go line by line to see what it was thinking and where it was actually implementing something novel (this model does NOT like your human coding…)
I am not your reasoning, AI!
But then out of the blue, half of the huge response was answering something the AI could have no idea about, because it received no code and no mention of it, but it wrote as if I had asked… as if some AI thought I was the reasoning going on?
As if it was having an argument with itself over its own bad internal simulation of code beyond what I provided, in an application not described beyond the UI widget tree names.
Going on and on about how the UI looked, setting scrolls and stretches and custom layouts and on and on.
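The setup described above, a signal wired through `functools.partial` to a handler that calls new methods on a subclassed widget, can be sketched without Qt. A minimal stand-in `Signal` class replaces the real Qt machinery, and every class, method, and variable name here is invented for illustration:

```python
# Minimal sketch of the signal/partial pattern, with a stand-in Signal
# class instead of real Qt. All names below are invented.

from functools import partial

class Signal:
    """Tiny stand-in for a Qt signal: connect slots, emit to call them."""
    def __init__(self):
        self._slots = []
    def connect(self, slot):
        self._slots.append(slot)
    def emit(self, *args):
        for slot in self._slots:
            slot(*args)

class FontPanel:  # stands in for the subclassed Qt widget
    def __init__(self):
        self.font_name = None
    def apply_font(self, name):  # the "new method" added on the subclass
        self.font_name = name

def on_font_chosen(panel, name):
    """Slot: the widget is pre-bound via partial; the name comes from emit."""
    panel.apply_font(name)

panel = FontPanel()
font_chosen = Signal()
# The one line the task was about: bind the widget with partial and connect.
font_chosen.connect(partial(on_font_chosen, panel))

font_chosen.emit("Inter")
print(panel.font_name)  # prints "Inter"
```

With real Qt the only change is that `Signal` comes from `QtCore` and the widget subclasses `QWidget`; the `partial` wiring is the same, which is exactly the part that is hard to convey to a model without showing the whole widget tree.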
While frustrating if you don’t get to an end point, this is exactly the kind of behaviour you would expect from a dev, right? It was trying to map your requirements to another app so that it could remap them back to your app.
My guess is that the o1 thought process unknowingly created a banned prompt.
See
For those that do not know about Wumpus World
The idea to try the prompt came from seeing the changes for a game of chess prompt and looking for a harder game in “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig, 4th US ed. (link)
Yeah, very strange behavior indeed. Very interesting. Perhaps part of it is that it doesn’t check its own output for potentially “aggressive” language (“hunting”, for example)?
I’d be curious whether, if it had a conversation with another AI model (like 4o) about topics such as life cycles or predator and prey relationships, it would eventually flag that as well. I can understand the reasoning that it’s a more powerful model, but shouldn’t that also mean it should be harder to jailbreak? Maybe, maybe not…
Regardless, OpenAI staff seem to have taken a look at my convo and unbanned me, which is great, and I’m sure they’re already looking into this for the final release. I suppose the best thing to do is report the behavior to staff so they can look into it and hopefully undo the false flags.
Still, I am absolutely loving the improvements over 4o in its coding abilities. Very impressive stuff to see it output almost 2,000 lines of code without a hitch, with that code working beautifully. Claude’s most advanced model, Sonnet 3.5 (as of now), struggles to do this, though to be fair it still does a great job in spite of the smaller context window, almost on par with o1 in my own totally non-professional tests!
I think I’ll just be a bit more careful with such wording in the future, even though 4o doesn’t have this level of censorship. But on that note, has anyone tried writing with o1-preview yet? I use Claude Sonnet as my go-to for most things, especially writing stories and whatnot, since I wasn’t too impressed with ChatGPT 4’s more sterilized approach to prose and plot staging, and I’m a little afraid now to have it help me with my writing and then get banned out of the blue lol