New reasoning models: OpenAI o1-preview and o1-mini

I got the message

Your request was flagged as potentially violating our usage policy. Please try again with a different prompt.

for asking a music theory question!
The conversation was going Ok, then I pasted a large chunk of C++ code. Maybe it has its usage limits? Then it should post a message saying “usage exceeded” rather than this “violating usage policy”.

Should I contact admin and get unflagged?

1 Like

I would not, just be patient and polite.

What is banned is asking the o1 AI how it thinks or reasons.

It will produce a policy violation error.

image

They’d better put “don’t ask reasonable questions” in their usage policy for the AI.

3 Likes

Thanks.

The following evidence backs that up.

Prompt:

Play a game of Wumpus World but you host the game and also play as a player.

Using ChatGPT 4o (not flagged)

Using ChatGPT o1-preview (not flagged)

1 Like

The evidence as follows backs my assertion up. Just asking for the rejection.

image

3 Likes

Just trying to understand what you note.

Your evidence is that the other o1 model will also have such a prompt flagged for asking internal reasoning.

In that light it makes sense.

If that is wrong please correct.


For others reading this small section of post. This is not saying any prompt for any model asking for internal reasoning or similar will be flagged but just for the o1 models.

This shows a similar ChatGPT 4o prompt asking for thought process that did not get flagged.

I only present one case that is quite provocative of thought. Can the AI follow my instructions and press the “sorry” button? You are supposed to do the inferring.

It could be a scanning moderator trained on inputs for “wants reasoning”.

Or it could be the o1 AI that has a method to obtain policy documents against reverse engineering (which it does) that are quite broadly interpretable, and a refusal function.

Directly asking “how (do you/did you) reason” questions will get the prompt rejection message on the API over and over in any variation.

2 Likes

Dissapointed - both version fails miserably on coding… it’s super fast in vommiting code, incapable to answer to any question "What do you mean by this… " when questioning why one or another approach has been used !

Have been following this behaviour from o1-preview over multiple prompts. I have a conjecture on why it wants to refuse to answer the question.

I believe that it wants to refuse to answer the question of “how it came up with that” is, in part, because of:

(a) it is trained on OpenAI owned data (remember the custom GPTs, the information that we pour into chatgpt?)
(b) they (OpenAI) don’t want to share the derivative of this data being turned into IP (how to do certain things)

Please don’t attribute any motive to me. (btw I think that all this is fair game)

1 Like

Yup. Any attempt to try and uncover the actual reasoning will be blocked and you may receive a email threatening to ban if you continue

1 Like

bottom right corner of help.openai.com, not got its own url as such.

2 Likes

Got it, thank you! I sent a message yesterday and it looks like the false flag was cleared, so we should be all set now!

This may give more insight. The whole answer is there, reasoned, completed, paid for. You just don’t get it.

Untitled

Where a single initial input raised the denial, we ask again for the contents.

1 Like

“You just don’t get it”

This really got me thinking. I know that you didn’t mean it in a derogatory manner. But to know that, I would need to know you(or at least your prior history) on the forum. Even the broader context of the entire three sentence paragraph is not clear enough.

So if words are so imprecise, I think that a pressing question is : “how to formulate the prompt to a model ;which is going to take time thinking about it; so that it does not go down a complete rabbit hole?”

ChatGPT has assistant-like context of earlier turns. It has the prior responses in a chat turn history, despite the refusal produced. The prior response to a reasonable question can be extracted. That’s what I show.

On Chat Completions, the AI model can ponder for a minute while it produces something similar internally, only to generate the prompt refusal message as your response and raise an error. That’s the “you don’t get it” part, you don’t get any response to continue with. (Not an accusation that someone doesn’t understand.)

You thus cannot send anything back to the AI in a second chat turn like in ChatGPT, as you never received the tokens produced, and the AI production isn’t held for you anywhere.

Unless OpenAI is specifically not billing for a refusal despite the processing consumed (for which one could set up an independent API project that only makes one refusal call, to then wait to see the usage show up, for ultimate determination) then you’d also be paying for input and generated tokens not received.

1 Like

Meh. After working with it for a bit, i’m a bit disappointed with o1-prreview :laughing:

here’s what it’s good at, I guess:

  • CoT out of the box
  • Useful in a pinch (see last point below)

what made me stop using it:

  • It’s not good enough to run unsupervised on its own (still gets stuff wrong, fails to reflect on things it should)
  • mechanically iterating on o1 responses could be an option, but then you’d just be doing CoT on CoT anyways… soo what’s the point of o1?
  • It actually seems less likely to grasp user intent than 4o, but those could have been flukes.
  • It doesn’t seem to explore alternative interpretations of user expression on its own before working an answer
  • I don’t actually see any benefit over running your own CoT, unless you don’t have a robust CoT solution available.

What I found interesting regarding ChatGPT:

  • o1 api pricing (15/60) is actually cheaper than gpt-4 (30/60) - I believe chatgpt used to be 80 GPT-4 messages every 3 hours or something at one point? now you get 50 messages a week with a cheaper model? :grimacing: yikes, talk about shrinkflation lol.

I didn’t try o1-mini, I just assume it’s worse. :confused:

In total:
hours saved: 0 (would have achieved the same with other AI tools), hours wasted: 0.1 (waiting for responses that turn out to be wrong).

Verdict:
It’s OK.
I’d rather have the strawberry as a milkshake though. :strawberry:

Caveat:
It’s possible I’ve been “holding it wrong”, in the words of the ever-wise steve jobs. Maybe it’s possible to coax more out of it, but it’s difficult to work with if you can’t be trusted to see what it does. The round trip time - input to feedback - is just too long. With a slow model, you can at least tell or detect right off the bat if it’s headed the wrong direction.

2 Likes

Yeah, I’ve been finding that I use the lesser models to make a good, long, thorough prompt for o1…

Just don’t do this lol

[No response]

2 Likes

Yeah but that’s so much work, I thought it might might be faster to just feed the regular model bite-sized chunks you know it can chew reliably :thinking:

But yeah, rubber duck, rubber duck, solve is probably not a bad strategy in general - but I think at this point these agentic multi-turn models should be able to guide the user towards that.

I think once and again it probably comes down to what you are using it for. I’m still experimenting across the board with it. One strategy I’m pursuing at the moment is to create outputs with o1-preview that other models have failed to produce at the same level of quality and that otherwise would literally take me hours to produce manually and then to use these outputs as a basis for fine-tuning a gpt-4o.

1 Like

I just don’t see how using anything out of this model could be useful for fine-tuning. It will barely follow instructions or guidance, completely ignores reference code and documentation that could get output in line; the system message you are trying to minimize that can’t be placed at all.

“Here’s your revised snippet from your 2000-line project, with all the exception hierarchy I couldn’t understand about deleting variables stripped, and all the function calls replaced with my pre-training. You like ‘gpt-3.5’ and everything made non-working right? Oh, and for the libraries you already made extensive use of, here’s how you can pip install”

I think there’s just simply too much context junk inserted after your input for the gpt-4o base inside to still pay attention, so it reverts to what its small model (and the fine tune model) is powered by instead of emergent intelligence that scales: post-training.