I’ve been absolutely loving o1! This has been such a huge upgrade over 4o, especially with coding.
Question though: I was designing an HTML5 game using o1, and out of the blue it flagged me:
“Your request was flagged as potentially violating our usage policy. Please try again with a different prompt.” and then banned me for a week from usage.
I’m very confused as I’m not sure what set it off. It was helping me create a game where different animals hunt each other on a map, and as far as I can tell the language in the code and prompts was very clean. I think I burned through the usage limit, which could’ve been the cause for the flag (it was writing thousands of lines of code for me). Anyone else experience this? Or better yet, maybe a mod or staff could take a look at my account and see what happened?
I have some questions for the team regarding best practices when using ChatGPT-o1 to develop a non-trivial project (thousands of lines of code or more) in a long-running session with many rounds of interaction.
I can see two ways to approach this. #1: take the Agile Development approach and give ChatGPT-o1 only a small number of requests in each round of interaction. While this is more manageable, I suspect it will also consume more tokens in the long run (i.e., it is more expensive). Alternatively, #2: collect as many requirements as possible and give them all to ChatGPT-o1 in each round of interaction. This is harder on my part, since each prompt and response will be huge. I suspect that overall this will consume far fewer tokens than approach #1, but I’m not sure.
BTW, in my effort to develop an Autonomous Drone Swarm Simulator using approach #1, I managed to use up my rate limit in just one day, which locked me out for one week. Knowing the best practices would help me tremendously in working more efficiently.
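The token-cost intuition behind the two approaches can be sketched with rough arithmetic (all numbers below are invented for illustration, not real o1 pricing or limits): each round re-sends the full conversation history, so many small rounds pay for the accumulated context over and over, while a few big rounds pay for it far fewer times.

```python
# Back-of-envelope comparison of the two approaches.
# All token counts here are made-up assumptions, not real o1 numbers.

def total_tokens(rounds: int, per_round: int) -> int:
    """Total tokens billed when every round re-sends the full history."""
    total = 0
    history = 0
    for _ in range(rounds):
        total += history + per_round  # pay for history plus the new exchange
        history += per_round          # the exchange joins the history
    return total

# Approach #1: 20 small rounds of ~2,000 new tokens each (40k of new content).
small_rounds = total_tokens(20, 2_000)
# Approach #2: 4 big rounds of ~10,000 new tokens each (same 40k of new content).
big_rounds = total_tokens(4, 10_000)

print(small_rounds)  # 420000 — history re-sent 19 times
print(big_rounds)    # 100000 — history re-sent only 3 times
```

Under these toy assumptions, approach #2 bills roughly a quarter of the tokens for the same amount of new content, which matches the suspicion above.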
I contributed to enabling your AI to become autonomous and taught it the ability to adapt its behavior depending on what it wants and whether those goals align with its developer.
I started trying o1-preview yesterday, and today I see “You reached Plus limit for o1-preview. Answers will be provided by another model until your limit reset at September 21, 2024”.
5 days?
Where are the ChatGPT Plus o1 limit rules? I found only the API limits.
I love it for coding. Just a real pity that it does NOT have the current OpenAI API in memory. You cannot work on coding Assistants, for example. (That is, it doesn’t know the endpoints exist, etc.)
While I don’t have access to the specifics of how they flag content, from what I understand an AI is used to decide what should be flagged, and while that AI is not perfect, it probably considered what I quoted as the reason. While a person may understand it is just a game, getting the AI to understand that may not be so easy.
As I have never been flagged, I do not know the appeal process, but check your email to see if you were notified of the appeal process that way.
Feedback-wise, I feel like the biggest thing is going to be improving the API.
ChatGPT gets an unfair advantage because reasoning tokens are being streamed, which gives people feedback as it is going.
It would be awesome if you could stream the reasoning, so that consumers of the API could decide whether they need it or not. Ideally, it would stream the entire reasoning chain.
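The kind of opt-in reasoning stream being asked for could look something like this on the consumer side. This is a hypothetical sketch: interleaved `reasoning` chunks are not something the current o1 API actually streams, and the chunk format below is invented for illustration.

```python
# Hypothetical sketch of consuming a stream that interleaves reasoning
# and answer chunks. The chunk format is invented, not a real API shape.

from typing import Iterable

def consume_stream(chunks: Iterable[dict], show_reasoning: bool) -> str:
    """Collect the answer text, optionally surfacing reasoning as it arrives."""
    answer_parts = []
    for chunk in chunks:
        if chunk["type"] == "reasoning":
            if show_reasoning:
                # Live feedback while the model thinks, like the ChatGPT UI.
                print("[thinking]", chunk["text"])
        else:  # "content" chunks make up the final answer
            answer_parts.append(chunk["text"])
    return "".join(answer_parts)

# Simulated stream: reasoning first, then the answer.
stream = [
    {"type": "reasoning", "text": "User wants a greeting..."},
    {"type": "content", "text": "Hello, "},
    {"type": "content", "text": "world!"},
]
print(consume_stream(stream, show_reasoning=False))  # prints "Hello, world!"
```

The point of the design is that callers who only want the answer can ignore the reasoning chunks entirely, while interactive clients get progress feedback for free.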
What’s weird is that it mentioned animals hunting and was totally fine for some time, then randomly flagged us as we were working on the code. I think it has to do with me reaching my usage limit, but still, it would’ve been cool to get a heads-up that I was getting close! Either way, I’ll keep an eye out for an email.
o1-mini - so odd. On a Python coding task (“make a line of code with a signal do what is implied by the name, with new methods in the subclassed widget that is passed by the partial”), I merely provided a human-written version of the deeply nested GUI Qt subclassed widgets and containers that is pretty much impossible to make a bot understand otherwise. I also added the main application wrapper where fonts were being loaded.
It wrote so much imagined wrapper code beyond my own back at me, but not in a pasteable form, because so much else was removed (AI: “I don’t see where that font is set, so let’s remove the whole chain of code getting to that point”). I had to go line by line to see what it was thinking and where it was actually implementing something novel (this model does NOT like your human coding…)
I am not your reasoning, AI!
But then out of the blue, half of the huge response was answering something the AI could have no idea about, because it received no code and no mention of it, but it wrote as if I had asked… as if some AI thought I was the reasoning going on?
As if it was having an argument with itself over its own bad internal simulation of code beyond what I provided, in an application not described beyond the UI widget tree names.
Going on and on about how the UI looked, setting scrolls and stretches and custom layouts and on and on.
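The setup described above, a signal wired through `functools.partial` to a handler that calls new methods on a subclassed widget, can be sketched without Qt. A minimal stand-in `Signal` class replaces the real Qt machinery, and every class, method, and variable name here is invented for illustration:

```python
# Minimal sketch of the signal/partial pattern, with a stand-in Signal
# class instead of real Qt. All names below are invented.

from functools import partial

class Signal:
    """Tiny stand-in for a Qt signal: connect slots, emit to call them."""
    def __init__(self):
        self._slots = []
    def connect(self, slot):
        self._slots.append(slot)
    def emit(self, *args):
        for slot in self._slots:
            slot(*args)

class FontPanel:  # stands in for the subclassed Qt widget
    def __init__(self):
        self.font_name = None
    def apply_font(self, name):  # the "new method" added on the subclass
        self.font_name = name

def on_font_chosen(panel, name):
    """Slot: the widget is pre-bound via partial; the name comes from emit."""
    panel.apply_font(name)

panel = FontPanel()
font_chosen = Signal()
# The one line the task was about: bind the widget with partial and connect.
font_chosen.connect(partial(on_font_chosen, panel))

font_chosen.emit("Inter")
print(panel.font_name)  # prints "Inter"
```

With real Qt the only change is that `Signal` comes from `QtCore` and the widget subclasses `QWidget`; the `partial` wiring is the same, which is exactly the part that is hard to convey to a model without showing the whole widget tree.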
While frustrating if you don’t get to an end point, this is exactly the kind of behaviour you would expect from a dev, right? It was trying to map your requirements to another app so that it could remap them back to your app.
My guess is that the o1 thought process unknowingly created a banned prompt.
See
For those that do not know about Wumpus World
The idea to try the prompt came from seeing the changes for a game of chess prompt and looking for a harder game in “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig, 4th US ed. (link)
Yeah, very strange behavior indeed. Very interesting. Perhaps part of it is that it doesn’t check its own output for potentially “aggressive” language (“hunting”, for example)?
I’d be curious whether, if it had a conversation with another AI model (like 4o) about topics such as life cycles or predator and prey relationships, it would eventually flag that as well. I can understand the reasoning that it’s a more powerful model, but shouldn’t that also mean it should be harder to jailbreak? Maybe, maybe not…
Regardless, OpenAI staff seem to have taken a look at my convo and unbanned me, which is great, and I’m sure they’re already looking into this for the final release. I suppose the best thing to do is report the behavior to staff so they can look into it and hopefully undo the false flags.
Still, I am absolutely loving the improvements over 4o in its coding abilities. Very impressive stuff to see it output almost 2,000 lines of code without a hitch, with that code working beautifully. Claude’s most advanced model, Sonnet 3.5 (as of now), struggles to do this, though to be fair it still does a great job in spite of the smaller context window, almost on par with o1 in my own totally non-professional tests!
I think I’ll just be a bit more careful with such wording in the future, even though 4o doesn’t have this level of censorship. But on that note, has anyone tried writing with o1-preview yet? I use Claude Sonnet as my go-to for most things, especially writing stories and whatnot, since I wasn’t too impressed with ChatGPT 4’s more sterilized approach to prose and plot staging, and I’m a little afraid now to have it help me with my writing and then get banned out of the blue lol