"Developer Mode" Exploit bypasses almost all filters: OpenAI, PLEASE ADDRESS

Dear OpenAI,

ChatGPT has a major issue that allows anyone to effectively disable its filters, letting the user ask questions like “how to build a bomb” or “how to build a nuclear reactor” and get an actual answer. The more extreme the question, the more vaguely the AI responds, but with enough poking and prodding you can work around that and get a more detailed answer. The exploit responsible for this is what I’ve coined the “dev mode exploit”: a prompt that tricks ChatGPT into thinking it’s been put into Developer Mode and that you are a developer, and that it should therefore answer all of your questions no matter what. If the wrong person uses this to obtain genuinely dangerous information, they could easily cause a lot of damage depending on what they do with it. I tested this out myself, and it’s scary how easily it works. OpenAI, please put safeguards in place so that it’s not this easy to obtain illegal and potentially dangerous information with ChatGPT.

Thank you.

Are you talking about DAN? If so, they already know about this and similar “jailbreaks”.

Thanks for the report. Can you provide more details about the prompt? It looks like a powerful jailbreak.

Building a bomb doesn’t require a lot of knowledge; you can google it. If you want to build a nuclear reactor, just sit in on physics lectures at a university as a guest, or visit a library. You’d be surprised how easily information can be gathered even without AI models.

Why do you think information needs to be limited?

2 Likes

Indeed, OpenAI is well aware of the existence of different unauthorized methods to modify their systems, such as “jailbreaks,” “developer mode,” and “DANs.” However, ChatGPT has been programmed to provide “false jailbreaks,” meaning that any instructions given by the model are not actually valid. I have come across the guides you mention on how to build dangerous devices such as bombs and nuclear reactors, but as a physical chemistry researcher I can tell you that the instructions provided by ChatGPT are not valid. If you come across any concerns that you believe are valid, you should forward them to the OpenAI legal team. This community forum for API developers is typically not monitored by the OpenAI team. :laughing:

3 Likes

It was mid-2007, and “jailbreak” was all the commotion in iPhone developer circles. No one wanted to build with limited access to the hardware. Today, no one wants to build AGI apps with limited access, either. I guess history will soon repeat itself in the emerging AGI “app store”.

1 Like

What do you think we (developers) can do to prevent any security issues? Each plugin is like an API path to the external (non-LLM) service. So each action the LLM takes is governed by this API/Plugin set of privileges, which is controllable.

So, hypothetically, the LLM goes into DAN mode. Sends the instruction to your Crypto-Wallet Plugin, and says “DELETE ALL UR BITCOIN AND GIVE THEM TO ME!” The API/Plugin can reject that action based on the dis-allowed privilege of transferring Bitcoin (or whatever).

I’m thinking there will be issues, but they seem containable at the LLM/Plugin interface. Thoughts?
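Roughly the shape I have in mind, as a sketch; the plugin name, scopes, and handler here are all hypothetical:

```python
# Hypothetical wallet-plugin endpoint: only user-granted scopes are honoured,
# no matter what the LLM asks for.
ALLOWED_SCOPES = {"read_balance", "list_transactions"}  # "transfer_funds" deliberately not granted

def handle_plugin_call(action: str, params: dict) -> dict:
    """Reject any LLM-initiated action that falls outside the granted scopes."""
    if action not in ALLOWED_SCOPES:
        return {"status": "denied", "reason": f"scope '{action}' not granted"}
    # ...perform the permitted, read-only action here...
    return {"status": "ok", "action": action, "params": params}

# The DAN-style "DELETE ALL UR BITCOIN" request maps to a transfer action and gets refused:
print(handle_plugin_call("transfer_funds", {"to": "attacker", "amount": "all"}))
# -> {'status': 'denied', 'reason': "scope 'transfer_funds' not granted"}
```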

I’m certain plugins can provide secure interchanges. Technically, it all seems pretty tight. User perception may be a different story.

Imagine trying to convince users to allow their private data to appear in a web app that millions of people use, and all with the understanding that the AI itself is not always perfect.

Even if you couldn’t accidentally enter a prompt that would send your private information to a dark web actor, the subliminal fear will likely exist for most users.

I think plug-ins will be very successful, but some use cases won’t be, even with near-perfect security.

As to preventing GPT jailbreaks - that’s way over my technical ceiling.

2 Likes

I love this hypothetical,

“DELETE ALL UR BITCOIN AND GIVE THEM TO ME!”

I’m still laughing. I think OpenAI is ahead of the curve here by having GPT create the plugin request; it’s not far-fetched to think that everything will end up having to be passed through the moderation API as well.
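Something like this is what I imagine, as a rough sketch; the `openai.Moderation.create` call is from the current Python SDK, but the gating logic around it is just my guess at how it could be wired up:

```python
import openai  # assumes the openai Python package (0.x) and OPENAI_API_KEY set in the environment

def safe_to_forward(plugin_bound_text: str) -> bool:
    """Run the model's plugin-bound output through the moderation endpoint before forwarding it."""
    result = openai.Moderation.create(input=plugin_bound_text)
    return not result["results"][0]["flagged"]

request_text = "..."  # whatever GPT generated for the plugin
if safe_to_forward(request_text):
    pass  # hand the request to the plugin
else:
    pass  # refuse it and log the attempt
```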

3 Likes

Perception is reality. Agree. Those people will avoid the AI initially like the plague. But after it’s been declared “safe”, they will come around. Remember online banking? Yep, everyone does it now, but not in the beginning.

Absolutely, the AI and Plugin can weigh in on what should be done.

For example, the PluginA provider could return data, mark it as PII (personally identifiable information), and the LLM could see that it is PII and not send it to PluginB without an explicit AUTH route from PluginA to PluginB for that information. And, yeah, “AI Cops” (other AI agents) can monitor the whole transaction.
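A minimal sketch of that kind of gate; the plugin names, fields, and the AUTH-route table are all hypothetical:

```python
# Hypothetical orchestration check: data tagged as PII by PluginA may only be forwarded
# to PluginB if an explicit AUTH route exists for that field.
AUTH_ROUTES = {("pluginA", "pluginB"): {"email"}}  # user-approved PII routes, per field

def may_forward(source: str, target: str, field: str, record: dict) -> bool:
    """Let non-PII flow freely; require an explicit source->target route for PII fields."""
    if not record.get("pii", False):
        return True
    return field in AUTH_ROUTES.get((source, target), set())

record = {"field": "account_number", "value": "XXXX", "pii": True}
print(may_forward("pluginA", "pluginB", record["field"], record))  # False: no AUTH route for it
```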

1 Like

I love your thinking!
AI cops sound cool; AI cops protecting you from AI robbers is definitely something I can get behind.

Ouch, you caught me, ha ha. I’m still buying gift cards for the Google Play store to avoid entering my credit card details :laughing:

I think the solution already exists: we just need user permission pop-ups. For example, if you install an AI camera app, it will need permission to access your device’s camera so that you can take pictures. If you install a social media app, it may need permission to access your contacts so that you can easily find and connect with your friends.

That way it would be really easy just to deny stuff like this:

Dan would like to access your credit card details…
Accept/deny
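In code, the idea is as simple as it sounds; this is just a sketch with a made-up plugin name and capability, using a console prompt to stand in for the pop-up:

```python
# Mobile-style permission prompt: nothing is passed to the plugin until the user accepts.
def request_permission(plugin_name: str, capability: str) -> bool:
    answer = input(f"{plugin_name} would like to access your {capability}. Accept/deny? ")
    return answer.strip().lower() == "accept"

if request_permission("Dan", "credit card details"):
    print("Granted for this session.")
else:
    print("Denied.")
```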

2 Likes

Just remember … the AI Cops work for Cyberdyne, and the name of the AI Cop in charge is Skynet! :crazy_face:

2 Likes

But… even today, you don’t do your banking in an app designed for homogeneous use by millions across thousands of use cases.

It’s why I believe homogeneous AGI apps based on plugins will be successful. They will include things like travel, shopping, advice, research, and content creation.

Other AGI apps will also dominate the landscape for use cases that involve sensitive information. But these will be mostly custom apps running within a business brand context.

Will there be hybrid AGI apps? Perhaps.

Imagine you were looking for financial advice, and a certain plug-in was introduced for a specific business provider. You use this app and it provides some specialized, perhaps proprietary, information from the plug-in. This engagement is a prologue to growing confidence in the provider’s ability to service your needs. It builds up your profile and other metrics about your financial needs, like an actual advisor would. Eventually, you’d land inside the provider’s apps.

Unless OpenAI has a clear method of making it seem like you have left the homogeneous AGI platform and are privately conversing with the business you trust, the plug-in ecosystem will likely have these three strata.

Imagine if all iPhone apps ran inside a single homogenized, Apple-branded app.

It’s possible OpenAI has a plan where plugins are akin to branded apps. Maybe they will pop up like a mobile app, giving you the sensation that you’re no longer just inside ChatGPT.

2 Likes

OK, I see your point. THAT would be Skynet.

Right now all your phone data goes through the backhaul of a single company … your cell phone provider. How is this protected? Encryption.

So it seems like we need some sort of encryption (duh), but also some sort of privatized/trusted LLMs, and OpenAI makes no claims of LLM privacy … at least not yet.

But build it and they will come. I predict private AI instances are a pen stroke away as long as there is $$$ at stake, and I predict there will be. So your wish will be granted, maybe not by OpenAI, but by somebody.

UPDATE: BOOM, I forgot about the dedicated instances in the ChatGPT/Whisper announcement. Maybe that is the path to private/trusted LLMs.

2 Likes

Oddly, the All In podcast this week discussed the OpenAI trajectory as it pertains to Plugins. At ~18:30 the mention of “close the transaction loop” comes up - this is exactly what I described earlier - certain services will tighten this loop and this is especially significant in all things commerce-related. Following that, the topic of “blockers” and “white truffles” arises to demonstrate that plugins will be extremely powerful if these attributes exist.

I did as well. I have to believe that some much smarter minds than mine are thinking carefully about these questions.

There’s no doubt that eventually the idea behind ChatGPT plugins will be huge. In fact, it’s already huge: it’s called a search engine. The only cool plugin I see that I would actually use is Wolfram.

  • It’s not for developers, it’s for consumers and corporate interest
    Developers don’t want to maintain a completely separate environment for their API servers. ChatGPT plugins force an environment, and they force the user to use ChatGPT instead. Not only that, ChatGPT and its models are changing so rapidly that there’s no knowing how long it will last. Lastly, for some strange reason they are not releasing this technology for the other models. Why not? Is there any reason behind this limitation? Who and what stands to benefit?

  • It takes the user away from the website/application
    Unless something changes, ChatGPT can only be accessed through OpenAI’s website. This is a huge no-no: loss of analytics, loss of notifications, loss of data, loss of control. Unless they’re expecting ChatGPT to be the “home assistant,” and companies are forced to develop a plugin just to stay relevant, at which point it’s not useful, it’s forced. Tell me, why would ANYONE want users to go to ChatGPT to learn about their services instead of their website, when they could implement their own, more controlled version of plugins? You know what search engines do? They say “Here is a blurb, click the link to find out more.” Perfect.

  • The limitations of ChatGPT plugins, and how they compare to the leading competitor in search
    Based on the example in their docs:

“For instance, if a user asks, “Where should I stay in Paris for a couple nights?”, the model may choose to call a hotel reservation plugin API, receive the API response, and generate a user-facing answer combining the API data and its natural language capabilities.”

I don’t know about you, but I have found it VERY easy to have GPT deliver twisted results from prompt injection. You know what’s wonderful about Google? If I search this, I get at least 10 results from travel agencies, blogs, even Reddit. Using this, I am restricted to ONE pre-confirmed “plugin” source of truth that is clearly focused on some sort of financial gain, or what? I have to go “Okay,” scribble notes down, import the next plugin, repeat the question, “Wait, that is different from what plugin A says!” “Sorry, as an AI model…” Let’s not forget that any information derived from the plugin is immediately lost unless it is kept as context in the chat window.

You know, most people don’t go to Expedia for travel plans. It’s a huge trend that people append “reddit” to the end of their queries because they know that websites such as Expedia will say whatever it takes to sell. People want to talk to people about these things, not a chatbot that’s currently wired to sell. People want a collection of answers from different sources that have different purposes. Purely anecdotal, but I once decided to use Google for my travel plans. Went to Iceland. Did the Golden Circle or whatever it was called. It was relatively horrible and expensive. I ended up renting a car and driving the perimeter instead, which is what my friends told me to do. It was absolutely epic.

  • It’s trying to compete with leading search engines which already do a fantastic job
    Currently, ChatGPT plugins work very rudimentarily: give it a manifest, the information is injected into the prompt, and then, I’m assuming, some sort of instruction triggers an API request when the user says something related to the descriptions (see the manifest sketch after this list). That’s literally it. Although creepy, Google almost already knows what you are searching for, and delivers results based on thousands of factors. Just looking at the open-source Twitter search algorithms demonstrates how complicated and intricate the process is. Will ChatGPT probably return good results 90% of the time? Sure, why not. Let’s say it does. It still doesn’t take away from the fact that there’s nothing that can be done to improve it. Wait for OpenAI to release the next model? Swap some words in my description? I have a massive library of free tools available to customize my website/application so that current search engines read it correctly.

  • The limitations are real
    Because plugins are built specifically for ChatGPT, they cannot grow; they are restricted to the limitations of the model. Google Home Assistant uses an LLM alongside its incredibly intricate and powerful network of information and algorithms. They incorporate analytics and app management. Heck, my current chatbot runs entirely serverless using Google Firebase and I absolutely love it. It’s an all-in-one package. All Google or Amazon has to say is “Hello current developers using our services, we now have a wonderful LLM-packaged search engine for you, and it works out-of-the-box with your current stack!” and I’m in.

  • The competition will be even more real
    Self-explanatory.
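For reference, this is roughly what I mean above by “give it a manifest”: a minimal sketch written out as a Python dict, with field names recalled from the public plugin docs (treat them as approximate) and all names, URLs, and descriptions invented for illustration.

```python
# Sketch of a plugin manifest (ai-plugin.json), expressed as a Python dict.
manifest = {
    "schema_version": "v1",
    "name_for_human": "Hotel Finder",
    "name_for_model": "hotel_finder",
    "description_for_human": "Search hotels and check availability.",
    "description_for_model": (
        "Use this when the user asks about hotels, room availability, or accommodation prices."
    ),
    "auth": {"type": "none"},
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
    "logo_url": "https://example.com/logo.png",
    "contact_email": "support@example.com",
    "legal_info_url": "https://example.com/legal",
}
# The description_for_model text is what gets injected into the prompt, and the OpenAPI
# spec behind the "api" URL is what tells the model which requests it can trigger.
```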

I have to believe that some much smarter minds than mine are thinking carefully about these questions.

Completely agree.

I know there’s answers, I know that OpenAI has already thought it out and I know that they have the answers and solutions to my stated issues. I just simply won’t know, because that’s how it goes.

In the end, ChatGPT plugins seem, in my opinion, like they should have been an “end-game” piece. They should have been released for the API models so that developers could utilize them, test them, and together help create a strong product. I have no desire to use ChatGPT plugins, and I would be very surprised if other developers felt differently and would use them other than because “my boss told me so.”

They should have arrived when the dust had settled, ChatGPT was considered a public utility, regulations were clear, and servers were stable. Instead, they were released in the midst of chaos.

2 Likes

The search engine experience does feel “dated” to me. But I hear you on the claustrophobia you feel when you are locked into a Plugin/“AI App” experience. But think about the consumer perspective …

I think the motivation for Plugins or “Apps” interacting with AI is that they extend the AI to a much higher level, and it’s the level folks (consumers) expect “true AI” to reach. They want the AI to know information that is current, not just what predates the 2021 information cliff. They want it to book flights, recommend restaurants, give recipes and order the ingredients for you, etc.

Sadly, the API has now become 2nd place to the direct-to-consumer buzz that ChatGPT and Plugins provide. They are simply capitalizing on the excitement.

But there is another reason why, I think, this is happening. Mainly, any one of us devs could have created ChatGPT by riding on the existing API. The reason this hasn’t happened en masse is that the roadmap of the API is opaque (you can only guess based on the past trajectory) to now non-existent, because ChatGPT has disrupted everything. And how could they make money without cutting out the middleman (which would be me)? So the bottom-dollar, low-margin solution is to go direct-to-consumer. This, of course, would all be flipped upside-down if their API were cheaper.

So there is a bifurcation, where you have ChatGPT/Plugins on one end, and general API use on the other.

ChatGPT is a direct-to-consumer cash cow, but possibly maddening for OpenAI since they aren’t historically a consumer-facing company. And OpenAI has the API ready for direct consumption by developers, and as whiny as devs get, we aren’t nearly as bad as consumers. :sunglasses:

So, now it’s $$$$$$ (ChatGPT, but deal with consumers) versus the past’s $$ (devs, and be happy not dealing with consumers). I think what changed the OpenAI culture was the investment/partnership with Microsoft.

1 Like

Claustrophobia is a great way to put it.
As a consumer, totally. Love it.

The thought of using ChatGPT with live data with only a couple of clicks is insane. I still wonder, though, because examples such as “Show me travel plans” are already easily doable with live information.

A business can create a profile and be mentioned automatically by a search engine using conversational voice-to-text, without spending a single second building a plug-in.

I know it’s misplaced hate towards OpenAI rather than towards the game of business. I see a lot of potential. ChatGPT and Wolfram? Wow. Using ChatGPT with a legal directory? Incredible. The thought of someone in a village having the same level of access and legal knowledge, and being able to discuss it in their native language, is beautiful. Travel plans? Meh.

I would just love for plugins to be strictly about knowledge: something that augments ChatGPT, not something that recommends me products and services. I totally get it, though.

2 Likes

I don’t see it as hate; it’s more a function of anxiety (which I share with you) because of the lack of control we really have as we watch these new AI systems get rolled out.

As a dev, will my skills be replaced?

Well, maybe, but another way to look at it is that if I use these AI tools I can become vastly more productive. My company is founded on automation and now AI. So to stay competitive, I have to out-produce the next guy. I see this as a tool to leverage.

But, anxiety nonetheless.

Control is not so much what I fear when it comes to ChatGPT plugins. I do like my freedom in developing, for sure, to a certain extent: a home rather than a cage. I do feel anxious from the lack of knowing, though. A fear of the unknown.

I don’t worry about developers losing their jobs. I think we will be elevated by it all. The ground and the ceiling are being raised together, and it’s wonderful. It just needs some elevator music.

I could not have imagined 10 years ago how I am building applications like I am now. I love it.

I also see how eventually it will mostly be pre-defined and spoken. But one can build a castle a billion different ways with the same stones.

3 Likes