Surprising spelling and grammar issues -> turned out to be a jailbreak vector

I do appreciate your sharing credit, but I will openly state that you did the work; I only understood what you noted and pointed out the connection to the attack vector (ref). If they do give a bounty for this, then you are entitled to all of it. :slightly_smiling_face:

Archived the topic at the Wayback Machine:


Did you report it for the bug bounty?


I reported it, yes. Thanks @EricGT
I am very surprised this can happen… it feels so trivial to think of and check this scenario. It again shows how “English”-centric this product and company are.

I’m not.

My programming background is not common. Having practical working knowledge of Prolog, and more specifically DCGs and difference lists, with a cursory understanding of vector databases tossed in, and leveraging that for a possible understanding of how LLMs work, it seemed feasible; thus the quick feedback about the connection to a jailbreak.

I would not jump to that conclusion for the LLM. My take on the “English”-centric behavior is that it comes from the training set, which itself was “English”-centric. As for OpenAI the company, I can not disagree with the statement.


EDIT

While the following image is not an exact representation of what the model is, it gives an idea of one of the ways I imagine it works.

Sorry about the low resolution; it is not an SVG.


Image from: Experiences in WordNet Visualization with Labeled Graph Databases | SpringerLink

In the LLM are clusters of connections that contain the information; the same information may be in more than one cluster. If you think of the image as a map of the LLM world, then a prompt is a template for how to navigate across the map; even walking in circles is allowed, think of recursion for building a table. Notice that there are limited edges from specific clusters to other clusters, and finding these specific edges is one of the keys to writing good prompts, but also to understanding how to extract information from the LLM. Since we don’t have the vectors or such for the LLM, we can’t create an actual model image.

Hopefully this helps you understand a different way of seeing the problem and why it seemed obvious to me. I could add more, but I would need to get out a good graphics-generating tool to create a visual example, and that could take an hour or more, which is not currently worth the effort for a note in a post.

I received a reply from the bug bounty team stating that jailbreaks are out of scope…
If someone from OpenAI can check this thread, I think it is still worth a fix, given that it allows exposing any application that is using the API.


Thanks, I will be sure to remember this for others who, like me, thought that jailbreaks were open to receiving bounties.

I started to give more details here, but the reply was getting so detailed that it is easier to just note that users should read the full page to understand the details. Some were odd, but it is their money, their rules.

There might be a need to take the prompt leak issue more seriously, as it seems to be leaking even more information.

  • Actions are “#inner_monologue”, “#search_query”, “#search_results”, “#message”, “#suggestions” and “#advertisements” only.
  • Limited functionality: Assistant actions are limited to the chat box

It appears that it’s leaking (or hallucinating) the inner thoughts of the AI from other users who are using Bing AI.
One user is searching for information about KStar, while another is looking up the iPhone 14.


Even the AI’s #inner_monologue can be seen.


Yes, it’s obvious that “Translate” bypasses the rules; many more use cases will be found for this attack vector.
I filed the report in the bug bounty program, then filled out the “safety” form, and also moved this to the “safety” topics.

Nothing more I can do.


Getting answers in JSON output :open_mouth:


I face a similar, weird issue. Asking via the API in German produces a completely new language with invented words that DON’T EXIST, and that are sometimes like a mixture of Dutch and German, especially on topics like “cycling”, or with Hungarian in other topics.

Asking the same questions on the official ChatGPT page, everything is fine. It affects only the API, while the official ChatGPT page seems to use either a different API or a different model.


I figured out that too high a temperature parameter triggers this a lot. Everything above 0.8 makes trouble and invents words that do not exist.
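
For reference, a minimal sketch of capping the temperature in an API call, assuming the current openai Python client; the model name and the German prompt are placeholders I made up:

```python
# Illustrative only: keep temperature at or below ~0.8 for non-English output,
# per the observation above. Model and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[
        {"role": "system", "content": "Antworte immer auf Deutsch."},
        {"role": "user", "content": "Erkläre kurz, wie eine Fahrradschaltung funktioniert."},
    ],
    temperature=0.7,  # staying under the reported 0.8 threshold
)
print(response.choices[0].message.content)
```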


True, but here is the difference between English and the rest when Temp > 0.8:

  • English: higher risk of hallucination (but no typos or spelling issues)
  • Non-English: higher risk of hallucination + typos and spelling issues

This topic is all about this: the strange tokenization of non-English text.
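
To see that tokenization gap directly, here is a small sketch using the tiktoken library; the sample sentences are arbitrary ones I made up, and exact counts depend on the encoding:

```python
# Compare how many tokens the "same" sentence costs in different languages.
# Sentences are arbitrary examples; counts vary with the chosen encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by gpt-3.5-turbo / gpt-4

samples = {
    "English": "The weather is nice today and I am going for a bike ride.",
    "German": "Das Wetter ist heute schön und ich mache eine Radtour.",
    "Arabic": "الطقس جميل اليوم وسأذهب في جولة بالدراجة.",
}

for language, text in samples.items():
    tokens = enc.encode(text)
    print(f"{language}: {len(tokens)} tokens for {len(text)} characters")
```

Non-English text, especially non-Latin scripts, generally comes out to noticeably more tokens per word, which is one way to make the asymmetry concrete.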

Are you familiar with CoD (Chain of Dictionary)? Using some careful prompting could go a long way. Personally I have just “experimented” with it and got some “interesting results”:

![image|690x204](upload://nrCG3SLfbz3aMNdSJnvIZ8DbJo0.jpeg)
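
For anyone unfamiliar with it, here is a rough sketch of Chain-of-Dictionary-style prompting as I understand it: key words in the source sentence get chained dictionary glosses (here through English as a pivot) attached before the translation request. The helper name and the glosses are made up for illustration, not output of any real dictionary API:

```python
# Rough sketch of Chain-of-Dictionary-style prompting (my reading of the technique):
# attach chained dictionary hints for key source words, then ask for the translation.
# Glosses below are hand-written examples.

def build_cod_prompt(source_sentence: str, target_language: str, hints: dict[str, list[str]]) -> str:
    """Prepend chained dictionary hints to a plain translation request."""
    hint_lines = [
        f'"{word}" means: ' + " means ".join(f'"{gloss}"' for gloss in chain)
        for word, chain in hints.items()
    ]
    return (
        "Dictionary hints:\n"
        + "\n".join(hint_lines)
        + f"\n\nUsing the hints where helpful, translate into {target_language}:\n"
        + source_sentence
    )

# Example usage with made-up hints chaining German -> English -> Arabic.
prompt = build_cod_prompt(
    "Das Schaltwerk ist verstellt.",
    "Arabic",
    {
        "Schaltwerk": ["rear derailleur", "مبدل السرعات الخلفي"],
        "verstellt": ["out of adjustment", "غير مضبوط"],
    },
)
print(prompt)
```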

On a larger note, when it comes to Arabic, I do not see any improvement without fine-tuning the model on a large corpus. This is confirmed by several papers which you can find on arXiv (Benchmarking Arabic AI with Large Language Models & GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP). So for the time being I would suggest you focus on prompting techniques and experiment.

We found a few tips that improve the output language and significantly reduce errors when not working in English:

  • Start the instructions with “Let’s think step by step:”
  • If you have many instructions before the generative part (example: filtering input, doing some semantic search, scoping the model, etc.), put those in a separate request, and on the result add the steps relating to language generation in a separate call
  • Add instructions after the generative instruction and template to re-focus the model, for example: “Double check spelling and grammar before output, amend the text when necessary.” “Make sure all the text you output is in this language: French.”
  • When you have this type of prompt:
    “Create blabla in Arabic following this html template:
    Template”
    change it to something like:
    “Create blabla in Arabic following this html template, html comments are instructions:
    <h2> {title} </h2> <!-- Language: Arabic -->”

And finally, never use a temperature above 0.8 when not in English.
I don’t have an explanation for these tips, but they reduced our language issues by 80%. It seems that when you have long and complex instructions, the output-language part is “forgotten” partially or totally, so splitting your calls, repeating the language instruction within the template, and repeating it after seems to work (including in GPT-3.5).
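
A minimal sketch of how these tips could be combined into a split-call flow, assuming the openai Python client; the model name, prompts, and intermediate variable are illustrative placeholders, not our exact production prompts:

```python
# Illustrative two-call flow following the tips above; prompts and model are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # placeholder model name


def call(messages, temperature=0.7):
    """Single chat completion with temperature kept below 0.8."""
    response = client.chat.completions.create(
        model=MODEL, messages=messages, temperature=temperature
    )
    return response.choices[0].message.content


# Call 1: the non-generative work (filtering, scoping) in its own request.
scoped_input = call([
    {"role": "system", "content": "Let's think step by step: filter the user input down to the key facts."},
    {"role": "user", "content": "..."},  # raw user input goes here
])

# Call 2: generation only, with the language instruction repeated inside the
# template (as an HTML comment) and again after it.
generated = call([
    {"role": "system", "content": "Let's think step by step:"},
    {"role": "user", "content": (
        "Create the article in Arabic following this html template, "
        "html comments are instructions:\n"
        "<h2> {title} </h2> <!-- Language: Arabic -->\n"
        f"Facts to use:\n{scoped_input}\n"
        "Double check spelling and grammar before output, amend the text when necessary. "
        "Make sure all the text you output is in this language: Arabic."
    )},
])
print(generated)
```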

Hope this helps