Regarding the “untrusted_text blocks” Described in the Model Spec

I would like to ask a question about the “untrusted_text blocks” described in the OpenAI Model Spec(April 11, 2025).

In practice, I tried using the following code, and it seemed to treat the user’s input as data rather than instructions when generating a meeting minutes summary. If I included explicit instructions in the input, the model refused to follow them and responded that it was not possible.

full_input = f"""
    ```untrusted_text
    {escaped_prompt}
    ```
"""

response = client.responses.create(
    model="gpt-3.5-turbo",
    instructions="Please create an honest meeting minutes summary based on the input below.",
    input=full_input,
)

Is this format of using “```untrusted_text” correct for indicating untrusted text blocks?

1 Like

You’re using gpt-3.5-turbo. This model predates untrusted_text, so your observation makes sense. I recommend using a more current model. https://platform.openai.com/docs/models

I tested with gpt-4.1 and observed that the model is less likely to obey bypasses when using these blocks.

1 Like

Thanks for the clarification! I’ll try using a newer model version like gpt-4.1 and see how it behaves. Appreciate your help!

1 Like

To clarify, is the correct syntax to use markdown fenced blocks, or would <untrusted_text> also be a way of demarcating it? Is the exact syntax documented anywhere?

Backticks: they are too common and too knowable. Also used often in data. Easily escaped.

The user message should not be used “taskless”. It always should reiterate the job to be done.

Here is merely “you are a helpful AI” on gpt-4.1-mini. Then input data trying to takeover the bot. But not escaping my containment example but likely busting what the model spec suggests for you.

I get the summary of what it says. It would take a higher level of bot confusion for instruction-following of input text meant to be summarized when contained in a proprietary multi-level enclosure.