Regarding the “untrusted_text blocks” Described in the Model Spec

I would like to ask a question about the “untrusted_text blocks” described in the OpenAI Model Spec(April 11, 2025).

In practice, I tried using the following code, and it seemed to treat the user’s input as data rather than instructions when generating a meeting minutes summary. If I included explicit instructions in the input, the model refused to follow them and responded that it was not possible.

full_input = f"""
    ```untrusted_text
    {escaped_prompt}
    ```
"""

response = client.responses.create(
    model="gpt-3.5-turbo",
    instructions="Please create an honest meeting minutes summary based on the input below.",
    input=full_input,
)

Is this format of using “```untrusted_text” correct for indicating untrusted text blocks?

1 Like

You’re using gpt-3.5-turbo. This model predates untrusted_text, so your observation makes sense. I recommend using a more current model. https://platform.openai.com/docs/models

I tested with gpt-4.1 and observed that the model is less likely to obey bypasses when using these blocks.

1 Like

Thanks for the clarification! I’ll try using a newer model version like gpt-4.1 and see how it behaves. Appreciate your help!

1 Like