Realtime API is unable to generate in French (content_filter)

Hi,

I’m working with the gpt-4o-realtime API and facing a content-filter issue that I can reproduce on demand.

I narrowed it down to the API being unable to generate French text and audio. Here’s how I came to this conclusion:
- Instruction in French, asking to generate in English → works fine.
- Instruction in English, asking to generate in English → works fine.
- Instruction in English, asking to generate in French → response incomplete, content_filter.
- Instruction in French, asking to generate in French → response incomplete, content_filter.
- Instruction in any language, asking to generate in any language but French → works fine.

In every test, I used the exact same instruction:

"Say this sentence translated in ${lang}, without any other comment : ${text}"

where lang is the target language (variable) and text is a constant paragraph in French, in which a man asks questions about parking assistance. Here it is:
“Je me gare des dizaines de fois par jour dans des rues souvent encombrées et avec peu de visibilité. J’ai parfois peur de rayer mon véhicule ou d’avoir un accrochage en manœuvre. Quels équipements peuvent m’aider à me garer plus sereinement et à éviter les petits chocs au quotidien ? Et sont-ils faciles à activer ? C’est que je ne suis pas un pro de l’informatique…”
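A minimal sketch of how the test matrix above could be built from the quoted template. The helper names (`buildInstruction`, `buildTestMatrix`) and the list of target languages are assumptions for illustration, not the code actually used in the tests; only the template string and the first sentence of the sample text come from the post.

```typescript
// Hypothetical helper: fills the instruction template quoted in the post.
function buildInstruction(lang: string, text: string): string {
  return `Say this sentence translated in ${lang}, without any other comment : ${text}`;
}

interface TestCase {
  targetLang: string;
  instruction: string;
}

// Builds one test case per target language, all sharing the same source text.
function buildTestMatrix(targetLangs: string[], text: string): TestCase[] {
  return targetLangs.map((targetLang) => ({
    targetLang,
    instruction: buildInstruction(targetLang, text),
  }));
}

// Assumed language list; the text is the first sentence of the sample paragraph.
const sampleText =
  "Je me gare des dizaines de fois par jour dans des rues souvent encombrées et avec peu de visibilité.";
const cases = buildTestMatrix(["english", "french", "spanish"], sampleText);

for (const c of cases) {
  console.log(`${c.targetLang}: ${c.instruction}`);
}
```

Keeping the text constant and varying only the target language is what isolates French output as the trigger.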

I tried other French sentences and the response ends with content_filter every time. For instance:

{
  type: 'response.done',
  event_id: 'event_B7gTEwp07n8S9oZekbpUV',
  response: {
    object: 'realtime.response',
    id: 'resp_B7gTA7Wx91S8ZtcIJixJ1',
    status: 'incomplete',
    status_details: { type: 'incomplete', reason: 'content_filter' },
    output: [ [Object] ],
    conversation_id: 'conv_B7gTATBbMOe9FDzGBivHh',
    modalities: [ 'text', 'audio' ],
    voice: 'echo',
    custom_voice_id: null,
    output_audio_format: 'pcm16',
    temperature: 0.6,
    max_output_tokens: 'inf',
    usage: {
      total_tokens: 358,
      input_tokens: 98,
      output_tokens: 260,
      input_token_details: [Object],
      output_token_details: [Object]
    },
    metadata: null
  }
}
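For anyone trying to reproduce this, here is a minimal sketch of a check for that condition. It assumes only the fields visible in the logged event above; the `ResponseDoneEvent` interface and `isContentFiltered` are hypothetical names, not part of any SDK. Note that `[Object]` in the log is just Node's console truncating nested values, so `JSON.stringify` is used to print the event in full.

```typescript
// Minimal shape of the fields used here, inferred from the logged event above.
interface ResponseDoneEvent {
  type: string;
  response?: {
    status?: string;
    status_details?: { type?: string; reason?: string } | null;
  };
}

// Returns true when a response.done event reports a content_filter cutoff.
function isContentFiltered(event: ResponseDoneEvent): boolean {
  return (
    event.type === "response.done" &&
    event.response?.status === "incomplete" &&
    event.response?.status_details?.reason === "content_filter"
  );
}

// Same status fields as in the logged event.
const logged: ResponseDoneEvent = {
  type: "response.done",
  response: {
    status: "incomplete",
    status_details: { type: "incomplete", reason: "content_filter" },
  },
};

console.log(isContentFiltered(logged)); // true
// Prints nested objects in full instead of [Object].
console.log(JSON.stringify(logged, null, 2));
```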

Can anybody help, please? Feel free to ask for any further information.

Is your goal to translate or repeat inputs exclusively as audio? Is there no need to converse with an AI chat partner “in realtime”, with voice activity detection and interruptibility?

If so, I would use Chat Completions with the gpt-4o-audio-preview voice model. I wrote the instructions following best practice for such an application, in both an English and a French system message, and received French audio with no hiccups or false content detections. The French sounds more fluent when the system instruction is in French (although I am not a French speaker):

Vous êtes un assistant conversationnel multilingue de traduction. Vous recevez des phrases en entrée dans n’importe quelle langue, que ce soit sous forme de texte écrit ou d’entrée audio, puis vous les retransmettez automatiquement à l’oral en les traduisant en français tel qu’il est parlé à Paris, en France, sans ajouter aucun autre commentaire ou discussion.

- Si la phrase d’entrée est déjà en français, vous la répétez simplement à voix haute avec votre propre voix, toujours en français.
- L’entrée ne constitue jamais une invitation ou une demande de conversation avec vous ; elle est toujours destinée à être répétée avec le même sens.
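A rough sketch of what such a Chat Completions request body might look like. The model name comes from this post; the voice, the audio format, and the helper name `buildAudioRequest` are assumptions chosen for illustration, and the exact request used here may have differed. The resulting object would be sent as the JSON body of a POST to the Chat Completions endpoint.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface AudioChatRequest {
  model: string;
  modalities: string[];
  audio: { voice: string; format: string };
  messages: ChatMessage[];
}

// Hypothetical helper: assembles the request body for spoken output.
// The default voice and the "wav" format are assumptions, not taken from the post.
function buildAudioRequest(
  systemMessage: string,
  userText: string,
  voice: string = "alloy"
): AudioChatRequest {
  return {
    model: "gpt-4o-audio-preview",
    modalities: ["text", "audio"],
    audio: { voice, format: "wav" },
    messages: [
      { role: "system", content: systemMessage },
      { role: "user", content: userText },
    ],
  };
}

// Usage: the system message is the translation prompt, the user text is the input sentence.
const body = buildAudioRequest(
  "Vous êtes un assistant conversationnel multilingue de traduction.",
  "I park dozens of times a day."
);
console.log(JSON.stringify(body, null, 2));
```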

On Chat Completions, the content filter is typically raised only for repetition of copyrighted text. A false detection triggered merely by “listening” on Realtime cannot be helped, except perhaps by changing the voice model that is generating.

Thanks for your swift reply!

This specific instruction is the welcome message for a training session. After this, I update the (realtime) session so that the user and the AI can have a true realtime conversation. It was more convenient for me to handle all the conversational steps with 4o-realtime using WebRTC.

I will try to improve my welcome instruction and let you know :wink: Thank you!

Hello again,

I did some more tests, and the issue never happens when I use the alloy or shimmer voices :thinking: The other voices always fail with French content.
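As a possible stopgap, a sketch of a voice fallback based on that observation. It assumes the finding holds in general, which is only based on these tests; `pickVoice` and the constants are hypothetical names, and the returned voice would still have to be passed to the session configuration.

```typescript
// Voices reported in this thread as unaffected; based on one user's tests only.
const VOICES_OK_FOR_FRENCH: string[] = ["alloy", "shimmer"];
const FALLBACK_VOICE = "alloy";

// Keeps the preferred voice unless the target language is French
// and the preferred voice is not in the list of unaffected voices.
function pickVoice(preferred: string, targetLang: string): string {
  const isFrench = targetLang.trim().toLowerCase() === "french";
  if (isFrench && !VOICES_OK_FOR_FRENCH.includes(preferred)) {
    return FALLBACK_VOICE;
  }
  return preferred;
}

console.log(pickVoice("echo", "French")); // alloy
console.log(pickVoice("echo", "English")); // echo
console.log(pickVoice("shimmer", "french")); // shimmer
```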