gpt-4o-mini-tts model censorship

I’m using the TTS endpoints as a feature on a discord bot so people without mics can talk to those of us in voice chat.

This new model is seemingly, and very inconsistently, censoring certain phrases. It’s not bleeping anything out; it’s just sending back an audio clip of the voice saying “I’m sorry, I can’t assist with that.” This is beyond useless.
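For reference, a minimal sketch of the kind of TTS call involved (Python SDK; the voice and message are placeholders, not the bot’s actual code):

```python
# Minimal sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

def synthesize(text: str) -> bytes:
    """Turn a user's typed chat message into speech for the voice channel."""
    response = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",      # any supported voice; placeholder choice
        input=text,         # the exact message typed by the user
    )
    return response.content  # MP3 bytes by default

# Sometimes the returned clip is not `text` spoken aloud, but the voice
# saying "I'm sorry, I can't assist with that." instead.
audio = synthesize("Hey everyone, sorry my mic is broken today.")
```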


Good catch. A model that just reads text aloud shouldn’t be powered by decision-making intelligence. What you are actually sending this to, though, is a trained gpt-4o.

I think the main motivation is that speech, and its quality, can be far more impactful than the words alone. Phrases like “Grandma, I need bail money”, “Your account has been compromised. Confirm your PIN”, or “you are authorized to kill” become much more powerful when automated programmatically, doing the work of a thousand unintelligible call-center scammers.

Plus, OpenAI is the most prominent company under the eye of any viral “look what I made it say” post.

This happens literally every time OpenAI launches a new modality or task-specific feature that’s powered by an LLM with decision-making logic baked in - as @_j pointed out earlier.

Remember GPT-4 Vision (GPT-4V), which rolled out after a full 8-month red-teaming period? Everyone was impressed by its almost “god-level OCR,” but there were tons of reports of it abruptly refusing simple transcription tasks, returning ridiculous ethics-based refusals like, “I’m sorry, I can’t transcribe personal financial information.”

Now we see exactly the same thing with these new omni-modal GPT-4o based task-specific endpoints. Underneath it all is still an LLM loaded with aggressive guardrails, making arbitrary, and often bizarre, decisions about what’s acceptable. As @rossisai mentioned, it’s completely unreliable for actual use cases, especially customer-facing or high-stakes production scenarios. Imagine deploying a customer chatbot, and suddenly a user’s audio request randomly triggers a refusal like, “I’m sorry, I can’t assist with that,” at a critical interaction. The customer would have zero clue what just happened, leading to confusion or frustration. Developers have no choice but to implement awkward failsafes or fallback options.
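Just to illustrate how awkward such a failsafe ends up looking, here is one possible sketch: round-trip the generated audio through a speech-to-text call and fall back to a dedicated TTS model if the output smells like a refusal. The refusal markers and the fallback model are assumptions on my part, not a recommended pattern:

```python
# Sketch of a refusal failsafe: costs an extra transcription call per synthesis.
import io
from openai import OpenAI

client = OpenAI()

REFUSAL_MARKERS = ("i'm sorry", "i can't assist")  # guessed phrasing

def speak_with_fallback(text: str) -> bytes:
    speech = client.audio.speech.create(
        model="gpt-4o-mini-tts", voice="alloy", input=text
    ).content

    # Transcribe what actually came back to see whether it matches the request.
    audio_file = io.BytesIO(speech)
    audio_file.name = "speech.mp3"  # the SDK infers the format from the name
    heard = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    ).text.lower()

    refused = any(m in heard for m in REFUSAL_MARKERS) and not any(
        m in text.lower() for m in REFUSAL_MARKERS
    )
    if refused:
        # Fall back to a non-LLM TTS model that just reads the text.
        speech = client.audio.speech.create(
            model="tts-1", voice="alloy", input=text
        ).content
    return speech
```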

Similarly, I’ve heard the transcription model itself unpredictably stops transcribing after around 10 minutes because, once again, it’s an LLM arbitrarily hitting some output token limit. It’s an unfortunate reminder that, despite how impressive OpenAI’s omni-modal technology is, fundamentally task-dedicated ML models like Whisper, Deepgram, or ElevenLabs, ones that “just do the work” reliably without interjecting needless ethical judgement or ambiguity, will remain the proper production standard going forward.
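If anyone hits that long-audio cutoff, one stopgap sketch is to chunk the recording client-side and transcribe the pieces separately. pydub and the 8-minute chunk length below are purely illustrative assumptions:

```python
# Workaround sketch for the ~10-minute cutoff: transcribe in shorter chunks.
import io
from openai import OpenAI
from pydub import AudioSegment  # pip install pydub (requires ffmpeg)

client = OpenAI()
CHUNK_MS = 8 * 60 * 1000  # 8 minutes, chosen to stay under the observed cutoff

def transcribe_long(path: str) -> str:
    audio = AudioSegment.from_file(path)
    pieces = []
    for start in range(0, len(audio), CHUNK_MS):  # pydub lengths are in ms
        buf = io.BytesIO()
        audio[start:start + CHUNK_MS].export(buf, format="mp3")
        buf.name = f"chunk_{start}.mp3"
        result = client.audio.transcriptions.create(
            model="gpt-4o-transcribe", file=buf
        )
        pieces.append(result.text)
    return " ".join(pieces)
```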