Synthetic Censorship Tone Inserted by Cove TTS Voice in Non-Profane Text (Intermittently Reproducible Audio Hallucination)

AlarmingMelon · April 17, 2025, 3:44pm

While generating audio output for the text:

“OMG Clayton that’s so damn clean!”

OpenAI’s Cove TTS voice has repeatedly inserted an unexpected synthetic beep in the 1–2 kHz range, closely resembling the censorship tone traditionally used to mask profanity. The beep is an audio hallucination: it does not correspond to anything in the provided text. It is intermittently reproducible, occurring in roughly 25–33% of playback attempts, and shows up under emotionally expressive conditions.

Detailed Description:

During multiple playback attempts while using GPT-4o with the voice set to Cove, a synthetic “bleep” tone has been audibly inserted by the TTS engine at different points, often after the word “damn.” Notably, “damn” is not classified as profane by OpenAI’s moderation guidelines, and the text does not include any profanity or moderation triggers.

Contextual Observations:

These tonal artifacts seem strongly correlated with the emotional tagging inferred by Cove’s voice engine. In this case, the sentence is notably enthusiastic or expressive, suggesting that emotional intensity may be causing the system to erroneously trigger moderation-like audio insertions.

Waveform and Frequency Spectrum Analysis:

Analysis of multiple audio samples clearly shows:

Distinct synthetic tones (~0.3 sec each), artificially inserted.
Tones consistently appearing between 1,000–2,000 Hz, typical of censorship beeps.
No visual indicators (like “***”) or moderation markers present in the transcript.
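For anyone who wants to check their own recordings, the spectrogram inspection above can be roughly automated. The following is a minimal sketch, not the method used for the attached images: it assumes the audio is already loaded as a mono NumPy array, and the function name `find_tone_bursts` and all thresholds (`frame_sec`, `purity`, `min_sec`) are illustrative guesses that would need tuning against real samples. It flags stretches where a single frequency in the 1–2 kHz band carries most of a frame's energy, which is what a synthetic beep looks like and what natural speech usually does not.

```python
import numpy as np

def find_tone_bursts(samples, sample_rate, band=(1000.0, 2000.0),
                     frame_sec=0.05, purity=0.6, min_sec=0.2):
    """Return (start_sec, end_sec) spans dominated by one in-band frequency.

    All thresholds are assumptions for illustration, not measured values.
    """
    frame_len = int(round(frame_sec * sample_rate))
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    flags = []
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        if total <= 1e-12:          # silent frame, nothing to classify
            flags.append(False)
            continue
        peak = int(np.argmax(power))
        lo, hi = max(peak - 1, 0), min(peak + 2, len(power))
        # A pure tone concentrates its energy in the peak bin and its neighbors.
        is_pure = power[lo:hi].sum() / total >= purity
        flags.append(bool(in_band[peak] and is_pure))

    # Merge consecutive flagged frames and keep only runs that last long enough.
    spans, start = [], None
    for i, flagged in enumerate(flags + [False]):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if (i - start) * frame_len / sample_rate >= min_sec:
                spans.append((start * frame_len / sample_rate,
                              i * frame_len / sample_rate))
            start = None
    return spans
```

A quick sanity check is to run it on a synthetic signal (low-level noise with a 0.3 sec 1.5 kHz sine mixed in) before trusting it on real TTS output. Sustained whistling or sung vowels could produce false positives, so flagged spans still need to be listened to.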

Expected Behavior:

Audio rendering should exactly match the provided text without extraneous auditory insertions, censorship tones, or hallucinated audio elements.

Actual Behavior:

Intermittent insertion of artificial censorship-like beeps occurs across repeated playbacks of the emotionally expressive text.

Notes on Reproduction:

The issue is intermittently reproducible when the surrounding conversation gives the text a strong emotional context. Submitting the same text without that context may not trigger it.
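Because the beep only shows up in roughly 25–33% of attempts, a single clean playback proves little. As a rough guide, and assuming each playback attempt is independent (an assumption, not something verified here), the chance of hearing at least one beep in n attempts is 1 - (1 - p)^n. The hypothetical helper below turns that into a number of attempts:

```python
import math

def attempts_for_confidence(rate, confidence=0.95):
    """Smallest n with P(at least one beep in n attempts) >= confidence.

    Assumes attempts are independent with a fixed per-attempt rate.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))

for rate in (0.25, 0.33):
    n = attempts_for_confidence(rate)
    print(f"rate {rate:.2f}: about {n} attempts for a 95% chance of one beep")
```

In other words, around ten playbacks of the same message is a reasonable minimum before concluding that a given setup does not reproduce the issue.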

Impact and Concerns:

These audio hallucinations significantly degrade the trustworthiness, reliability, and perceived professionalism of Cove-generated TTS audio, particularly in emotionally nuanced interactions.

Recommendations:

Investigate Cove’s emotional inference heuristics to identify false-positive moderation triggers.
Implement improved safeguards against unintended audio artifacts in emotionally expressive contexts.

Attached Evidence:

Multiple audio recordings demonstrating reproducibility. Hosted here: Google Drive
Spectrogram analysis images clearly showing inserted synthetic tones:

Original Spectrogram Analysis (Identifying Inserted Beep)

Spectrogram: Good examples 4 & 5

mugabuga · April 17, 2025, 5:18pm

I’m also having an almost identical issue when using the Read aloud feature using the Sol voice on both the Android app and web app.

AlarmingMelon · April 18, 2025, 2:29am

Thanks for letting us know, @mugabuga. It’s helpful (and interesting) to hear you’re experiencing this with the Sol voice, too. That definitely suggests this issue might not be isolated to the Cove voice specifically and could indicate something broader within the TTS voice engine. If you capture any examples, feel free to add them here; they might help OpenAI better pinpoint what’s going on.

AlarmingMelon · May 13, 2025, 2:43am

Follow-Up Report: Synthetic Censorship Tone Inserted by Cove TTS Voice (Second Documented Instance)

Adding another clearly documented case to this issue. In this instance, my assistant’s original output was:

“That’s a huge win—you’ve earned the right to lie down knowing that…”

However, the Cove TTS voice engine spontaneously inserted an audible censorship-style beep, rendering the audio as:

“That’s a [beep] huge win…”

Key observations:

No profane or borderline language was present in the original text.
This beep insertion occurred exclusively during emotionally supportive affirmations.
The textual output displayed by the assistant does not reflect this inserted beep; it is purely an auditory hallucination by the TTS engine.

Evidence attached:

Audio recording: “That’s a BLEEP huge win.wav”
Screenshot of the assistant’s original output (for comparison):

Screenshot 2025-05-12 at 10.29.40 PM

This is a second data point for an emerging pattern in which emotional expressiveness, rather than textual content, appears to trigger these synthetic censorship tones in Cove.
