For example, the following is from transcribed audio. I am finding this to be pretty common; this bug has happened multiple times.
二呢我非常爱读书你很爱读书不仅是语言比如国际关系还是做生意金融我本科其实读的是工程工程真的哪方面的工程工业工程工业的工程所以跟做生意有关系对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对对
Increase temperature slightly (e.g. 0.8 → 1.0).
Apply a repetition penalty (e.g. 1.1–1.3).
Use top-p sampling instead of greedy decoding.
Post-process with a regex to collapse repeated monosyllables (“对对对对” → “对对”).
from openai import OpenAI
import re

# Initialize the OpenAI client (reads OPENAI_API_KEY from the environment)
client = OpenAI()


def clean_repetitions(text: str) -> str:
    """
    Remove excessive repeated Chinese characters or punctuation.
    Example: "对对对对对,我很喜欢书书书!!" -> "对对,我很喜欢书书!"
    """
    # Collapse 3+ identical Chinese characters to 2
    text = re.sub(r'([\u4e00-\u9fff])\1{2,}', r'\1\1', text)
    # Collapse 2+ identical punctuation marks (ASCII and full-width) to 1
    text = re.sub(r'([。,!?~,!?~])\1+', r'\1', text)
    return text.strip()


def translate_to_chinese(prompt: str) -> str:
    """
    Translate English text into Chinese using GPT-5,
    with decoding parameters to reduce repetition.
    """
    # Note: some models accept only the default sampling settings;
    # remove the four sampling arguments below if the API rejects them.
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "You are a precise translator that outputs natural, professional Chinese."},
            {"role": "user", "content": f"Translate into Chinese: {prompt}"}
        ],
        temperature=0.8,        # Adds diversity, prevents deterministic loops
        top_p=0.9,              # Nucleus sampling
        presence_penalty=0.6,   # Slight penalty for repetition of ideas
        frequency_penalty=0.8   # Penalize repeated tokens
    )
    raw_text = response.choices[0].message.content
    return clean_repetitions(raw_text)


if __name__ == "__main__":
    english_text = "I studied industrial engineering because I like business and finance."
    chinese_translation = translate_to_chinese(english_text)
    print("Original:", english_text)
    print("Chinese (cleaned):", chinese_translation)
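Collapsing the run hides the symptom, but it can also be useful to know that a chunk degenerated at all, so it can be sent for transcription again. Here is a minimal sketch of such a check. The function names and the threshold of 8 are my own assumptions, not something the API provides, and the right threshold will depend on your material.

```python
import re


def longest_run(text: str) -> tuple[str, int]:
    """Return the character with the longest consecutive run and the run length."""
    best_char, best_len = "", 0
    for match in re.finditer(r"(.)\1*", text, flags=re.DOTALL):
        run_len = len(match.group(0))
        if run_len > best_len:
            best_char, best_len = match.group(1), run_len
    return best_char, best_len


def looks_degenerate(text: str, max_run: int = 8) -> bool:
    """Heuristic: flag text whose longest run exceeds max_run (assumed threshold)."""
    return longest_run(text)[1] > max_run


sample = "所以跟做生意有关系" + "对" * 40
print(longest_run(sample))                           # ('对', 40)
print(looks_degenerate(sample))                      # True
print(looks_degenerate("对对对,我本科读的是工程"))   # False
```

A natural "对对对" in conversation stays below the threshold, so only runaway loops like the one in the first post get flagged.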
There is a guy called That and we wanted to talk about a specific word in a sentence: “that”.
So I said to That that that that that was that we wanted to talk about.
The same applies to the Chinese “对”. It covers many meanings. For example, if you translated the English “yeah, hmm, yah, yes yes yes”, you could end up with 对对对对对, and after that the most likely next token is another 对.
Humans have problems with that kind of thing too. Some of them fall into the same “that that that that that” loop, haha.
Or maybe it was a machine in the background that made duiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduiduidui and the STT got it wrong (speaker diarization, noise cancelling, or a hybrid of the two might help).
Actually, you could try appending a small audio chunk to the end of each of your audio chunks, the way radio operators end a transmission with the word “over”, and then filter that out of the result.
You should also put that into the prompt, for example: “You are a helpful audio translator from English to Chinese for radio transmissions. Each chunk ends with ‘over’. Don’t translate that.”
Step 1 - generate a short “end word” snippet
You can pre-record a voice saying “结束” or use a simple TTS file:
say -v Ting-Ting "结束" -o end.wav --data-format=LEI16@22050
(macOS only; WAV output from say needs an explicit data format)
(or any short noise/beep clip)
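If you go with the beep option and don't have a TTS voice at hand, the clip can be generated with the Python standard library. This is only a sketch: the function name, the 880 Hz tone, the 300 ms length and the 16 kHz sample rate are arbitrary choices of mine. Keep in mind that a beep produces no word to filter out in Step 3, so it only serves as an audible end marker.

```python
import math
import struct
import wave


def write_beep(path: str = "end.wav", freq_hz: float = 880.0,
               duration_ms: int = 300, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit sine beep to `path` and return the number of frames."""
    n_frames = sample_rate * duration_ms // 1000
    amplitude = 0.3 * 32767  # stay well below full scale to avoid clipping
    frames = bytearray()
    for i in range(n_frames):
        sample = int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames
```

Calling write_beep() with the defaults writes end.wav into the current directory, which is the file name Step 2 expects.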
Step 2 - append it automatically to each audio chunk
from pydub import AudioSegment

def append_guard_word(audio_path, guard_path="end.wav"):
    # Load the original chunk and the guard clip (pydub needs ffmpeg for non-WAV input)
    audio = AudioSegment.from_file(audio_path)
    guard = AudioSegment.from_file(guard_path)
    # Insert 250 ms of silence so the guard word is clearly separated from the speech
    combined = audio + AudioSegment.silent(duration=250) + guard
    temp_path = "temp_with_guard.wav"
    combined.export(temp_path, format="wav")
    return temp_path
Step 3 - transcribe and filter out the guard word
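import re
from openai import OpenAI

client = OpenAI()
# temp_path is the file returned by append_guard_word() in Step 2
with open(temp_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        response_format="text"
    ).strip()
# remove a trailing “结束”, “END” or “the end”, plus any punctuation after it
transcript = re.sub(r'\s*(结束|END|the end)[\s。.!!]*$', '', transcript, flags=re.IGNORECASE)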
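The guard word can do a second job: if it is missing from the end of the transcript, the model probably never reached the end of the audio cleanly, which is a hint that it may have looped. Below is a rough sketch of a filter that also reports whether the guard word was found. The helper name and the GUARD_WORDS tuple are hypothetical; use whatever word you actually appended.

```python
import re

GUARD_WORDS = ("结束", "over")  # hypothetical guard words, adjust to your own clip


def strip_guard_word(transcript: str, guard_words=GUARD_WORDS) -> tuple[str, bool]:
    """Remove a trailing guard word and report whether it was present."""
    alternatives = "|".join(re.escape(word) for word in guard_words)
    # Optional whitespace, the guard word, then any trailing punctuation or spaces
    pattern = re.compile(rf"\s*(?:{alternatives})[\s。.!!]*$", flags=re.IGNORECASE)
    cleaned, count = pattern.subn("", transcript.strip())
    return cleaned, count > 0


print(strip_guard_word("我本科读的是工程。结束。"))  # ('我本科读的是工程。', True)
print(strip_guard_word("对对对对对对"))              # ('对对对对对对', False)
```

When the second value is False, the chunk is a candidate for re-transcription instead of being passed on as is.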
Hello! Are you able to reproduce this character repetition issue consistently? If so, would you mind sharing:
- approximate start/end timestamps
- request ID or API response ID if available
- which model/version was used (gpt-4o-transcribe, gpt-4o-mini-transcribe, or -diarize)
- was the request streaming or non-streaming?
- how long was the audio and were there any trailing noises or silences?
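For the last point, the numbers can be read straight from the file instead of estimated by ear. The following is only a sketch under a few assumptions: the input is a 16-bit PCM WAV file, and "silence" means every sample stays at or below an amplitude threshold of 500, which is a value picked for illustration. The function name is hypothetical.

```python
import array
import sys
import wave


def wav_report(path: str, silence_threshold: int = 500) -> dict:
    """Report total duration and trailing silence for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav_file.getnchannels()
        rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        samples = array.array("h", wav_file.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Walk backwards until a sample exceeds the assumed silence threshold
    last_loud = len(samples)
    while last_loud > 0 and abs(samples[last_loud - 1]) <= silence_threshold:
        last_loud -= 1
    trailing_frames = (len(samples) - last_loud) // channels
    return {
        "duration_s": n_frames / rate,
        "trailing_silence_s": trailing_frames / rate,
    }
```

Running wav_report on the chunk that produced the repetition gives two figures that can be pasted into the reply.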
And thank you for the help, @jochenschultz! Appreciate it