Bug in Read Aloud Feature for Tamil Text

When the read-aloud feature synthesizes Tamil text, the resulting audio appears to omit some of the words. The input was the Tamil numbers “ஒன்று, இரண்டு, மூன்று, நான்கு, ஐந்து, ஆறு, ஏழு, எட்டு, ஒன்பது, பத்து” (1 to 10 in Tamil). After the synthesized audio was retrieved and run through speech recognition, the recognizer returned “1 2 4 6 7 8 10”, which is missing 3, 5, and 9.

Steps to Reproduce:

  1. Use the read-aloud feature to synthesize the following Tamil text: “ஒன்று, இரண்டு, மூன்று, நான்கு, ஐந்து, ஆறு, ஏழு, எட்டு, ஒன்பது, பத்து.”
  2. Retrieve the synthesized audio in .aac format using the API.
  3. Convert the audio to text using a speech recognition tool (e.g., the Google Web Speech API via the speech_recognition library) with the language set to Tamil.
  4. Observe the mismatch in the output, where some numbers are skipped or misrecognized.

Conversation link:

Code:


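import requests

# cookies and headers must hold the values of an authenticated chatgpt.com
# session; they are omitted from this report.
params = {
    'message_id': 'd63f423d-23f5-4e83-ac1b-d6393c631d61',
    'conversation_id': '66f57d34-2ff0-8010-a4da-102725e8d21e',
    'voice': 'juniper',
    'format': 'aac',
}

response = requests.get('https://chatgpt.com/backend-api/synthesize', params=params, cookies=cookies, headers=headers)
response.raise_for_status()  # stop here if the request failed
# Note: the payload was requested as AAC (format='aac'), even though the
# file is saved with a .wav extension.
with open('juniper.wav', 'wb') as f:
    f.write(response.content)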



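import speech_recognition as sr
from pydub import AudioSegment

r = sr.Recognizer()
# juniper.mp3 comes from converting the downloaded file with
# https://cloudconvert.com/wav-to-mp3. It is converted back to WAV here
# because sr.AudioFile reads WAV, AIFF, and FLAC but not MP3.
mp3_file = 'juniper.mp3'
sound = AudioSegment.from_mp3(mp3_file)
temp = 'juniper.wav'  # note: this reuses the name of the downloaded file
sound.export(temp, format="wav")
with sr.AudioFile(temp) as source:
    audio = r.record(source)  # read the entire file
# Transcribe with the Google Web Speech API, language set to Tamil
text = r.recognize_google(audio, language="ta")
print(text)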
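
To make the mismatch from step 4 explicit, the recognizer output can be compared against the expected sequence 1 to 10. The following is a minimal sketch, not part of the original reproduction: the helper names recognized_numbers and missing_numbers are hypothetical, and it assumes the recognizer returns either digits or the same Tamil number words used in the input.

```python
import re

# Expected values for the ten Tamil number words in the input text.
EXPECTED = list(range(1, 11))

# Tamil number words from the report mapped to their values, in case the
# recognizer returns words instead of digits (an assumption).
TAMIL_NUMBERS = {
    "ஒன்று": 1, "இரண்டு": 2, "மூன்று": 3, "நான்கு": 4, "ஐந்து": 5,
    "ஆறு": 6, "ஏழு": 7, "எட்டு": 8, "ஒன்பது": 9, "பத்து": 10,
}


def recognized_numbers(text):
    """Extract numbers from recognizer output given as digits or Tamil words."""
    numbers = []
    # Split on whitespace, commas, and periods; unknown tokens are ignored.
    for token in re.split(r"[\s,.]+", text.strip()):
        if token.isdecimal():
            numbers.append(int(token))
        elif token in TAMIL_NUMBERS:
            numbers.append(TAMIL_NUMBERS[token])
    return numbers


def missing_numbers(text, expected=EXPECTED):
    """Return the expected numbers that do not appear in the recognizer output."""
    seen = set(recognized_numbers(text))
    return [n for n in expected if n not in seen]


# Output reported above for the 'juniper' voice.
print(missing_numbers("1 2 4 6 7 8 10"))  # [3, 5, 9]
```

Passing the text returned by r.recognize_google to missing_numbers gives a quick way to compare runs across voices or audio formats.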