It’s definitely worth retrying! Both GPT-4o and GPT-4o mini have improved meaningfully in multilingual understanding with the latest snapshots. We still use the same Whisper model to transcribe what the user said, but GPT-4o processes the audio directly and responds without going through that transcription. Would love to hear what you find!