I have been using gpt-4o-transcribe and have tried to enforce English as shown in the API documentation, but it keeps responding in many different languages. Please see the attachments for more details. Can anyone help?
EDIT:
It seems like this is a known thing already— “gpt-4o-transcribe has a known bug with language enforcement. Your implementation is correct, but the model is unreliable for English-only transcription.”
I think this is really funny, especially since gpt-4o-transcribe's performance claims say it's way better than Whisper, yet it's transcribing in symbols, so I don't see how that can be the case.
Here's what you can do: unlike "whisper-1", where the prompt field is only lead-in text (the tail end of the previous transcription), gpt-4o-transcribe accepts more instruction-like messages in its prompt.
"The attached audio conversation (for recitation into text) takes place solely in the English language."
Note that the language ISO code is also simply "passed" into the multimodal language AI as part of the prompt; it is not enforced.
Give that a try.
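As a minimal sketch of the suggestion above (assuming the official OpenAI Python SDK's `audio.transcriptions.create` endpoint; the exact prompt wording and the temperature value are just illustrative choices, not anything guaranteed to fix the behavior), the request parameters might be assembled like this:

```python
# Hypothetical helper that bundles the suggested English-only settings.
ENGLISH_ONLY_PROMPT = (
    "The attached audio conversation (for recitation into text) "
    "takes place solely in the English language."
)

def english_transcription_kwargs(model: str = "gpt-4o-transcribe") -> dict:
    """Keyword arguments for client.audio.transcriptions.create(...)."""
    return {
        "model": model,
        "language": "en",              # ISO code: treated as a hint, not enforced
        "prompt": ENGLISH_ONLY_PROMPT, # instruction-style prompt, per the advice above
        "temperature": 0.2,            # low temperature to reduce drift
    }
```

You would then call `client.audio.transcriptions.create(file=audio_file, **english_transcription_kwargs())` with an SDK client and an open audio file handle.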
The API developer could also normalize audio levels, bandwidth-limit to telephony frequencies, etc.
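For the audio-level normalization idea, here is a rough peak-normalization sketch using only the Python standard library. It assumes 16-bit PCM WAV input (mono or interleaved stereo); the telephony bandwidth-limiting step would typically be done with an external tool such as ffmpeg and is not shown:

```python
import struct
import wave

def normalize_wav(in_path: str, out_path: str, target_peak: float = 0.9) -> None:
    """Scale a 16-bit PCM WAV so its loudest sample hits target_peak of full scale."""
    with wave.open(in_path, "rb") as r:
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    # Unpack interleaved little-endian 16-bit samples.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    peak = max(abs(s) for s in samples) or 1  # avoid dividing by zero on silence
    gain = target_peak * 32767 / peak
    scaled = [max(-32768, min(32767, int(s * gain))) for s in samples]
    with wave.open(out_path, "wb") as w:
        w.setparams(params)
        w.writeframes(struct.pack("<%dh" % len(scaled), *scaled))
```

This only adjusts peak level; loudness normalization (e.g. EBU R128 via ffmpeg's `loudnorm` filter) would be a more thorough approach.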
> gpt-4o-transcribe has a known bug with language enforcement. Your implementation is correct, but the model is unreliable for English-only transcription.
If this is a known bug, it’s the first I’ve heard of it. We do English transcriptions all the time. In fact, I just tested a few English transcriptions to verify, and everything was fine.
The only difference between your code and ours is that we set temperature to 0.2 and we don’t use chunking. Maybe something is wrong with your callback?
Or you could try @_j's suggestion and use the optional prompt parameter to send an instruction.