I’m using the Realtime API with the input_audio_transcription field set, because I also need transcriptions of the user's audio. The problem is that the transcriptions returned by the API are sometimes incorrect, even though the model itself seems to understand the audio correctly and generates appropriate responses. The transcriptions also frequently switch to other languages even though I only speak English. I have tried other transcription models as well (whisper-1 and gpt-4o-mini-transcribe), but the issue is the same.
My question is: how do I improve the transcription accuracy? At the very least, I should have the option to set the transcription language to English, so that the API is forced to return English transcriptions only.
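For reference, here is a minimal sketch of the kind of session.update event described above, with the language hint being asked for. The helper name build_session_update is hypothetical, and the "language" key (an ISO-639-1 code such as "en") is an assumption that should be checked against the current API reference before relying on it. The sketch only builds the JSON payload; sending it over the WebSocket connection is left out.

```python
import json
from typing import Optional


def build_session_update(model: str, language: Optional[str] = None) -> str:
    """Build a JSON session.update event that enables user-audio transcription.

    `language` is an assumed optional hint (ISO-639-1 code, e.g. "en");
    it is only included in the payload when provided.
    """
    transcription = {"model": model}
    if language is not None:
        # Assumption: the transcription config accepts a "language" field.
        transcription["language"] = language

    event = {
        "type": "session.update",
        "session": {"input_audio_transcription": transcription},
    }
    return json.dumps(event)


# Example: request English-only transcription with whisper-1.
message = build_session_update("whisper-1", language="en")
print(message)
```

If the hint is accepted, the resulting string would be sent as a text frame on the existing Realtime connection, before any audio is streamed.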