Whisper API - temperature and some more questions

I am using the Whisper API to transcribe audio, not only in English but also in some other languages. Often it works and returns good results. However, sometimes it just gets lost and provides a transcription that makes no sense.

For example, I provided audio in Croatian, and it returned random English text that was not even a translation, just garbage. In another case, I provided clearly intelligible, sports-related English audio, and it returned this as the transcription:
“This is an English audio with the highest possible accuracy. This is an English audio with the highest possible accuracy. This is an English audio with the highest possible accuracy.”

Does anyone have an idea what the problem could be?

Two additional questions:

  1. Would specifying the language of the audio along with the file make the transcription better? The documentation says it should make some difference, but I haven’t noticed any.
  2. Is anyone supplying ‘temperature’ along with the prompt? Could changing the temperature help with the problems described above?
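
For concreteness, here is a minimal sketch of how the options from both questions could be passed in a single request. It assumes the official `openai` Python SDK (v1-style client), the `whisper-1` model, and an API key in the `OPENAI_API_KEY` environment variable. The helper names `build_transcription_params` and `transcribe` are hypothetical, made up for this illustration.

```python
from typing import Any, Dict, Optional


def build_transcription_params(
    language: Optional[str] = None,
    temperature: float = 0.0,
    prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the optional request fields for a transcription call."""
    # The API documents temperature as a value between 0 and 1.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    params: Dict[str, Any] = {"model": "whisper-1", "temperature": temperature}
    if language:
        # ISO-639-1 code, e.g. "hr" for Croatian or "en" for English.
        params["language"] = language
    if prompt:
        # Optional text meant to guide style or spelling; omitted when empty.
        params["prompt"] = prompt
    return params


def transcribe(path: str, **options: Any) -> str:
    """Send one audio file to the transcription endpoint and return its text."""
    # Imported here so the helper above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            file=audio_file, **build_transcription_params(**options)
        )
    return result.text
```

A call would then look like `transcribe("croatian_clip.mp3", language="hr", temperature=0.0)`, where the file name is a placeholder. Whether setting these values actually prevents the wrong-language or repeated-sentence output is exactly what the questions above are asking, so treat this as a way to run the experiment rather than a known fix.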