Return identified language in Whisper API response

I’m using ChatGPT API + Whisper ( Telegram: Contact @marcbot ) to transcribe a user’s request and send that to ChatGPT for a response. Being able to interact through voice is quite a magical experience.

I also use speech synthesis to turn ChatGPT’s response back into voice. For this I’d like to know which language the user is speaking, as that’s likely the language ChatGPT’s output is in.

Whisper does a great job transcribing many languages. It would be great if the API response would also include the language it identified. I assume this is something the model is aware of? Having this in the response would allow me to choose the right speech synthesis model.

Turns out setting response_format to "verbose_json" will return back the language!

It’s unclear to me however which ISO standard (if any) this is using? For example it returned "dutch" for one of my requests. I would expect a ISO 639-1 code instead.

3 Likes

I have a similar project, using Whisper api for speech to text, then feeding GPT and finally passing the response to TTS engines. Using language detection from Whisper api sort of works to playback answer using TTS on same language as the original question, however it fails in some cases when user asks GPT to translate something or just questioning if it can reply on some different language, it will just immediately reply in that other language and wrong TTS will be used on these cases :frowning: It would only work really fine if we get a way to identify what language is the answer from the GPT api itself… is there any way to suggest that to OpenAI? Or maybe there is some special way to ask that on a system role prompt to include some kind of tag with language on it’s answers? I sort of succeed in getting something like that with system role but it’s not consistent on all replies…

This is something I could do by complete trial and error (after reading a sample on how to ask gpt to make citations - completely unrelated thing, but I did fiddle with the sample and wording):

{“role”: “system”, “content”: “Include this at start of your answers: ({\“lang\”: _}).”}

Question: Can you speak in Portuguese?

Response: ({“lang”: pt}) Sim, eu posso falar em português. Em que posso ajudar?

However I’m still not sure it is consistent to all replies because for example if you don’t explicitly ask it “…at start of…” then I could see it working for some questions and going mad with some other questions - outputting it all as a json object with some weird arrangement…

Also instead of ({“lang”: _}) if I ask for ({“language”: _}) it outputs full language name instead of two letters code like ({“lang”: “Portuguese”})

Well. I feel that it’s possible to get this done and I’n apparently close, but I’m just not sure of the correct way/wording to ask it to get consistent results on all responses… how can I get help on this?

It would be much better and easier though if OpenAI adds a language parameter on GPT api response similar to the one implemented on Whisper api…

Edit: Seems I’m getting more consistent results with this:
“You should include a tag at start of your answers in this format: [[[lang=xx]]]”

1 Like