Symbols are taken as plain text in whisper api output

I am using whisper api to convert speech to text. But when I try to provide email id or contact number the symbols like @ or + symbol for country code while giving phone number was printing as plain english text.

For example the audio contains is My email is data@test. com and phone number is +123445.
After using whisper api the out text is showing My email is data at the right test. com and phone number is plus 123445.
How can we solve this problem so that it can show symbol rather than plain text in output?

You cannot.

Not directly anyway.

Whisper is a direct speech-to-text model, it transcribes what it hears, that’s it.

So, if you ask it to transcribe the Tommy Tutone classic, you’re likely to get something like,

86753 oh 9

Out of it.

It’s actually a pretty sophisticated mechanism for humans to be contextually aware enough to know when we hear “at” that we should understand “@” and when we hear “oh” we should understand “0.”

Your current best bet would be to do some post-processing on the text output.

I suggest some regular expressions to identify likely regions of interest then pass that bit to an LLM for correction.

Let’s take your example,

My email is data at the right test. com and phone number is plus 123445.

You might use a regex like,

[A-Za-z0-9._+~ -]+(@|at) ?[A-Za-z0-9. -]+?((\.|dot) ?[A-Za-z]{2,} ?)+

to capture any line with a potential email address.

In your example text, this captures,

My email is data at the right test. com

Now if you pass this to a GPT LLM with an appropriate system message you can ask it to correct the error,

https://platform.openai.com/playground/p/qMLuj8ku7jU0FocE0YHDCzif?model=gpt-3.5-turbo

system:

You are a transcription proofreader. Users will provide you with small snippets of text which have been generated by a speech-to-text program. Your job is to correct and normalize this text to convey the meaning intended by the speaker.

Be particularly mindful of potential errors in email addresses where something like “a@b.com” is likely to be transcribed as “a at b. com” or “a at b dot com” and numbers generally like “867-5309” when spoken may be transcribed as “86753 oh 9.”

user:

My email is data at the right test. com

assistant:

My email is data@therighttest.com.

Now, do be mindful, this regex and system message are almost certainly not optional, they’re just what I came up with laying next to my toddler waiting for them to fall asleep.

You would want to spend some time tuning it to your specific use-case, but I believe it’s enough to get you started and on your way.

Incidentally, here is the correction for your whole transcription example.

https://platform.openai.com/playground/p/7lP5zvwzjGeH2trPNeho3xoP?model=gpt-3.5-turbo

I recommend using regular expressions to focus your correction efforts and minimize your costs.