Whisper: how do I make the model output punctuation as punctuation, rather than transcribing the words?

ldorigo · March 6, 2024, 4:10pm

I’d like it to behave more similar to e.g. google VTT - if I say, e.g. “This is the list colon newline dash First item newline dash second item exclamation mark”, I want it to output:

“”"
This is the list:

First item
Second item!
“”“”

Instead, it faithfully outputs “This is the list colon newline dash First item newline dash second item exclamation mark”

Tried playing with the initial prompt with limited success, since it’s not instruction-tuned, I have a hard time giving it an example that shows it what I want it to do…

Dus-DB · March 6, 2024, 5:12pm

AFAIK Whisper “listens” an audio that should put in words (transcribe) if they are understandable as per its training.
So, as you said if its not instruction-tuned/trained then it can not follow specific instructions.
A workaround would be to pass the output from whisper to a LLM that can much better process a text based on instructions you give.
My case: Using Whisper I transcribed a tutorial video in which its explained how to use an ERP, and in the part where tutor mention the menu path (he just lists the menu/submenus while showing in video) whisper just transcribe is as a list of words… but when i send to GPT4 to create a FAQ from it, even without specific instructions, GPT4 “understood” that text was about explaining the menu/submenus selections and wrote adequately in its output.

Now, based on your post, I tested this on a LLM:
i have this text transcribed from audio:
‘This is the list colon newline dash First item newline dash second item exclamation mark.’
but i need it to be written using the corresponding punctuation mentioned there and with adequate format.

the output from LLM was this:
Certainly! Here’s the text transcribed from audio with the corresponding punctuation and proper formatting:
"This is the list:

First item
Second item!"

I have used a colon after the word “list” to indicate that a list is about to follow. Each item in the list is preceded by a dash, and a newline is used to separate each item. Finally, I added an exclamation mark at the end of the sentence to match the “exclamation mark” mentioned in the audio transcription.

vb · March 6, 2024, 5:16pm

This case is also discussed, shortly, in the docs

Sometimes the model might skip punctuation in the transcript. You can avoid this by using a simple prompt that includes punctuation: “Hello, welcome to my lecture.”

Since you have a unique use case adapting this may be necessary.

https://platform.openai.com/docs/guides/speech-to-text/improving-reliability

Topic		Replies	Views
Prompting in Whisper to INCLUDE punctuation in transcription Prompting whisper	4	2041	July 6, 2024
Whisper's auto-punctuation Prompting whisper	6	2136	June 8, 2024
Can whisper be prompted with a previous transcript? Prompting whisper , prompt-engineering	10	2655	July 9, 2023
How to avoid Hallucinations in Whisper transcriptions? API whisper	32	21684	May 1, 2025
Speech to Text (ASR) Strategy Community whisper , audio , gpt-4o-audio-preview	8	253	March 10, 2025

Whisper: how do I make the model output punctuation as punctuation, rather than transcribing the words?

Related topics