Processing of model outputs while fine-tuning Whisper

Hey everyone! I have a quick theoretical question about fine-tuning Whisper models on my own labelled data. In the common Colab notebooks that many developers have published, there is no preprocessing step for the transcriptions generated during training and used for evaluation over the course of fine-tuning. Is it necessary to add such a step?
What I mean:
For example, in the dataset I am using for fine-tuning, all letters are lowercase and there is no punctuation; the transcripts contain only letter characters and whitespace. Now suppose that during fine-tuning the current model is evaluated every 200 steps. Isn't it possible that the model generates output containing uppercase letters and punctuation (as Whisper's pretrained checkpoints, e.g. large-v3, do), so that the WER comes out higher than it would if the generated transcriptions were post-processed, i.e. stripped of the same characters that were removed from the fine-tuning dataset?

It's not mandatory to preprocess the generated transcriptions during training, but normalizing them to match your dataset's format (e.g. lowercasing and removing punctuation) makes the evaluation fairer: otherwise a prediction can be penalized for casing and punctuation differences alone, inflating the reported WER. The usual fix is to apply the same normalization to both predictions and references inside your metric computation before scoring.
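A minimal sketch of the effect, using a hand-rolled word-level WER so it is self-contained (the `normalize` and `wer` helpers here are illustrative, not taken from any particular notebook):

```python
import re

def normalize(text: str) -> str:
    # Lowercase and keep only letters and whitespace,
    # matching a fine-tuning dataset with that format.
    return re.sub(r"[^a-z\s]", "", text.lower()).strip()

def wer(reference: str, hypothesis: str) -> float:
    # Word-level Levenshtein distance divided by the reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

reference = "hello world how are you"            # dataset-style label
prediction = "Hello, world! How are you?"        # pretrained-Whisper-style output

print(wer(reference, prediction))             # 0.8 — inflated by casing/punctuation
print(wer(reference, normalize(prediction)))  # 0.0 — after normalization
```

In a real training setup you would apply the same `normalize` to both the decoded predictions and the decoded labels inside `compute_metrics` before passing them to your WER metric.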