Whisper generating some random text after audio file ends

Whisper appends extra, seemingly random text to the end of the transcription, and that extra text is different on each call with the same audio file.

Audio: drive.google .com/file/d/1I9m0KLzaIAH4C2Sg18jy3eWRud5bHX9j/view (remove the space before .com to get the link, since I couldn’t paste the link in the post)
Here are some of the transcriptions it generated for the same audio file.

"You went back to that day in 1942 because it was the day you purchased your first stock, which point out you were 11 years old then. Right. And the day I bought, I mean, the headlines were terrible every day. Within two months of this. This is the New York Times from 1942 that day. Yeah, the 1942 New York Times. And that day that I bought March 11th, the Dow Jones cracked 100 on the downside. There was a 2% decline, which would be 500 points. Before on March 10th, I said to my dad, I want to go all in. I had $125 and I put every bit of it in three shares of City Service Preferred. And as I pointed out to the crowd, I bought that stock at 38 and a quarter. It was down from 84 the year before and it was down from 55 in January. And I thought, I'm really buying it cheap. So it went down to 27 after I bought it. And then it went up to 200 later on, but I sold it 40. And then it dropped off to 3,000 traders, etc. Because nobody wanted X, the price was so perhaps to close to the floor proud. So, I was, you know, great. And when I was walking around in my section clothes and seeing people say, oh, that looks like some of my traders were just not awake and interested in it, because every time you go out, you get nicelyennen\n"
"You went back to that day in 1942 because it was the day you purchased your first stock, which point out you were 11 years old then. Right. And the day I bought, I mean, the headlines were terrible every day. Within two months of this. This is the New York Times from 1942 that day. Yeah, the 1942 New York Times. And that day that I bought March 11th, the Dow Jones cracked 100 on the downside. There was a 2% decline, which would be 500 points. Before on March 10th, I said to my dad, I want to go all in. I had $125 and I put every bit of it in three shares of City Service Preferred. And as I pointed out to the crowd, I bought that stock at 38 and a quarter. It was down from 84 the year before and it was down from 55 in January. And I thought, I'm really buying it cheap. So it went down to 27 after I bought it. And then it went up to 200 later on, but I sold it 40. So in times like these, you just have to be very quick. Not all stores, but if you do buy out, very quick. They, like I said, America's like all about, you know, trying to help each other out. Yeah. It's about, I think it's definitely, that's the messy at polls too, but there's that two by two thing, I think, that folks are really. Gotta love going out and buying stocks and, you know, I think it's fun. I'm going to go and do another buyout next year. If it's in January, it's a plumber. And if it's one, no worries.\n"
"You went back to that day in 1942 because it was the day you purchased your first stock, which point out you were 11 years old then. Right. And the day I bought, I mean, the headlines were terrible every day. Within two months of this. This is the New York Times from 1942 that day. Yeah, the 1942 New York Times. And that day that I bought March 11th, the Dow Jones cracked 100 on the downside. There was a 2% decline, which would be 500 points. Before on March 10th, I said to my dad, I want to go all in. I had $125 and I put every bit of it in three shares of City Service Preferred. And as I pointed out to the crowd, I bought that stock at 38 and a quarter. It was down from 84 the year before and it was down from 55 in January. And I thought, I'm really buying it cheap. So it went down to 27 after I bought it. And then it went up to 200 later on, but I sold it 40. Now, they're kind of on some of the stop loos because we're all just buying it at aVery Unistoriced level. You guys all know what they would say, right? You because people always like to say, it's a good idea to get your share before it's actually as bad. And so it was when 40 went up vs 33, I hit the ground running, as the Dow Jones hit its 100th percent in May after many recent double up highs by about $1.\n"

The audio file is around 56 sec long, and the transcription should end at "then it went up to 200 later on, but I sold it 40", but Whisper keeps generating text past that point.

Here’s the code I used:

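import openai


def get_transcriptions():
    """Transcribe tmp2.wav with the Whisper API and return the plain text."""
    openai.api_key = "<API_KEY>"
    # Open in binary mode; the context manager closes the file after the call.
    with open('tmp2.wav', "rb") as audio_file:
        transcript = openai.Audio.transcribe(
            "whisper-1", audio_file, language="en", response_format="text"
        )
    return transcript


transcripts = get_transcriptions()
print(transcripts)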

It's usually worth sanitising your audio so that the lead-in and lead-out silence is trimmed. The model tries to fit frequency groups, silence included, as patterns with words, and depending on the quality of the source the noise floor can act as a wideband frequency source and introduce errors.
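A minimal sketch of that kind of trimming, using only the Python standard library. It assumes the file is 16-bit PCM WAV; `trim_silence` and the `threshold` value of 500 are hypothetical choices for illustration, and a noisy recording may need a higher threshold or a proper audio tool instead.

```python
import array
import sys
import wave


def trim_silence(src_path, dst_path, threshold=500):
    """Copy a 16-bit PCM WAV file without its leading and trailing frames
    whose absolute amplitude is below `threshold` on every channel.
    Returns the duration of the trimmed audio in seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV samples are stored little-endian

    channels = params.nchannels
    n_frames = len(samples) // channels

    def is_loud(frame_index):
        start = frame_index * channels
        return any(abs(s) >= threshold for s in samples[start:start + channels])

    # Walk inwards from both ends until a frame reaches the threshold.
    first = 0
    while first < n_frames and not is_loud(first):
        first += 1
    last = n_frames
    while last > first and not is_loud(last - 1):
        last -= 1

    trimmed = samples[first * channels:last * channels]
    if sys.byteorder == "big":
        trimmed.byteswap()  # back to little-endian before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is patched on write
        dst.writeframes(trimmed.tobytes())
    return (last - first) / params.framerate
```

For example, `trim_silence("tmp2.wav", "tmp2_trimmed.wav")` would write a trimmed copy that can be sent to the API in place of the original; the output file name here is only an example.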

The recording is 56 sec long, but the predicted segments go beyond that timestamp:

14
00:00:49,320 --> 00:00:51,320
So it went down to 27 after I bought it.

15
00:00:51,800 --> 00:00:56,040
And then it went up to 200 later on, but I sold it 40.

16
00:00:56,040 --> 00:01:01,840
And I bought a second half idea and its log my expected intended path pace shows

17
00:01:01,880 --> 00:01:06,640
that if it becomes a 70 dollar cap, I'm going to be successful with it my license.

18
00:01:06,640 --> 00:01:07,920
And I can't wait for it.

19
00:01:07,920 --> 00:01:09,240
It's that green ball over.

20
00:01:09,240 --> 00:01:13,160
And the fact that I ended up continuing to buy it at the rate of 75 on that stock is

21
00:01:13,160 --> 00:01:14,160
second to none.

22
00:01:14,200 --> 00:01:18,240
I have a huge list, you know, I've bought encontrar Forex prices.

23
00:01:18,760 --> 00:01:21,520
I've had a bunch of investors come in, I've sold them all.

24
00:01:21,760 --> 00:01:23,520
I have done projects like truble.

Up to 56 sec it's correct, and there's no silence after that. Even if there were some silence at the end, I'm wondering why it generates different output after the 56th second for the exact same audio file.
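As a stopgap rather than an explanation, the segments past the end of the recording can be dropped after the fact. A minimal sketch, assuming the transcription was requested with `response_format="srt"` and that the real duration (about 56 sec here) is known; `drop_cues_after` is a hypothetical helper, not part of the openai library.

```python
import re

# Matches the start time of an SRT cue, e.g. "00:00:56,040 -->".
_CUE_START = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def drop_cues_after(srt_text, duration_sec):
    """Return SRT text keeping only the cues that start before `duration_sec`."""
    kept = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        match = _CUE_START.search(block)
        if match is None:
            continue  # not a cue block, skip it
        hours, minutes, seconds, millis = (int(g) for g in match.groups())
        start = hours * 3600 + minutes * 60 + seconds + millis / 1000
        if start < duration_sec:
            kept.append(block)
    return "\n\n".join(kept) + "\n"
```

With the output above and `duration_sec=56`, cue 15 (starting at 00:00:51,800) is kept and cues 16 onwards are discarded. This only hides the symptom: it does not explain why the model generates the extra text in the first place.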

Interesting, it’s almost like the file is corrupted somehow, like it’s looping some parts.

That's what I thought initially, but the extra text it's generating doesn't seem to be present anywhere in the file.