Transcription data returned by the Whisper API has identical start and end timestamps for some words

When the timestamp granularity is set to “word”, the returned transcription data has odd start and end values for some words. Here’s an example:

{
 word: 'apple',
 start: 43.86000061035156,
 end: 43.86000061035156 
}

The start and end timestamps are exactly the same, which would mean the word has zero duration. That seems very strange.

This happens consistently. I have tried changing the temperature and the audio speed, but neither helps.
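
For anyone who wants to check their own output, here is a minimal sketch for flagging the affected words. It assumes the word-level data is available as an array of { word, start, end } objects with times in seconds, like the example above. The function name findZeroDurationWords and the second sample entry are made up for illustration; only the 'apple' entry comes from real output.

```javascript
// Hypothetical helper: returns the entries whose duration is zero or negative.
// Assumes each entry looks like { word, start, end } with times in seconds.
function findZeroDurationWords(words) {
  return words.filter((w) => w.end - w.start <= 0);
}

// Sample data: the first entry is the one from this post,
// the second is invented to show a word that should not be flagged.
const words = [
  { word: 'apple', start: 43.86000061035156, end: 43.86000061035156 },
  { word: 'pie', start: 43.86000061035156, end: 44.2 },
];

// Print every flagged word with its timestamps.
for (const w of findZeroDurationWords(words)) {
  console.log(`${w.word}: start=${w.start}, end=${w.end}`);
}
```

Counting the flagged entries against the total number of words gives a rough idea of how often this occurs in a given transcription.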