Whisper Segment Start Times

This is the code I’m using to transcribe audio. For some reason the first segment returned always starts at 0.00, while the rest have the correct start times. Is that how it’s supposed to work?

  import openai

  # Transcribe the audio using OpenAI's Whisper API
  # (audio_path is the path to the audio file, defined elsewhere)
  with open(audio_path, "rb") as audio_file:
      transcript = openai.audio.transcriptions.create(
          file=audio_file,
          model="whisper-1",
          response_format="verbose_json",
          timestamp_granularities=["segment"],
      )

I have the exact same issue. This causes subtitles to show before the speaker starts speaking.

Strangely, if you include "word" in your timestamp_granularities, the first segment does start at the right time!

It would certainly make more sense if the timestamp were always correct.
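
If requesting word timestamps is an option, one possible way to guard against an early first subtitle is to move the first segment's start to the start of the first word. The sketch below is only an illustration: `align_first_segment` is a hypothetical helper, and it assumes the segments and words are available as plain dicts with `start` and `end` keys in seconds, as in the `verbose_json` response when `timestamp_granularities=["word", "segment"]` is requested.

```python
def align_first_segment(segments, words):
    """Return a copy of `segments` whose first entry starts at the first word.

    Hypothetical helper, not part of the OpenAI library. Both arguments are
    assumed to be lists of dicts with "start" and "end" keys (seconds).
    """
    # Copy each segment so the caller's data is left untouched.
    fixed = [dict(seg) for seg in segments]
    if not fixed or not words:
        return fixed

    first_word_start = words[0]["start"]
    first = fixed[0]
    # Only move the start later, and never past the end of the segment.
    if first["start"] < first_word_start <= first["end"]:
        first["start"] = first_word_start
    return fixed


# Made-up example data: the first segment is reported at 0.0,
# but the first word is only spoken at 1.4 seconds.
segments = [
    {"start": 0.0, "end": 3.2, "text": "Hello there."},
    {"start": 3.2, "end": 5.0, "text": "How are you?"},
]
words = [
    {"word": "Hello", "start": 1.4, "end": 1.8},
    {"word": "there", "start": 1.9, "end": 2.3},
]
aligned = align_first_segment(segments, words)
```

With the openai Python package, calling `model_dump()` on the response should give a dict containing `segments` and `words` lists in this shape, but that is worth verifying against the installed version. Later segments are left as they are, since only the first one is reported as wrong in this thread.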