Word level transcription data?

perfectlyhuman · February 28, 2024, 6:58pm

I’m following the documentation for the v1/audio/transcriptions endpoint, and my request looks like this:

with open(audio_file_path, "rb") as audio_file:
    files = {
        "file": audio_file,
    }
    data = {
        "model": "whisper-1",
        "response_format": "verbose_json",
        "timestamp_granularities": ["word"],
    }

    response = requests.post(url, headers=headers, files=files, data=data)

But I’m only getting segment-level transcription data rather than word-level.

Is word-level timestamp_granularities still possible with whisper-1?

_j · February 28, 2024, 7:25pm

Sure. Not still, but rather new.

Here is some code to send your file and get timed words:

import os
import requests

# Gets the API key from environment variable
api_key = os.getenv("OPENAI_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
url = "https://api.openai.com/v1/audio/transcriptions"

with open("audio.mp3", "rb") as audio_file:
    parameters = {
        "file": ("audio.mp3", audio_file),
        "language": (None, "en"),
        "model": (None, "whisper-1"),
        "prompt": (None, "Here is the radio show."),
        "response_format": (None, "verbose_json"),
        "temperature": (None, "0.1"),
        "timestamp_granularities[]" : (None, "word"),
    }
    response = requests.post(url, headers=headers, files=parameters)

if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    # Parse the JSON body once, then pull out the transcribed text and timed words
    result = response.json()
    transcribed_text = result['text']
    words = result['words']
    formatted_words = [
        {k: f"{v:.2f}" if isinstance(v, float) else v for k, v in word.items()}
        for word in words
    ]
    # Save text or words to a file
    try:
        with open("transcript.txt", "w") as file:
            #file.write(transcribed_text)
            file.write(str(formatted_words))
        print("Transcribed text successfully saved to 'transcript.txt'.")
    except Exception as e:
        print(f"output file error: {e}")

    print(formatted_words[:20])

This overwrites transcript.txt on every run unless you add some more code. After that, you get to decide what to do with the output, or just enjoy the printed start:

Transcribed text successfully saved to 'transcript.txt'.
[{'word': 'This', 'start': '1.04', 'end': '1.60'}, {'word': 'is', 'start': '1.60', 'end': '1.78'}, {'word': 'a', 'start': '1.78', 'end': '1.98'}, {'word': 'radio', 'start': '1.98', 'end': '2.38'}, {'word': 'show', 'start': '2.38', 'end': '2.60'}, {'word': 'where', 'start': '2.60', 'end': '2.86'}, {'word': 'people', 'start': '2.86', 'end': '3.14'}, {'word': 'call', 'start': '3.14', 'end': '3.44'}, {'word': 'us', 'start': '3.44', 'end': '3.64'}, {'word': 'and', 'start': '3.64', 'end': '3.82'}, {'word': 'ask', 'start': '3.82', 'end': '3.98'}, {'word': 'us', 'start': '3.98', 'end': '4.24'}, {'word': 'questions', 'start': '4.24', 'end': '4.52'}, {'word': 'about', 'start': '4.52', 'end': '4.82'}, {'word': 'cars', 'start': '4.82', 'end': '5.14'}, {'word': 'right', 'start': '5.14', 'end': '5.44'}, {'word': 'And', 'start': '5.56', 'end': '5.96'}, {'word': 'what', 'start': '5.96', 'end': '6.16'}, {'word': 'were', 'start': '6.16', 'end': '6.30'}, {'word': 'we', 'start': '6.30', 'end': '6.48'}]
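
On the point about saving to the same file: one way around it is to derive the output name from the audio file name plus a timestamp, and to write JSON so the start and end values stay numeric. A minimal sketch, assuming a hypothetical helper save_words (the name, the .words.json suffix, and the out_dir parameter are illustrative and not from the code above):

```python
import json
from datetime import datetime
from pathlib import Path


def save_words(words, audio_path, out_dir="."):
    """Write the word list as JSON to a file named after the audio file and the current time.

    Hypothetical helper: the naming scheme is an assumption, not part of the API.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_path = Path(out_dir) / f"{Path(audio_path).stem}-{stamp}.words.json"
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English words readable in the file
        json.dump(words, f, ensure_ascii=False, indent=2)
    return out_path
```

It could be called as save_words(response.json()["words"], "audio.mp3") in place of the write to transcript.txt. Two runs within the same second would still collide, so add a counter if that matters.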

perfectlyhuman · February 28, 2024, 7:34pm

Turns out I was missing the brackets on the timestamp_granularities parameter. The key needs to be:
"timestamp_granularities[]"

Would be nice if their documentation showed that!
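
For reference, the request from the first post with the bracketed key could look like the sketch below. build_form_data and transcribe_words are hypothetical names; the URL, model, and field names are the ones used earlier in the thread. It assumes requests sends a list value in data as repeated form fields under the same key, so the field reaches the API as timestamp_granularities[].

```python
import os


def build_form_data(granularities=("word",)):
    """Form fields for the transcription request (hypothetical helper)."""
    return {
        "model": "whisper-1",
        "response_format": "verbose_json",
        # Note the [] suffix: without it, only segment-level data came back.
        "timestamp_granularities[]": list(granularities),
    }


def transcribe_words(audio_file_path):
    """POST the audio file and return the parsed verbose_json response."""
    import requests  # third-party; imported here so build_form_data works without it

    url = "https://api.openai.com/v1/audio/transcriptions"
    headers = {"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"}
    with open(audio_file_path, "rb") as audio_file:
        response = requests.post(
            url,
            headers=headers,
            files={"file": audio_file},
            data=build_form_data(),
            timeout=120,
        )
    response.raise_for_status()
    return response.json()
```

With that change, transcribe_words("audio.mp3")["words"] should hold the timed words, as in the output posted above.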

The example Python request in the documentation shows it this way:

from openai import OpenAI
client = OpenAI()

audio_file = open("speech.mp3", "rb")
transcript = client.audio.transcriptions.create(
    file=audio_file,
    model="whisper-1",
    response_format="verbose_json",
    timestamp_granularities=["segment"]
)

print(transcript.words)
