Word-level timestamps *and* sentence timestamps together?

Is it possible to get both word-level and sentence-level timestamps returned together in the verbose_json response, using only one API call?

Welcome to the forum.

Can you explain more what you mean here?

What are you trying to do exactly?

I’m using the API to generate timestamped JSON. It works fine at sentence level and at word level separately.

However, I want a single API call that returns the combined word-level and sentence-level timestamps in the JSON response.

To illustrate:

Word-level:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="word" -F model="whisper-1" -F language="nl"'

Sentence-level:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="sentence" -F model="whisper-1" -F language="nl"'

It seems not; the parameter acts as a switch between the two. However, sentences are relatively easy to reconstruct programmatically: split on sentence-ending punctuation and take the timestamps of the first and last word within each sentence, as in the sketch below.
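Here is a minimal Python sketch of that idea. It assumes you already have the words array from a verbose_json response as a list of dicts with word, start and end keys, and that the word strings keep their trailing punctuation (if the API strips it, you would have to align against the full text field instead). The function name and punctuation set are just illustrative choices:

# Sketch: derive sentence-level timestamps from word-level ones.
# `words` is assumed to look like:
# [{"word": "Hallo.", "start": 0.0, "end": 0.4}, ...]
SENTENCE_END = (".", "!", "?")

def words_to_sentences(words):
    sentences = []
    current = []
    for w in words:
        current.append(w)
        if w["word"].strip().endswith(SENTENCE_END):
            sentences.append({
                "text": " ".join(x["word"] for x in current),
                "start": current[0]["start"],  # first word's start time
                "end": current[-1]["end"],     # last word's end time
            })
            current = []
    if current:  # trailing words with no closing punctuation
        sentences.append({
            "text": " ".join(x["word"] for x in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
    return sentences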

Yes, it is now possible to get both word-level and segment-level timestamps from Whisper in one API call.

Try repeating the “-F timestamp_granularities[]” part of the command like this:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="word" -F timestamp_granularities[]="segment" -F model="whisper-1" -F language="nl"'

I’m not sure if the above will work but I know for a fact that this curl command works:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@YOUR_FILE_PATH" \
  -F "timestamp_granularities[]=word" \
  -F "timestamp_granularities[]=segment" \
  -F model="whisper-1" \
  -F response_format="verbose_json"
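With both granularities requested like that, the verbose_json response should contain both a words array (each entry with word, start and end) and a segments array (each entry with text, start, end and some extra metadata), alongside the full text.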

This also works in Python:

from openai import OpenAI

client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

# Request word- and segment-level timestamps in a single call
with open('YOUR_FILE_PATH', 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="verbose_json",
        timestamp_granularities=["segment", "word"]
    )

print(transcript)
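To pull both granularities out of the result, something along these lines should work; in the current Python SDK the verbose_json response exposes the words and segments lists as attributes, but double-check the field names against your SDK version:

# Each word entry has word/start/end; each segment entry has
# text/start/end plus extra metadata.
for w in transcript.words:
    print(f"{w.start:.2f}-{w.end:.2f}  {w.word}")

for s in transcript.segments:
    print(f"{s.start:.2f}-{s.end:.2f}  {s.text}")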

And here is a workaround for using http.MultipartRequest in Flutter:

var url = Uri.https("api.openai.com", "v1/audio/transcriptions");
var request = http.MultipartRequest('POST', url);
request.headers.addAll({"Authorization": "Bearer YOUR_OPENAI_API_KEY"});
request.fields["model"] = 'whisper-1';
request.fields["response_format"] = 'verbose_json';

// request.fields is a Map<String, String>, so it can't hold the repeated
// timestamp_granularities[] key. Adding each value as a MultipartFile
// without a filename sends it as an ordinary form field instead.
List<String> timestampGranularities = ['word', 'segment'];
for (String granularity in timestampGranularities) {
  request.files.add(http.MultipartFile.fromString(
      'timestamp_granularities[]', granularity));
}
request.files.add(await http.MultipartFile.fromPath('file', 'YOUR_FILE_PATH'));

var response = await request.send();
var newresponse = await http.Response.fromStream(response);

Hope this helps someone 🙂
Sorry if it doesn’t - I’m still learning.
