Word-level timestamps *and* sentence timestamps together?

Is it possible to get both word-level and sentence-level timestamps returned together in the verbose_json response, using only one API call?

Welcome to the forum.

Can you explain more what you mean here?

What are you trying to do exactly?

I’m using the API to generate timestamped JSON. It works fine at sentence level and at word level separately.

However, I want a single API call that returns the combined word-level and sentence-level timestamps in the JSON response.

To illustrate:

Word-level:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="word" -F model="whisper-1" -F language="nl"'

Sentence-level:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="sentence" -F model="whisper-1" -F language="nl"'

It seems not; the parameter acts as a switch between the two. However, sentences are relatively easy to reconstruct programmatically: split on sentence-ending punctuation and take the timestamps of the first and last word within each sentence, as in the sketch below.
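Here is a minimal Python sketch of that idea. It assumes you already have the words array from a verbose_json response as a list of dicts with word, start and end keys, and that the word strings keep their trailing punctuation (if the API strips it, you would have to align against the full text field instead). The function name and punctuation set are just illustrative choices:

# Sketch: derive sentence-level timestamps from word-level ones.
# `words` is assumed to look like:
# [{"word": "Hallo.", "start": 0.0, "end": 0.4}, ...]
SENTENCE_END = (".", "!", "?")

def words_to_sentences(words):
    sentences = []
    current = []
    for w in words:
        current.append(w)
        if w["word"].strip().endswith(SENTENCE_END):
            sentences.append({
                "text": " ".join(x["word"] for x in current),
                "start": current[0]["start"],  # first word's start time
                "end": current[-1]["end"],     # last word's end time
            })
            current = []
    if current:  # trailing words with no closing punctuation
        sentences.append({
            "text": " ".join(x["word"] for x in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
        })
    return sentences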

Yes, it is now possible to get both word-level and segment-level timestamps from Whisper in one API call.

Try repeating the “-F timestamp_granularities[]” part of the command like this:

'curl --request POST \'https://api.openai.com/v1/audio/transcriptions\' --header \'Authorization: Bearer ' . $token . '\' --header \'Content-Type: multipart/form-data\' -F file="@audio.mp3" -F response_format="verbose_json" -F timestamp_granularities[]="word" -F timestamp_granularities[]="segment" -F model="whisper-1" -F language="nl"'

I’m not sure if the above will work but I know for a fact that this curl command works:

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@YOUR_FILE_PATH" \
  -F "timestamp_granularities[]=word" \
  -F "timestamp_granularities[]=segment" \
  -F model="whisper-1" \
  -F response_format="verbose_json"
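With both granularities requested like that, the verbose_json response should contain both a words array (each entry with word, start and end) and a segments array (each entry with text, start, end and some extra metadata), alongside the full text.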

This also works in Python:

from openai import OpenAI

client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

# Request word- and segment-level timestamps in a single call
with open('YOUR_FILE_PATH', 'rb') as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model="whisper-1",
        response_format="verbose_json",
        timestamp_granularities=["segment", "word"]
    )

print(transcript)
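To pull both granularities out of the result, something along these lines should work; in the current Python SDK the verbose_json response exposes the words and segments lists as attributes, but double-check the field names against your SDK version:

# Each word entry has word/start/end; each segment entry has
# text/start/end plus extra metadata.
for w in transcript.words:
    print(f"{w.start:.2f}-{w.end:.2f}  {w.word}")

for s in transcript.segments:
    print(f"{s.start:.2f}-{s.end:.2f}  {s.text}")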

And here is a workaround for using http.MultipartRequest in Flutter:

var url = Uri.https("api.openai.com", "v1/audio/transcriptions");
var request = http.MultipartRequest('POST', url);
request.headers.addAll({"Authorization": "Bearer YOUR_OPENAI_API_KEY"});
request.fields["model"] = 'whisper-1';
request.fields["response_format"] = 'verbose_json';

// request.fields is a Map<String, String>, so it can't hold the repeated
// timestamp_granularities[] key. Adding each value as a MultipartFile
// without a filename sends it as an ordinary form field instead.
List<String> timestampGranularities = ['word', 'segment'];
for (String granularity in timestampGranularities) {
  request.files.add(http.MultipartFile.fromString(
      'timestamp_granularities[]', granularity));
}
request.files.add(await http.MultipartFile.fromPath('file', 'YOUR_FILE_PATH'));

var response = await request.send();
var newresponse = await http.Response.fromStream(response);

Hope this helps someone 🙂
Sorry if it doesn’t - I’m still learning.
