Word level timestamps from whisper v3's json is invalid

drewocarr · February 20, 2024, 2:53am

I'm playing with the new word-level timestamps in large-v3, but for some reason the JSON comes back with single quotes instead of double quotes. Running it through ChatGPT converts it easily, but that's pretty risky if it decides to hallucinate any of the output.

example

[{'text': ' Abstract.', 'timestamp': (0.0, 0.54)}, {'text': ' The', 'timestamp': (0.54, 1.48)}, {'text': ' ubiquity', 'timestamp': (1.48, 2.04)}, {'text': ' of', 'timestamp': (2.04, 2.2)}]
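That output is Python's printed form of a list of dicts, not JSON. If the printed text is all you have, there is no need for ChatGPT: the standard library's `ast.literal_eval` parses Python literals deterministically, and `json.dumps` re-serializes them as JSON. A minimal sketch, assuming the printed text has been captured in a string (`printed` is a hypothetical name):

```python
import ast
import json

# Printed output copied from the console (straight quotes, as Python prints them).
printed = ("[{'text': ' Abstract.', 'timestamp': (0.0, 0.54)}, "
           "{'text': ' The', 'timestamp': (0.54, 1.48)}]")

# literal_eval accepts only Python literals (strings, numbers, tuples, lists,
# dicts, ...) and never calls functions, so it is far safer than eval().
chunks = ast.literal_eval(printed)

# json.dumps always emits double-quoted strings; tuples become JSON arrays.
as_json = json.dumps(chunks)
print(as_json)
```

This is only a recovery path for text you've already captured; serializing the data directly avoids the round trip entirely.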

_j · February 20, 2024, 3:49am

It sounds like you're seeing Python's printed representation (repr) of string objects, not JSON. Whether you see single quotes, double quotes, or escaped quotes of either type depends on how you produced the output and on what each string contains.

The bytes returned by a direct "requests" library call to the API are JSON:

"words": [\n {\n "word": "This",\n "start": 1.059999942779541,\n "end": 1.600000023841858\n },\n {\n "word": "is",\n "start": 1.600000023841858,\n "end": 1.7799999713897705\n },\n {\n "word": "a",\n "start": 1.7799999713897705,\n "end": 1.9800000190734863\n },\n {\n "word": "radio",\n "start": 1.9800000190734863,\n "end": 2.380000114440918\n },\n {\n "word": "show",\n "start": 2.380000114440918,\n "end": 2.619999885559082\n },\n {\n "word": "where",\n "start": 2.619999885559082,\n "end": 2.859999895095825\n },\n {\n "word": "people",\n "start": 2.859999895095825,\n "end": 3.140000104904175\n },\n {\n "word": "call",\n "start": 3.140000104904175,\n "end": 3.440000057220459\n },\n {\n "word": "us",\n "start": 3.440000057220459,\n "end": 3.640000104904175\n },\n {\n "word": "and",\n "start": 3.640000104904175,\n "end": 3.819999933242798\n },\n {\n "word": "ask",\n "start": 3.819999933242798,\n

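Because that body is genuine JSON, `json.loads` (or `response.json()` in requests) turns it straight into Python lists and dicts. A minimal sketch using the first two entries of the excerpt above, wrapped in braces here so it forms a complete document (the real response contains more fields than this abbreviated `body`):

```python
import json

# Abbreviated excerpt of the API body shown above, closed off so it parses.
body = """{
  "words": [
    {"word": "This", "start": 1.059999942779541, "end": 1.600000023841858},
    {"word": "is", "start": 1.600000023841858, "end": 1.7799999713897705}
  ]
}"""

data = json.loads(body)
for w in data["words"]:
    # Each entry is a dict with "word", "start" and "end" keys.
    print(f'{w["word"]!r}: {w["start"]:.2f}-{w["end"]:.2f}')
```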
drewocarr · February 20, 2024, 4:10am

I'm just using the sample code from the model card:

result = pipe(sample, return_timestamps="word")
print(result["chunks"])
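`result["chunks"]` is an ordinary Python list of dicts with tuple timestamps, so `print` shows Python's repr rather than JSON. Serializing it with the `json` module gives valid JSON. A minimal sketch; `result` below is a hypothetical hand-built stand-in for the pipeline output, filled with the values from the first post so it runs without downloading the model:

```python
import json

# Hypothetical stand-in for: result = pipe(sample, return_timestamps="word")
# (values copied from the example above so this runs without the model)
result = {
    "chunks": [
        {"text": " Abstract.", "timestamp": (0.0, 0.54)},
        {"text": " The", "timestamp": (0.54, 1.48)},
    ]
}

# print(result["chunks"]) shows Python's repr; json.dumps emits real JSON:
# double-quoted strings, and each timestamp tuple becomes an array.
# ensure_ascii=False keeps non-ASCII characters readable instead of \u escapes.
json_text = json.dumps(result["chunks"], indent=2, ensure_ascii=False)
print(json_text)
```

To write the result to disk instead, `json.dump(result["chunks"], f, ensure_ascii=False)` takes an open text file in place of returning a string.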

_j · February 20, 2024, 4:30am

Set up a data object containing strings with embedded double quotes and escaped single quotes:
chunks = [{'text': 'He said "Hello"', 'timestamp': (0.0, 0.54)}, {'text': ' because', 'timestamp': (0.54, 1.48)}, {'text': ' it\'s', 'timestamp': (1.48, 2.04)}, {'text': ' "polite"', 'timestamp': (2.04, 2.2)}]
Print:
print(chunks)
Note how the enclosing quote character alternates, so that each string is displayed with the least escaping:
[{'text': 'He said "Hello"', 'timestamp': (0.0, 0.54)}, {'text': ' because', 'timestamp': (0.54, 1.48)}, {'text': " it's", 'timestamp': (1.48, 2.04)}, {'text': ' "polite"', 'timestamp': (2.04, 2.2)}]
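The alternation comes from `repr()`, which `print` uses for each string inside a list or dict: Python encloses a string in single quotes by default, switches to double quotes only when the string contains a single quote but no double quote, and, if it contains both, keeps single quotes and escapes the inner ones. A short check:

```python
# repr() picks the enclosing quote per string to minimise escaping.
samples = ['He said "Hello"', " it's", 'it\'s "both"']
for s in samples:
    print(repr(s))
```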

Alternatively, serialize it to a JSON string. Note that the result is text, no longer the list and dict structure you'd index into when parsing:

import json
print(json.dumps(chunks, indent=2))

[
  {
    "text": "He said \"Hello\"",
    "timestamp": [
      0.0,
      0.54
    ]
  },
  {
    "text": " because",
    "timestamp": [
      0.54,
      1.48
    ]
  },
  {
    "text": " it's",
    "timestamp": [
      1.48,
      2.04
    ]
  },
  {
    "text": " \"polite\"",
    "timestamp": [
      2.04,
      2.2
    ]
  }
]

(Enclosing output in ``` fences, as done here, is also a good way to present it on the forum.)
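One side effect visible in the JSON output above: JSON has no tuple type, so `json.dumps` writes each `(start, end)` tuple as an array, and `json.loads` gives it back as a list. If downstream code expects tuples, they can be restored after loading. A minimal sketch:

```python
import json

chunks = [{'text': ' because', 'timestamp': (0.54, 1.48)}]

round_tripped = json.loads(json.dumps(chunks))
print(round_tripped[0]["timestamp"])  # now a list, not a tuple

# Restore tuples if something downstream relies on them.
restored = [{**c, "timestamp": tuple(c["timestamp"])} for c in round_tripped]
print(restored == chunks)
```

Note that `[0.54, 1.48] == (0.54, 1.48)` is `False` in Python, so comparing a round-tripped structure with the original fails until the tuples are restored.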

drewocarr · February 20, 2024, 6:36pm

Ah gotcha, makes perfect sense, thank you!
