Invalid JSON returned from Audio/Whisper endpoints

When calling the Transcriptions and Translations endpoints through the Python API (openai.audio.transcriptions.create and openai.audio.translations.create) with response_format set to “verbose_json” or “json”, the call returns something that is not valid JSON. The output begins with a custom data type name, e.g. “Transcription”, and contains parentheses, unescaped characters, single quotes, and other components that make it invalid JSON.

Example:

import openai

# audio_file is assumed to be an audio file already opened in binary mode
response = openai.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="json",
)
# response
# Transcription(text='¿Cómo podría ser el mundo transformado? ¿Cómo vendría su reino? ... hasta el final, hasta lo último.')

response = openai.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    response_format="json",
)
# response
# Translation(text='How could the world be transformed? How would his kingdom come? ... until the end, until the end.')

Note: the text field in the outputs above has been abbreviated with “…” to save space.
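
To make the symptom concrete, here is a minimal sketch that reproduces it without calling the API. The Transcription class below is a hypothetical stand-in written for illustration, not the SDK's own class. It shows that the printed form of such an object is a Python repr that a JSON parser rejects, while serializing the fields explicitly gives valid JSON. If the real returned object is a Pydantic model (an assumption, not something confirmed in this thread), response.model_dump_json() might be the analogous call.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Transcription:
    """Hypothetical stand-in for the returned object; not the SDK's class."""
    text: str


response = Transcription(text="¿Cómo podría ser el mundo transformado?")

# The printed form is a Python repr, not JSON, so parsing it fails.
printed = repr(response)
try:
    json.loads(printed)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

# Serializing the object's fields explicitly yields valid JSON.
# With the real SDK object, assuming it is a Pydantic model,
# response.model_dump_json() would be the analogous call.
as_json = json.dumps(asdict(response), ensure_ascii=False)
round_trip = json.loads(as_json)

print(printed)
print(as_json)
```

In this sketch the repr output has the same shape as the outputs shown above (type name, parentheses, single quotes), which is why it cannot be parsed as JSON.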

Tested with the following:

  • a Jupyter notebook
  • Python 3.12.1
  • OpenAI Python SDK, version 1.30.1

I had the same issue. I’m surprised it hasn’t been resolved after more than a month.

Has anybody heard if/when this will be fixed? It feels a bit weird to have inconsistent response payload formats across APIs.