Recurring `TypeError: 'tuple' object has no attribute 'lower'` from `gpt-4o` / `gpt-4o-mini` transcription API (Streaming)

Hey everyone,

Getting a persistent TypeError: 'tuple' object has no attribute 'lower' from the gpt-4o-transcribe and gpt-4o-mini-transcribe streaming APIs.

Issue: Sometimes the transcription API returns a tuple instead of a string. When our Python app tries to use string methods (.lower(), .strip()) on it, it crashes. Error message points to the OpenAI streaming API.

Is this a known bug or expected behavior? Any insights on why the API would return a tuple instead of a string, or how to reliably handle this?

Thanks.

The model cannot technically return a "tuple": a tuple is a Python construct, not something an HTTP API can send. The API streams JSON objects.

You don’t tell us whether you are using the SDK or how you wrote your own stream parser.

I just ran a transcription. Here’s the end of the raw stream.

...
data: {"type":"transcript.text.delta","delta":" Arabs"}

data: {"type":"transcript.text.delta","delta":"Welcome"}

data: {"type":"transcript.text.delta","delta":" to"}

data: {"type":"transcript.text.delta","delta":" our"}

data: {"type":"transcript.text.delta","delta":" radio"}

data: {"type":"transcript.text.delta","delta":" show"}

data: {"type":"transcript.text.delta","delta":"."}

data: {"type":"transcript.text.done","text":"In order to be able to talk we just have to agree that we're talking roughly about the same thing. And I know that you know\nAs much about time as I need you to know\nWe got here on time, and you know what that means.\nWelcome to our radio show.\nAnother subtlety involved was already mentioned.\nWelcome to our radio show.\nWelcome to our radio show.\nWelcome to our radio show.\nSo that doesn't work either, and that's another subtlety that we'll have to get around in quantum mechanics.\nBut as we are going to do, we first learn to see what the problems are before the complications, and then we'll be in a better position to correct it for the more recent knowledge on the subject.\nSo we'll take a simple point of view about time and space, you know what it means in a rough way.\nWelcome to our radio show.\nSection 8.2\nSpeed\nNevertheless, there are still some subtleties.\nWelcome to our radio show.\nWell, they could do this all right.\nWelcome to our radio show.\nWelcome to our radio show.\nThe Greeks got very confused about this, and a new branch of mathematics had to be discovered beyond that.\nGeometry and algebra of the Greeks and Arabs\nWelcome to our radio show.","usage":{"type":"tokens","total_tokens":1479,"input_tokens":1184,"input_token_details":{"text_tokens":132,"audio_tokens":1052},"output_tokens":295}}
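For anyone writing their own parser against a dump like this, a minimal sketch (the event names are taken from the stream above; everything else is illustrative) shows that each delta arrives as a plain string, never a tuple:

```python
import json

def parse_transcript_events(lines):
    """Yield transcript text pieces from raw SSE lines like 'data: {...}'.

    Assumes the event shapes shown above: 'transcript.text.delta' events
    carry a 'delta' string, and 'transcript.text.done' ends the stream.
    """
    for raw in lines:
        line = raw.decode() if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue  # skip the blank keep-alive lines between events
        event = json.loads(line[len("data: "):])
        if event.get("type") == "transcript.text.delta":
            yield event["delta"]  # incremental text: always a plain str
        elif event.get("type") == "transcript.text.done":
            return  # full transcript was already streamed piece by piece
```

If a tuple ever shows up at this layer, it was produced by your own code, not by the wire format.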

It also shows that the gpt-4o-transcribe model continues to malfunction to the point where it cannot be relied on over whisper-1: here it repeats the input "prompt" text about the transcript being a radio show multiple times instead of transcribing the audio chunk. This run used 'chunking_strategy': (None, "auto") (the None being the absent filename for that multipart form part). The prompt should only indicate to the model the lead-up text before the latest audio, and it should never be repeated in the output.

I would suggest similarly logging raw RESTful requests instead of using the OpenAI SDK; then you can see if your particular input throws out JSON gone goofy, with unexpected events. Or, if you are indeed writing your own code, find where you are misinterpreting the sent data.
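As a sketch of that kind of raw logging with requests (the endpoint URL, field names, and stream flag here are my assumptions about a standard /v1/audio/transcriptions call; adapt them to your own setup):

```python
import os
import requests

def iter_raw_events(response):
    """Yield each non-empty raw SSE line, decoded, exactly as received."""
    for line in response.iter_lines():
        if line:
            yield line.decode()

def log_transcription_stream(file_path, api_key,
                             url="https://api.openai.com/v1/audio/transcriptions"):
    """POST an audio file and print the raw event stream for inspection."""
    with open(file_path, "rb") as audio_file:
        with requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            data={"model": "gpt-4o-mini-transcribe", "stream": "true"},
            files={"file": (os.path.basename(file_path), audio_file, "audio/mpeg")},
            stream=True,
            timeout=60,
        ) as response:
            response.raise_for_status()
            for event in iter_raw_events(response):
                print(event)  # every 'data: {...}' line, verbatim
```

Diffing this output against what your parser thinks it received usually pinpoints the misinterpretation quickly.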

The Fix:
The solution is to separate the form data (the model name) from the file upload. You do this by creating a data dictionary for simple values and keeping the files dictionary only for the file.

This is the new, correct code:

import os
import requests

# CORRECTED CODE
data = {
    'model': 'gpt-4o-mini-transcribe'  # The model name is now form data
}
files = {
    'file': (os.path.basename(file_path), audio_file, 'audio/mpeg')  # The files dict only contains the file
}

# The request now correctly sends BOTH data and files
response = requests.post(OPENAI_WHISPER_URL, headers=headers, data=data, files=files, timeout=60)

I used Google Gemini for the fix. The tuple error had been driving me nuts for weeks. I’m a little disappointed in the 4o transcribe model, as it truncates replies whereas the whisper model does not. But the whisper and 4o transcribe models are super cheap to use, so I shouldn’t be complaining. Super cool to use these models.

Thanks for taking the time to reply.

Wade


This past post of mine shows how, if you are using requests (or the httpx library as an HTTP/2 drop-in), you can use a single "files" parameter, BTW.

I don’t demonstrate "stream" and an iterator there, but you can figure it out (and you need not use chunking_strategy).
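A hedged sketch of that single-files-parameter style (the endpoint URL and model name are assumptions; the (None, value) tuples are the requests convention for plain form fields with no filename, as mentioned above):

```python
import os
import requests

def build_multipart_fields(file_path, audio_fh, model="gpt-4o-mini-transcribe"):
    """Build one 'files' dict: (None, value) parts are plain form fields."""
    return {
        "model": (None, model),    # no filename: sent as ordinary form data
        "stream": (None, "true"),  # ask for an SSE event stream back
        "file": (os.path.basename(file_path), audio_fh, "audio/mpeg"),
    }

def stream_transcript(file_path, api_key):
    """POST the audio and yield the raw SSE lines (b'data: {...}')."""
    with open(file_path, "rb") as audio_fh:
        with requests.post(
            "https://api.openai.com/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files=build_multipart_fields(file_path, audio_fh),
            stream=True,
            timeout=60,
        ) as response:
            response.raise_for_status()
            yield from response.iter_lines()
```

Everything rides in the one files dict, so there is no separate data parameter to get out of sync with the file part.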