I've been using whisper-1 in my audio pipeline and it's been fine so far. Today I tried switching to gpt-4o-transcribe. I know it has a stricter output format, but input-wise the API docs say it accepts .wav, which is exactly what I've been sending to whisper-1 without problems. As soon as I switch the model to gpt-4o-transcribe, I keep getting: ERROR:main:Audio processing error: Error code: 400 - {'error': {'message': 'This model does not support the format you provided.', 'type': 'invalid_request_error', 'param': 'messages', 'code': 'unsupported_format'}}
Here’s my code:
import logging
import os
import tempfile

from openai import OpenAI

logger = logging.getLogger("main")
client = OpenAI()

def process_audio(contents):
    try:
        # Write the raw audio bytes to a temporary .wav file
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp:
            temp.write(contents)
            temp_path = temp.name  # Store the file path for later use
        # Reopen the file in binary mode and pass it to the API
        with open(temp_path, "rb") as temp_file:
            transcript = client.audio.transcriptions.create(
                model="gpt-4o-transcribe",  # whereas before it was whisper-1
                file=temp_file,
                response_format="json",
            )
        # Clean up temporary file
        os.unlink(temp_path)
        return transcript
    except Exception as e:
        logger.error(f"Audio processing error: {e}")
        raise
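One thing I'm not sure about is whether the bytes I'm writing are actually plain PCM WAV, or a WAV container with some other codec (float, ADPCM, etc.) that the new model might reject even though the extension is .wav. As a sanity check on my side, this hedged sketch uses only the stdlib `wave` module, which itself raises `wave.Error` on non-PCM containers, so it doubles as a format probe before the upload:

```python
import io
import wave

def inspect_wav(contents: bytes) -> dict:
    """Read basic header info from WAV bytes using the stdlib wave module.

    wave.open raises wave.Error for non-PCM (e.g. compressed) WAV
    containers, which is one way a ".wav" file can still be rejected
    downstream despite having the right extension.
    """
    with wave.open(io.BytesIO(contents), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "frame_rate_hz": w.getframerate(),
            "n_frames": w.getnframes(),
        }

# Example: build a tiny mono 16-bit PCM WAV in memory and inspect it
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

print(inspect_wav(buf.getvalue()))
```

If `inspect_wav` raises on the real `contents`, the input was never standard PCM WAV to begin with, which would explain whisper-1 being more forgiving than the newer model. The function name and the probe itself are my own diagnostic idea, not anything from the API docs.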