GPT-4o-transcribe truncation issue after the diarize API update

Hello everyone,

Since the update to gpt-4o-transcribe that introduced the -diarize endpoint, the behaviour of the regular gpt-4o-transcribe model has changed, degrading its performance.

In speech-to-text mode from an audio file (anywhere from 10 seconds to 10 minutes long), gpt-4o-transcribe now tends to truncate the end of the transcription if there is a pause in the speech, something that never happened before with the same code:

    # audio_file is an open binary file handle; client is an OpenAI() instance
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        language=lang,
        prompt=med_prompt,
        response_format="text",
    )

When adding chunking_strategy="auto", truncation no longer occurs, but the model tends to hallucinate content in chunks that contain only background noise, and overall recognition quality seems slightly degraded.
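
For completeness, this is the full call with chunking enabled; only the extra parameter differs from the snippet above:

    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        language=lang,
        prompt=med_prompt,
        response_format="text",
        chunking_strategy="auto",  # let the server segment the audio automatically
    )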

I also tried to make it process the audio in one piece by defining a custom server-side VAD for the chunking, but the settings do not seem to be properly handled by the API:


    chunking_strategy={
        "type": "server_vad",          # server-side voice activity detection
        "threshold": 0.5,              # VAD activation sensitivity
        "prefix_padding_ms": 200,      # audio included before detected speech
        "silence_duration_ms": 10000,  # require 10 s of silence before splitting
    },
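
To check whether these settings are honoured at all, here is a quick sketch that compares outputs across different silence_duration_ms values (audio_path stands in for any test file; the other variables are the same as above). If the parameter were applied, longer values should produce fewer mid-pause cuts:

    for silence_ms in (500, 2000, 10000):
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(
                model="gpt-4o-transcribe",
                file=f,
                language=lang,
                prompt=med_prompt,
                response_format="text",
                chunking_strategy={
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 200,
                    "silence_duration_ms": silence_ms,
                },
            )
        # If the custom VAD were honoured, the output should change with silence_ms
        print(silence_ms, len(transcript), repr(transcript[-60:]))

In my case the behaviour did not change regardless of the values, which is what makes me think the custom strategy is being ignored.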

In summary:

  • Without chunking_strategy, truncation now occurs.

  • With chunking_strategy="auto", hallucinations appear.

  • A custom chunking_strategy does not seem to work.

All three are significant issues for a speech-to-text model that previously worked very well.
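
Until this is fixed, the workaround I am experimenting with is to chunk the audio client-side so that no long pause ever reaches the endpoint. A minimal sketch, assuming pydub (with ffmpeg) is available; the silence thresholds are starting points to tune, not recommendations:

    import io

    from openai import OpenAI
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    client = OpenAI()

    def transcribe_in_chunks(path, lang, prompt):
        audio = AudioSegment.from_file(path)
        # Split only on long silences so natural pauses stay inside a chunk
        chunks = split_on_silence(
            audio,
            min_silence_len=2000,            # ms of silence that triggers a split
            silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
            keep_silence=200,                # ms of padding kept at each boundary
        )
        parts = []
        for chunk in chunks:
            buf = io.BytesIO()
            chunk.export(buf, format="wav")
            buf.seek(0)
            parts.append(
                client.audio.transcriptions.create(
                    model="gpt-4o-transcribe",
                    file=("chunk.wav", buf),  # (filename, file-like) tuple
                    language=lang,
                    prompt=prompt,
                    response_format="text",
                )
            )
        return " ".join(parts)

This keeps long silences out of each request, at the cost of an extra dependency and of losing context across chunk boundaries.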

Thanks for your help!