Invalid 'input_audio_transcription.prompt': string too long

jefry · April 10, 2025, 9:19am

Why is the prompt limit for the gpt-4o-transcribe Realtime API only 1024 characters?
The error is thrown when I create a transcription session via /v1/realtime/transcription_sessions:

  "input_audio_transcription": {
    "model": "gpt-4o-transcribe",
    "language": null,
    "prompt": ""
  },

I can only pass a string of up to 1024 characters, far less than the 16,000-token context window that gpt-4o-transcribe supports.

Is there any reason why?
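
For anyone hitting the same error, a minimal sketch of a client-side check that fails before the request is sent. The 1024 figure is only the limit observed in this thread, not a documented constant, and `build_transcription_session` is a hypothetical helper; the payload mirrors the `input_audio_transcription` object quoted above.

```python
import json

# Limit observed in this thread (assumption: counted in characters).
MAX_PROMPT_CHARS = 1024


def build_transcription_session(prompt, model="gpt-4o-transcribe", language=None):
    """Hypothetical helper: build the session body, rejecting over-long prompts."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; observed limit is {MAX_PROMPT_CHARS}"
        )
    return {
        "input_audio_transcription": {
            "model": model,
            "language": language,  # None serializes to JSON null, as in the post
            "prompt": prompt,
        }
    }


if __name__ == "__main__":
    body = build_transcription_session("Names that may occur: jefry, OpenAI.")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of the request to /v1/realtime/transcription_sessions; the HTTP call itself is left out here because it depends on your client and credentials.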

_j · April 10, 2025, 3:16pm

The reason why is to protect you - from yourself.

A prompt is not a place for commands or behaviors.

The prompt parameter was designed for the Whisper series of models. It is lead-up text that is not reproduced in the output but stands in for the transcript immediately preceding the audio. That context improves the text the model produces as a continuation.

I expect that for gpt-4o the prompt is similarly wrapped in a container, along with some text stating its purpose. And thus it doesn’t work reliably, the same way that when you say “continue this” to the model, there is a 50/50 chance it doesn’t continue.

jefry · April 11, 2025, 1:03am

@_j thank you for the reply.
My next question: does realtime gpt-4o-transcribe reuse earlier audio or transcription results? From my tests it looks like it does not. I’ve had cases where it transcribes the audio into a totally different language.

_j · April 11, 2025, 1:21am

Every API call you make will be independent, a new instance, stateless. Were it not, imagine the confusion when I send 100 API calls in parallel.

Even when producing a “chat” with an entity, every question and every interaction is a fresh start. To make it not appear that way, we must send back the previous turns of conversation as a pattern that appears to be continuing.

Thus the prompt for speech-to-text models is just that: a preloading of observed speech-as-text that the AI continues its output upon.

Multimodal gpt-4o operates differently: it observes the entire context window it has been given, using an attention mechanism, and completes it with intelligence, especially when prompted or fine-tuned. So although this model is undocumented except by name, we can assume it works something like “you repeat this back, but as generated text”, with confusion still on the table.
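
Since each call is stateless, one way to carry context forward is to keep the transcript yourself and pass its tail as the next prompt. Below is a rough sketch under that assumption; `prompt_from_history` is a hypothetical helper, the 1024-character default is the limit reported earlier in this thread, and whether gpt-4o-transcribe makes good use of such a prompt is not confirmed here.

```python
def prompt_from_history(transcript_so_far, max_chars=1024):
    """Return the most recent part of the transcript that fits in max_chars.

    If the cut lands in the middle of a word, the partial word is dropped
    so the prompt starts on a word boundary.
    """
    if max_chars <= 0:
        return ""
    if len(transcript_so_far) <= max_chars:
        return transcript_so_far

    tail = transcript_so_far[-max_chars:]
    # The character just before the tail tells us whether we cut mid-word.
    if not transcript_so_far[-max_chars - 1].isspace():
        _, sep, rest = tail.partition(" ")
        if sep:  # only trim when there is a later word to fall back on
            tail = rest
    return tail


if __name__ == "__main__":
    history = "hello world foo"
    print(prompt_from_history(history, max_chars=8))
```

Run as a script, the example cuts "hello world foo" to its last 8 characters ("orld foo") and then drops the partial word, leaving "foo".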
