Getting no response event for input_audio_transcription in realtime ws

Even though I have set

"input_audio_transcription": {
  "model": "whisper-1"
},

in session.update, I am not getting any events back from OpenAI such as conversation.item.input_audio_transcription.completed or conversation.item.input_audio_transcription.failed.
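For anyone comparing configs, here is a minimal sketch of the full session.update event described above, assuming the transcription setting sits directly under "session" as in the snippet. The helper name build_session_update is made up for illustration and is not part of any SDK; sending the payload over the websocket is left as a comment.

```python
import json


def build_session_update(transcription_model: str = "whisper-1") -> str:
    """Return a session.update client event as a JSON string.

    Hypothetical helper for illustration only.
    """
    event = {
        "type": "session.update",
        "session": {
            # Ask the server to transcribe the user's input audio.
            "input_audio_transcription": {"model": transcription_model},
        },
    }
    return json.dumps(event)


payload = build_session_update()
print(payload)
# Send it over an already-open websocket connection, e.g. ws.send(payload)
```

Printing the payload before sending makes it easy to confirm that the transcription block is actually present in what goes over the wire.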

I am getting this error:

Received event: {
  "type": "response.done",
  "event_id": "event_AJjdTfvjtCvTmv1IqUANH",
  "response": {
    "object": "realtime.response",
    "id": "resp_AJjdT63D2ZLvew6yqq16W",
    "status": "failed",
    "status_details": {
      "type": "failed",
      "error": {
        "type": "server_error",
        "code": null,
        "message": "The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the session ID sess_AJjdMwqXnwRohafrM9VPo in your message.)"
      }
    },
    "output": [],
    "usage": {
      "total_tokens": 0,
      "input_tokens": 0,
      "output_tokens": 0,
      "input_token_details": {
        "cached_tokens": 0,
        "text_tokens": 0,
        "audio_tokens": 0
      },
      "output_token_details": {
        "text_tokens": 0,
        "audio_tokens": 0
      }
    }
  }
}
Response completed: resp_AJjdT63D2ZLvew6yqq16W

Update: I am getting a conversation.item.input_audio_transcription.failed error:

User transcription failed from OpenAI: {
  type: 'server_error',
  code: null,
  message: "Input transcription failed for item 'item_AKRNyONV1jVsyUpi3I05E'.",
  param: null
}

I have the same problem, even in the Playground and the test app from OpenAI.

type:"conversation.item.input_audio_transcription.failed"
event_id:"event_AKoGWufao75TYukE7hQsf"
item_id:"item_AKoGT6tHPUsFUJo64bdVg"
content_index:0
error.type:"server_error"
error.code:null
error.message:"Input transcription failed for item 'item_AKoGT6tHPUsFUJo64bdVg'."
error.param:null

I also have this problem; a few hours ago everything worked. I’m not getting any “conversation.item.input_audio_transcription.completed” or “conversation.item.input_audio_transcription.failed” events at all.

Same problem here. Following.

"type": "conversation.item.input_audio_transcription.failed",
"event_id": "event_AMb6Me1caYITrW451vlPw",
"item_id": "item_AMb6M2IIEfcvebQig3GHl",
"content_index": 0,
"error": {
  "type": "server_error",
  "code": null,
  "message": "Input transcription failed for item 'item_AMb6M2IIEfcvebQig3GHl'. 429 Too Many Requests",
  "param": null
}

Check your credit balance: if it is negative, you will get the error message above.


It could also be an audio input format problem. Either way, it would be useful to see your config/setup in order to make some guesses at what’s going wrong (if it’s not a rate limit or billing issue).
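To make the triage above concrete, here is a rough sketch that maps a failed-transcription event to a hint. The function name diagnose_transcription_failure and its heuristics are assumptions based only on what has been reported in this thread (a "429" in the message pointing at rate limits or billing); it is not an official error taxonomy.

```python
def diagnose_transcription_failure(event: dict) -> str:
    """Return a rough hint for a transcription failure event.

    Hypothetical helper; the heuristics only mirror suggestions in this thread.
    """
    if event.get("type") != "conversation.item.input_audio_transcription.failed":
        return "not a transcription failure event"
    # The error object may be missing or null, so fall back to an empty message.
    message = (event.get("error") or {}).get("message") or ""
    if "429" in message:
        return "rate limit or billing: check usage limits and credit balance"
    return "no detail in message: check audio input format and session config"


example = {
    "type": "conversation.item.input_audio_transcription.failed",
    "item_id": "item_AMb6M2IIEfcvebQig3GHl",
    "error": {
        "type": "server_error",
        "code": None,
        "message": "Input transcription failed for item "
                   "'item_AMb6M2IIEfcvebQig3GHl'. 429 Too Many Requests",
        "param": None,
    },
}
print(diagnose_transcription_failure(example))
```

Many of the failures posted here carry no extra detail in the message, which is why the fallback branch can only point back at the setup.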

Also not getting any “conversation.item.input_audio_transcription.completed” events with the transcript of the input audio. Are there specific configs that should be passed in when setting up the connection?

You are right. I had the same issue, and it was because my credit balance was below 0.

Were you able to solve this?

Did you solve this, dvir1?

I am facing this now. It was returning this event just a few days ago, and I have made no changes to my API calls. Has anyone found a solution?

If you’re not receiving completion events, or you’re getting null for the transcription even though the request appears to have succeeded, it is because you haven’t properly configured the initial POST request where you handshake for the ephemeral token.

I go into a bit more detail in this post:

Probably the exact same issue you guys are having.
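As a rough illustration of that point, the sketch below builds (but does not send) a token request that carries the transcription config. It assumes the ephemeral token endpoint is POST https://api.openai.com/v1/realtime/sessions and that its body accepts the same input_audio_transcription field as session.update; both should be checked against the current API reference. The helper name, the placeholder key, and the model name passed in are illustrative only.

```python
import json
import urllib.request

# Assumed endpoint for minting an ephemeral Realtime token; verify in the docs.
SESSIONS_URL = "https://api.openai.com/v1/realtime/sessions"


def build_ephemeral_token_request(api_key: str, realtime_model: str,
                                  transcription_model: str = "whisper-1"):
    """Build the POST request without sending it (hypothetical helper)."""
    body = {
        "model": realtime_model,
        # Without this block the session may never emit transcription events.
        "input_audio_transcription": {"model": transcription_model},
    }
    return urllib.request.Request(
        SESSIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key and model name, for illustration only.
req = build_ephemeral_token_request("sk-PLACEHOLDER", "gpt-4o-realtime-preview")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req), then parse the JSON response.
```

Building the request separately from sending it makes it easy to inspect the body and confirm the transcription block is there before any network call happens.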

Here is my working implementation in Kotlin; I can convert it to Python if anyone wants:

It is easy to read, though, much like pseudocode.

This is what worked for me:

Use “gpt-4o-transcribe”, and make sure you go to your settings and enable “gpt-4o-transcribe” as an allowed model for your project and API key.

Another thing to try is to print out all the event types, like so:

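            # Requires: import base64, json, logging; from datetime import datetime
            etype = evt.get("type")
            logging.info("received event type: %s", etype)  # log every event type

            if etype == "response.audio.delta":
                # Decode the base64 audio chunk and queue it for playback.
                self.output_queue.put(base64.b64decode(evt["delta"]))
                self.last_recv = datetime.now()
            elif etype == "conversation.item.input_audio_transcription.failed":
                # Dump the whole event so the error details are visible.
                logging.error("failed transcription. details: %s", json.dumps(evt))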

I realized I was getting a wave of conversation.item.input_audio_transcription.failed events back, and the details didn’t seem to help much. After a bit of head scratching, I suddenly remembered that this could be an access issue. I allowed the transcribe model for my project, and presto … my transcriptions are home!

Sorry, my link to GitHub was outdated; I have posted it again.

It’s a complete implementation as a Kotlin script; you can convert it to your language of choice: