Retrieving user response from Realtime Voice WebRTC

I'm developing a chatbot with real-time voice-to-voice using the new WebRTC support.
When using voice-to-voice, which parameter returns your transcribed text after your speaking turn ends? The docs say the transcription model defaults to whisper-1, but when I look at the transcript in conversation.item.created or conversation.item.input_audio_transcription.completed, it comes back null.

https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create

or

https://platform.openai.com/docs/api-reference/realtime-server-events/conversation/item/input_audio_transcription/completed
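
For reference, the completed server event is documented to carry the user transcript at the top level; it looks roughly like this (example values made up):

    {
        "event_id": "event_2122",
        "type": "conversation.item.input_audio_transcription.completed",
        "item_id": "item_003",
        "content_index": 0,
        "transcript": "hello how are you"
    }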

In my logs you can see the transcript parameter is null for conversation.item.created:

    {
        "type": "conversation.item.created",
        "event_id": "event_AgD4grkMnnQAX5meG0uSX",
        "previous_item_id": null,
        "item": {
            "id": "item_AgD4gCkEkQZuilVoJrVs2",
            "object": "realtime.item",
            "type": "message",
            "status": "completed",
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "transcript": null
                }
            ]
        }
    }

But in the docs:

    {
        "event_id": "event_1920",
        "type": "conversation.item.created",
        "previous_item_id": "msg_002",
        "item": {
            "id": "msg_003",
            "object": "realtime.item",
            "type": "message",
            "status": "completed",
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "transcript": "hello how are you",
                    "audio": "base64encodedaudio=="
                }
            ]
        }
    }

The transcript parameter is filled.

Does anyone have any idea why this value is coming back null?

Hi,

Have you updated your session to turn on audio input transcription? By default it is off:

      "https://api.openai.com/v1/realtime/sessions",
      {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPEN_AI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          model: "gpt-4o-realtime-preview-2024-12-17",
          voice: "alloy",
          modalities: ["audio", "text"],
          instructions: instructions,
          input_audio_transcription: {
            model: "whisper-1",
          },
          temperature: 1.1,
        }),
      }
    );
    const openAISessionData = await response.json();
    console.log("Session data received:", openAISessionData);
    console.log("OpenAI session created successfully");

    // Return ephemeral session info
    return res.status(200).json({
      success: true,
      sessionData: {
        id: openAISessionData.id,
        token: openAISessionData.client_secret.value,
        model: openAISessionData.model,
        object: openAISessionData.object,
        expires_at: openAISessionData.client_secret.expires_at,
        modalities: openAISessionData.modalities,
        url: "https://api.openai.com/v1/realtime",
        input_audio_transcription: openAISessionData.input_audio_transcription,
        turn_detection: openAISessionData.turn_detection,
        temperature: openAISessionData.temperature,
      },
    });

This is what I have, unless I'm just supposed to set input_audio_transcription to true.

I honestly don't know, lol. I've just read the docs to try and help you out, but I haven't implemented the Realtime API myself yet.

What happens if you try to set it to true in a session update message after you’ve created the session?
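
Something like this sent over the data channel, I'd guess (just a sketch based on the docs; dc stands for whatever your open RTCDataChannel variable is):

    // Sketch: enable input audio transcription after the session already exists
    dc.send(
      JSON.stringify({
        type: "session.update",
        session: {
          input_audio_transcription: { model: "whisper-1" },
        },
      }),
    );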

It's being set to true, I think, since I have that parameter added, but I'm never getting the correct event back.

For more context, here is the front-end code:


  // Update the handleWebRTCResponse function
  const handleWebRTCResponse = async (serverEvent) => {
    if (serverEvent.type === "response.done") {
      // Extract text from the transcript in the audio content
      const responseText =
        serverEvent.response.output[0]?.content[0]?.transcript;
      if (responseText) {
        try {
          // Update local messages state first
          setMessages((prev) => [
            ...prev,
            {
              role: "assistant",
              content: responseText,
              isWebRTC: true,
              timestamp: Date.now(),
            },
          ]);

          // Then try to save to Firebase
          const response = await callAPI("saveWebRTCConversation", {
            sessionId: uniqueId,
            role: "assistant",
            content: responseText,
            timestamp: Date.now(),
          });

          if (!response.success) {
            console.error(
              "Failed to save message to Firebase:",
              response.error,
            );
            // Message is still in local state, but failed to save to Firebase
          }
        } catch (error) {
          console.error("Error saving assistant WebRTC message:", error);
          // Message is still in local state, but failed to save to Firebase
        }
      }
    } else if (
      serverEvent.type ===
      "conversation.item.input_audio_transcription.completed"
    ) {
      const transcript = serverEvent.transcript;
      if (transcript) {
        setMessages((prev) => [
          ...prev,
          {
            role: "user",
            content: transcript,
            isWebRTC: true,
            timestamp: Date.now(),
          },
        ]);
        const response = await callAPI("saveWebRTCConversation", {
          sessionId: uniqueId,
          role: "user",
          content: transcript,
          timestamp: Date.now(),
        });
        if (!response.success) {
          console.error("Failed to save user transcription:", response.error);
        }
      }
    }
  };

  // Also handle the output_item.done event
  const handleEvent = async (e) => {
    try {
      const serverEvent = JSON.parse(e.data);

      switch (serverEvent.type) {
        case "response.done":
          console.log("=== ASSISTANT MESSAGE ===");
          await handleWebRTCResponse(serverEvent);
          break;
        case "conversation.item.input_audio_transcription.completed":
          console.log("=== USER MESSAGE ===");
          await handleWebRTCResponse(serverEvent);
          break;
        default:
      }
    } catch (error) {
      console.error("Error in handleEvent:", error);
    }
  };

  // Add logging to the data channel setup
  useEffect(() => {
    if (dataChannel) {
      dataChannel.addEventListener("message", handleEvent);
      return () => {
        dataChannel.removeEventListener("message", handleEvent);
      };
    }
  }, [dataChannel]);

We’re never receiving any logs for conversation.item.input_audio_transcription.completed

In handleWebRTCResponse, it's accurately getting the text from response.done, but never from the conversation.item.input_audio_transcription.completed case.
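
To be clear, even a catch-all listener like this rough sketch never shows any transcription event arriving:

    // Debugging sketch: log every server event type that comes over the data channel
    dataChannel.addEventListener("message", (e) => {
      const evt = JSON.parse(e.data);
      console.log("server event:", evt.type);
    });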

I have the same issue. I'm setting the session param like this: input_audio_transcription: { model: 'whisper-1' },

but when I get the session object back, it comes back as "input_audio_transcription": null.

This is my code:

    const openAIResponse = await fetch('https://api.openai.com/v1/realtime/sessions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // ENV variable in Convex
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'gpt-4o-mini-realtime-preview-2024-12-17',
        voice: 'alloy',
        input_audio_transcription: { model: 'whisper-1' },
        instructions: instructions,
      }),
    });


Yep, that follows the documentation, as mine does. We're not getting the response we should be.

OpenAI Forum leaders, can you please figure this out?

Hi, since you're using the WebRTC version, did you run into the problem of the AI looping and talking to itself? I just sent over a "Hi", and it keeps greeting me with "Hey", "Hello there", and so on. It seems like it's hearing its own audio and responding. Do you know how to fix this issue?

That has happened to me once or twice, but not consistently. I chalked it up to it being too sensitive, but maybe you're right and it's hearing itself.

With me, it’s happening every time. I’m unable to have a simple conversation since it’s constantly talking to itself. When I put the speakers on mute, and read my logs, I can see the AI only responds to my questions, and not to itself. I made a post.

If you see something obviously wrong, please let me know!

What device are you using? I looked at your post and don't see anything weird. The only thing I can think of is that, between your device settings and the device itself, you're getting feedback.
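
One thing that might be worth trying is forcing echo cancellation when you grab the mic, something like this (just a sketch, I haven't verified it fixes the looping):

    // Ask the browser for echo cancellation / noise suppression on the mic track,
    // so the model is less likely to pick up its own audio from the speakers
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
      },
    });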

Have you verified if the transcription failed? The failure arrives in another event:

conversation.item.input_audio_transcription.failed
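
For example, another case in your handleEvent switch would surface it (a fragment, not standalone code; the event should carry an error object explaining why transcription failed):

    // Sketch: surface transcription failures alongside the other cases
    case "conversation.item.input_audio_transcription.failed":
      console.error("Transcription failed:", serverEvent.error);
      break;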

You should try it yourself; it simply doesn't work. You never get anything past input_audio_buffer.committed. We never see any transcription-failed event when I log all the events coming over WebRTC.

It’s very much a bug in their API and if you search the forum people have been complaining about this since earlier this year

No, it's not. Someone mentioned in another post that you actually have to send the input audio transcription config in a "session.update" after creating the session. It just doesn't work if you do it at session creation.


This is correct. It only works if you update the session. I do it as soon as I get the session.created event:

    const handleSessionCreated = useCallback((event) => {
        console.log('Session created:', event.session);
        const updateEvent = {
            type: 'session.update',
            event_id: sessionId.current,
            session: {
                input_audio_transcription: { model: 'whisper-1' },
            },
        };
        emitEvent(updateEvent);
    }, []);