Can WebRTC Be Used for a Real-Time Text-to-Text Chatbot Instead of WebSockets?

Hi everyone,

I have a question regarding WebRTC and its potential use for a text-based chatbot. I understand that WebRTC is typically used for real-time voice and video communication, but I’m wondering if it’s possible to leverage WebRTC for a real-time text-to-text chatbot—without using voice—so that I can avoid the costs associated with real-time voice processing and only pay for real-time text.

Due to certain constraints, I’m unable to use WebSockets for this project, which is why I’m exploring alternative options.

Would using WebRTC for this purpose be viable? If so, are there any significant drawbacks compared to WebSockets when handling a real-time chatbot?

I’d really appreciate any insights or experiences you can share. Thanks in advance! :blush:

This is what I tried to implement, but I couldn’t get it to work:

const peerConnection = new RTCPeerConnection({});
const dataChannel = peerConnection.createDataChannel('oai-events', {
  ordered: true,
});

const offer = await peerConnection.createOffer({});
await peerConnection.setLocalDescription(offer);

const sdpResponse = await fetch(`${OPENAI_BASE_URL}?model=${OPEN_AI_REALTIME_MODEL}`, {
  method: 'POST',
  body: offer.sdp,
  headers: {
    Authorization: `Bearer ${EPHEMERAL_KEY}`,
    'Content-Type': 'application/sdp',
  },
});

const answer = {
  type: 'answer',
  sdp: await sdpResponse.text(),
};

await peerConnection.setRemoteDescription(answer);

peerConnectionRef.current = peerConnection;

But what I get from the OpenAI Server is this error/response:


{
    "type": "answer",
    "sdp": "{\"error\":{\"message\":\"Invalid SDP offer. Offer did not have an audio media section.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_offer\"}}"
}
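
The error JSON ends up inside `sdp` because the response body is used as the answer without checking whether the request succeeded. A minimal sketch of a guard, assuming the endpoint returns raw SDP (which always starts with `v=`) on success and a JSON error body like the one above otherwise. The helper name `toSdpAnswer` is hypothetical, not part of any API:

```javascript
// Hypothetical helper: turn the HTTP result of the SDP exchange into an
// answer object, or throw with the server's message instead of handing an
// error body to setRemoteDescription.
function toSdpAnswer(ok, bodyText) {
  if (!ok || !bodyText.startsWith('v=')) {
    let detail = bodyText;
    try {
      // Assumption: error bodies look like { error: { message } }, as in this thread.
      detail = JSON.parse(bodyText)?.error?.message ?? bodyText;
    } catch {
      // Body was not JSON; keep the raw text.
    }
    throw new Error(`SDP exchange failed: ${detail}`);
  }
  return { type: 'answer', sdp: bodyText };
}
```

With this, the answer could be built as `toSdpAnswer(sdpResponse.ok, await sdpResponse.text())`, so a rejected offer fails loudly before `setRemoteDescription` is called.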

The only thing I need is text communication over WebRTC.

I send a message like this:

    const event = {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [
          {
            type: 'input_text',
            text: testCounter.current.toString(),
          },
        ],
      },
    };

    dataChannel?.send(JSON.stringify(event));
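
One thing worth checking here: as far as I know, `conversation.item.create` only adds the message to the conversation, and the model replies once a `response.create` event follows. A minimal sketch of building both events for a text turn. The helper name `buildTextTurn` is hypothetical, and the `modalities: ['text']` field is an assumption based on the beta Realtime API, so check the current docs for the exact name:

```javascript
// Hypothetical helper: build the two events for one text-only user turn.
function buildTextTurn(text) {
  return [
    {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text }],
      },
    },
    {
      type: 'response.create',
      // Assumption: this field name comes from the beta API and may differ now.
      response: { modalities: ['text'] },
    },
  ];
}
```

Each event would then be sent on its own, for example `buildTextTurn('hello').forEach((e) => dataChannel.send(JSON.stringify(e)))`.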

and parsing the response like this:

      dataChannel.addEventListener('message', (event) => {
        try {
          const message = JSON.parse(event.data);

          // Optional chaining on every index so a missing output/content does not throw
          const text = message.response?.output?.[0]?.content?.[0]?.text;
          if (message.type === 'response.done' && text) {
            logger(text);
          }
        } catch (error) {
          logger(error, 'Error in data channel message');
        }
      });
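
The listener above only looks at the first content part of the first output item. A minimal sketch of a more tolerant extractor, assuming the `response.done` shape used in this thread (`response.output[].content[].text`). The helper name `extractResponseText` is hypothetical:

```javascript
// Hypothetical helper: collect every text part from a `response.done` event.
// Returns an empty string for other event types or when no text is present.
function extractResponseText(message) {
  if (message?.type !== 'response.done') return '';
  const outputs = message.response?.output ?? [];
  return outputs
    .flatMap((item) => item?.content ?? [])
    .map((part) => part?.text)
    .filter((text) => typeof text === 'string')
    .join('');
}
```

Inside the listener this would be called as `extractResponseText(JSON.parse(event.data))`, and the result logged only when it is non-empty.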

But I can’t do this when I leave audio out of the connection flow.

Yes, I couldn’t get WebRTC to work either. I use WebSockets to play around with the Realtime API. I found that 4o mini realtime works better than 4o realtime; I got more static with the full model. I do use the voice function.

#Up

Does anyone have an idea how we can use WebRTC only for sending and receiving events as stringified JSON?

Hi!

It looks like the Realtime WebRTC API will accept an empty audio stream. Using the code below, I could connect and send/receive text events. Audio token usage seems to remain 0 (to double-check!).

const peerConnection = new RTCPeerConnection();
const dataChannel = peerConnection.createDataChannel('oai-events', {
    ordered: true,
});

// add an empty audio track
const audioContext = new AudioContext();
const destination = audioContext.createMediaStreamDestination();
const audioStream = destination.stream;
const audioTrack = audioStream.getAudioTracks()[0];
peerConnection.addTrack(audioTrack);

const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);

const sdpResponse = await fetch(/* same arguments */);
const answer = {
    type: 'answer',
    sdp: await sdpResponse.text(),
};

await peerConnection.setRemoteDescription(answer);

Looks good! You could probably also use peerConnection.addTransceiver('audio') if you don’t want to bother with an AudioContext.

Yes, just adding an audio transceiver also works. Brilliant!
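
For reference, a minimal sketch of the transceiver variant, using only what was confirmed in this thread (the `oai-events` data channel plus an audio transceiver). The function name `prepareTextOnlyConnection` is hypothetical; the peer connection is passed in so the setup can be reused or exercised with a stand-in object:

```javascript
// Sketch: set up a peer connection for text-only use.
// `pc` is expected to be an RTCPeerConnection (browser).
function prepareTextOnlyConnection(pc) {
  // The data channel carries the JSON events, as in the earlier snippets.
  const dataChannel = pc.createDataChannel('oai-events', { ordered: true });
  // Adds an audio media section to the offer without creating an AudioContext.
  pc.addTransceiver('audio');
  return dataChannel;
}

// Usage in a browser (the rest of the flow stays the same):
//   const pc = new RTCPeerConnection();
//   const dataChannel = prepareTextOnlyConnection(pc);
//   const offer = await pc.createOffer();
//   await pc.setLocalDescription(offer);
```

Both calls have to happen before `createOffer()`, since the offer is generated from whatever channels and transceivers exist at that point.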

Somewhat unrelated: let’s make the webhook connection compatible with the standard WebSocket constructor, so that we can connect using the built-in WebSocket of Node v22 (or even a browser). Example:

const url = "wss://api.openai.com/v1/realtime?call_id=" + callId;
const ws = new WebSocket(url, [
    `openai-ephemeral-key.${ephemeralKey}`
]);

Can’t you just do this?

const ws = new WebSocket(url, { headers: { Authorization: `Bearer ${ephemeralKey}` } });

My understanding is that the standard WebSocket constructor takes an optional list of “protocols” (strings) as its second argument. Some JS runtimes (e.g., Deno or Bun) extend the second argument to accept an object with headers, but that will not work in Node v22 (which requires extra libraries such as ws) or in browsers.

It would be great to be able to connect from any runtime :slightly_smiling_face: