Can WebRTC Be Used for a Real-Time Text-to-Text Chatbot Instead of WebSockets?

Hi everyone,

I have a question regarding WebRTC and its potential use for a text-based chatbot. I understand that WebRTC is typically used for real-time voice and video communication, but I’m wondering if it’s possible to leverage WebRTC for a real-time text-to-text chatbot—without using voice—so that I can avoid the costs associated with real-time voice processing and only pay for real-time text.

Due to certain constraints, I’m unable to use WebSockets for this project, which is why I’m exploring alternative options.

Would using WebRTC for this purpose be viable? If so, are there any significant drawbacks compared to WebSockets when handling a real-time chatbot?
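
To be concrete about what I mean by "text-only WebRTC": as far as I know an RTCDataChannel can carry arbitrary text without any audio or video tracks at all. Here is a bare-bones local sketch of that idea (two in-page peer connections, no media, purely to illustrate the kind of exchange I'm after):

// Two peer connections in the same page exchanging text over a data
// channel only; no audio or video tracks are ever added.
const pcA = new RTCPeerConnection();
const pcB = new RTCPeerConnection();

// Hand ICE candidates straight across, since both peers live in one page
pcA.onicecandidate = (e) => e.candidate && pcB.addIceCandidate(e.candidate);
pcB.onicecandidate = (e) => e.candidate && pcA.addIceCandidate(e.candidate);

const chat = pcA.createDataChannel('chat', { ordered: true });
chat.onopen = () => chat.send('hello over a data channel, no audio needed');

pcB.ondatachannel = (e) => {
  e.channel.onmessage = (msg) => console.log('received:', msg.data);
};

// Standard offer/answer exchange, done locally
const offer = await pcA.createOffer();
await pcA.setLocalDescription(offer);
await pcB.setRemoteDescription(offer);

const answer = await pcB.createAnswer();
await pcB.setLocalDescription(answer);
await pcA.setRemoteDescription(answer);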

I’d really appreciate any insights or experiences you can share. Thanks in advance! :blush:

This is what I tried to implement, but I wasn't able to get it working:

// Create the peer connection and a data channel for the Realtime events
const peerConnection = new RTCPeerConnection({});
const dataChannel = peerConnection.createDataChannel('oai-events', {
  ordered: true,
});

// Create an SDP offer (no audio or video tracks added, data channel only)
const offer = await peerConnection.createOffer({});
await peerConnection.setLocalDescription(offer);

// POST the offer to the OpenAI Realtime endpoint and read the answer SDP
const sdpResponse = await fetch(`${OPENAI_BASE_URL}?model=${OPEN_AI_REALTIME_MODEL}`, {
  method: 'POST',
  body: offer.sdp,
  headers: {
    Authorization: `Bearer ${EPHEMERAL_KEY}`,
    'Content-Type': 'application/sdp',
  },
});

const answer = {
  type: 'answer',
  sdp: await sdpResponse.text(),
};

await peerConnection.setRemoteDescription(answer);

// Keep a reference to the connection (React ref)
peerConnectionRef.current = peerConnection;

But what I get back from the OpenAI server is this error response:

{
    "type": "answer",
    "sdp": "{\"error\":{\"message\":\"Invalid SDP offer. Offer did not have an audio media section.\",\"type\":\"invalid_request_error\",\"param\":null,\"code\":\"invalid_offer\"}}"
}
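
Reading that error, it looks like the Realtime WebRTC endpoint rejects any offer that has no audio media section at all, even if I never intend to use audio. One workaround I've seen suggested (untested on my side, so treat it as an assumption) is to add a receive-only audio transceiver purely so the SDP contains an "m=audio" line, while still doing all the actual communication over the data channel:

// Same setup as above, plus a receive-only audio transceiver so the offer
// contains an audio media section. Assumption: the endpoint only requires
// the section to exist; the audio track itself never has to be used.
const peerConnection = new RTCPeerConnection({});

const dataChannel = peerConnection.createDataChannel('oai-events', {
  ordered: true,
});

// Adds an "m=audio" section to the offer without touching the microphone
peerConnection.addTransceiver('audio', { direction: 'recvonly' });

const offer = await peerConnection.createOffer({});
await peerConnection.setLocalDescription(offer);
// ...the rest of the signaling flow stays the same as in the snippet above.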

The only thing I need is text communication over the WebRTC protocol.

I send a message like the code below:

    // Add a user text message to the conversation
    const event = {
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [
          {
            type: 'input_text',
            text: testCounter.current.toString(),
          },
        ],
      },
    };

    // Send it over the data channel as stringified JSON
    dataChannel?.send(JSON.stringify(event));
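
As far as I understand the Realtime API, conversation.item.create only adds the message to the conversation; I think a response.create event has to follow before the model actually replies, and it should be possible to restrict the reply to text there (this is my reading of the docs, so please correct me if it's wrong):

// Assumption: after adding the user message, explicitly ask for a response
// and restrict it to text so no audio is generated.
const responseEvent = {
  type: 'response.create',
  response: {
    modalities: ['text'],
  },
};

dataChannel?.send(JSON.stringify(responseEvent));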

And I parse the response like this:

      // Listen for server events coming back over the data channel
      dataChannel.addEventListener('message', (event) => {
        try {
          const message = JSON.parse(event.data);

          // Log the final text once the response has completed
          if (message.type === 'response.done' && message?.response?.output?.[0]?.content?.[0]?.text) {
            logger(message.response.output[0].content[0].text);
          }
        } catch (error) {
          logger(error, 'Error in data channel message');
        }
      });
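
When I'm not sure which events are actually coming back, I swap that for a dump-everything listener so I can at least see the event types arriving on the channel (plain debugging, nothing OpenAI-specific):

// Temporary debug listener: log the type of every server event so it is
// obvious whether any response events arrive at all.
dataChannel.addEventListener('message', (event) => {
  try {
    const message = JSON.parse(event.data);
    logger(`event type: ${message.type}`);
  } catch {
    logger(event.data, 'Non-JSON message on data channel');
  }
});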

But I can't get any of this to work when I don't include audio in the connection flow.

Yes, I couldn't get WebRTC to work either. I use WebSockets to play around with the Realtime API. I found that the 4o-mini realtime model works better than the 4o realtime one; I got more static using the full model. I do use the voice function.
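
For reference, the WebSocket route looks roughly like this for me, running server-side in Node with the ws package (the URL, model name, and beta header are the ones I used, so double-check them against the current docs):

import WebSocket from 'ws';

// Browsers can't attach custom headers to a WebSocket, so this runs in Node.
const ws = new WebSocket(
  'wss://api.openai.com/v1/realtime?model=gpt-4o-mini-realtime-preview',
  {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'OpenAI-Beta': 'realtime=v1',
    },
  },
);

ws.on('open', () => {
  // Same stringified-JSON events as over the WebRTC data channel
  ws.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Hello' }],
    },
  }));
  ws.send(JSON.stringify({ type: 'response.create', response: { modalities: ['text'] } }));
});

ws.on('message', (data) => {
  console.log('server event:', data.toString());
});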

#Up

Does anyone have an idea how we can use WebRTC only for sending and receiving events as stringified JSON?