Send function call output from server in WebRTC connection

Let’s say we have the following scenario for a quiz application:

  • using the Realtime API, a browser client is connected via WebRTC
  • a function tool named “get_next_question” is used by the agent to retrieve the next quiz question to ask, together with its correct answer and explanation
  • when the agent wants to retrieve the next question, it requests a function tool call
  • the client reacts by sending an API request to a server (e.g. the same server it retrieves the ephemeral API key from)
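A minimal sketch of the client side of this flow, assuming the tool call arrives on the data channel as a `response.output_item.done` server event whose `item` has `type: "function_call"`, a `name`, a `call_id` and JSON-encoded `arguments` (worth double-checking against the current Realtime API event reference). The function name `extractNextQuestionCall` is hypothetical.

```typescript
// Assumed shape of a conversation item inside a Realtime server event.
// Only the fields used below are declared.
interface OutputItem {
  type: string;
  name?: string;
  call_id?: string;
  arguments?: string; // JSON-encoded string produced by the model
}

// What the client would forward to its own backend (hypothetical shape).
interface PendingToolCall {
  callId: string;
  args: Record<string, unknown>;
}

// Hypothetical helper: inspect one raw data-channel message and, if it is a
// completed get_next_question call, return its call id and parsed arguments.
function extractNextQuestionCall(raw: string): PendingToolCall | null {
  const event = JSON.parse(raw) as { type?: string; item?: OutputItem };
  if (event.type !== "response.output_item.done" || !event.item) return null;

  const item = event.item;
  if (item.type !== "function_call" || item.name !== "get_next_question") {
    return null;
  }
  return {
    callId: item.call_id ?? "",
    args: item.arguments ? JSON.parse(item.arguments) : {},
  };
}
```

In the browser this would run inside the data channel's `message` listener, e.g. `dc.addEventListener("message", (e) => { const call = extractNextQuestionCall(e.data); /* POST call to the backend */ })`, with `call` sent to a backend route such as a hypothetical `/api/next-question`. Whatever that route returns still has to pass back through the browser before it reaches the model.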

Now, the server can’t simply respond to the client with the question and solution, as that would leak the answer to the browser. Instead, I would like the client to send only the session_id and call_id to the server, and have the server send the function call output directly to the OpenAI API.

Is this possible?

This example from the docs shows a client sending a conversation item with the function call output over the data channel, but I haven’t been able to figure out whether there’s a way to send the item from the server using the conversation API instead.
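For reference, the data-channel approach looks roughly like this sketch, assuming a `conversation.item.create` client event carrying a `function_call_output` item, followed by a `response.create` event so the model continues. `EventSink` and `sendFunctionCallOutput` are hypothetical names; an `RTCDataChannel` satisfies the `EventSink` shape. Note that `result` is serialized in the browser here, so the question and answer are visible to the client.

```typescript
// Hypothetical minimal interface: anything with send(string).
// An RTCDataChannel satisfies this shape.
interface EventSink {
  send(data: string): void;
}

// Hypothetical helper: return a tool result to the model over the data
// channel, then ask the model to produce its next response.
function sendFunctionCallOutput(
  channel: EventSink,
  callId: string,
  result: unknown,
): void {
  // 1. Add the tool result to the conversation, linked to the call by call_id.
  channel.send(
    JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: callId,
        output: JSON.stringify(result), // output is sent as a string
      },
    }),
  );
  // 2. Adding the item alone does not make the model reply; request a response.
  channel.send(JSON.stringify({ type: "response.create" }));
}
```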

After more experimenting and reading the docs, it seems to me that there is no way to send conversation items to a Realtime session without being in the session itself, either through the WebRTC connection or through the original WebSocket connection that started the session.

Do you have any ideas for getting around this and implementing the desired feature, other than routing the whole session through the server over WebSocket instead of WebRTC from the start, which isn’t feasible with my current architecture?

Unfortunately, I don’t think there’s a way. I hope they release a version of the API where your backend can hold a control-plane WebSocket connection while the client connects via WebRTC only for the audio.
