Realtime/transcription_sessions API returns 401 even when adding ephemeral key

I get an ephemeral key on the server side like below:

"use server";

import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function fetchSTTEphemeralToken(): Promise<OpenAI.Beta.Realtime.Sessions.SessionCreateResponse.ClientSecret> {
  const response = await openai.beta.realtime.transcriptionSessions.create({
    input_audio_format: "pcm16",
    input_audio_noise_reduction: { type: "near_field" },
    input_audio_transcription: {
      model: "gpt-4o-transcribe",
      prompt: "",
      language: "ja",
    },
    turn_detection: { type: "semantic_vad", eagerness: "low" },
  });

  const ephemeralToken = response.client_secret;

  return ephemeralToken;
}

and when I try to connect via WebRTC as described in this document, it throws a 401. Are any of my settings wrong?

  const startRecording = async () => {
    try {
      setStatus("Initializing...");
      setError(null);

      // Get ephemeral token from server
      const ephemeralToken = await fetchSTTEphemeralToken();
      console.log("Ephemeral token:", ephemeralToken);

      // Create peer connection
      const pc = new RTCPeerConnection();
      peerConnectionRef.current = pc;

      // Set up audio element for remote audio
      const audioEl = new Audio();
      audioEl.autoplay = true;
      audioElementRef.current = audioEl;

      pc.ontrack = (e) => {
        audioEl.srcObject = e.streams[0];
      };

      // Get microphone access and add track
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      streamRef.current = stream;
      pc.addTrack(stream.getTracks()[0]);

      // Set up data channel
      const dc = pc.createDataChannel("oai-events");
      dataChannelRef.current = dc;

      dc.onmessage = (event) => {
        const message = JSON.parse(event.data);
        console.log("Data channel message:", message);

        // Handle transcription messages
        if (message.type === "transcription") {
          if (message.transcription && message.transcription.text) {
            setTranscription(message.transcription.text);
          }
        }
      };

      // Create and set local description
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);

      // Send offer to server and get answer
      const baseUrl =
        "https://api.openai.com/v1/realtime/transcription_sessions";
      const model = "gpt-4o-transcribe";
      const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${ephemeralToken.value}`,
          "Content-Type": "application/sdp",
        },
      });

      const sdpResponseText = await sdpResponse.text();

      console.log("SDP response:", sdpResponseText);

      const answer = {
        type: "answer" as RTCSdpType,
        sdp: sdpResponseText,
      };
      await pc.setRemoteDescription(answer);

      setIsRecording(true);
      setStatus("Recording");
    } catch (err) {
      console.error("Error starting recording:", err);
      setError(`Error: ${err instanceof Error ? err.message : String(err)}`);
      setStatus("Error");
    }
  };

error

SDP response: {
  "error": {
    "message": "Incorrect API key provided: ek_***********************. You can find your API key at https://platform.openai.com/account/api-keys.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

Confirming that there is an issue and the OP is correct. The code given in the documentation for a WebRTC connection using ephemeral keys does not work (it returns a 401 error, “Incorrect API key provided: ek_…”).


I’ve found that OpenAI staff answered in another topic that transcription mode is only supported via WebSockets right now.
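
For anyone who wants to try the WebSocket path in the meantime, here is a minimal sketch. The `?intent=transcription` query parameter and the `OpenAI-Beta` header are assumptions taken from the Realtime docs, not confirmed in this thread, and the helper name is mine:

```typescript
// Hypothetical helper: assemble the URL and headers for a Realtime
// transcription connection over WebSocket. The "?intent=transcription"
// query parameter and the "OpenAI-Beta" header are assumptions from
// the Realtime docs.
function buildTranscriptionWsConfig(apiKey: string) {
  return {
    url: "wss://api.openai.com/v1/realtime?intent=transcription",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "realtime=v1",
    },
  };
}

// Usage in Node with the `ws` package (custom headers can't be set on
// browser WebSockets, so this path is server-side):
//   const { url, headers } = buildTranscriptionWsConfig(process.env.OPENAI_API_KEY!);
//   const ws = new WebSocket(url, { headers });
```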


@dkundel thank you for the update that we can now use WebRTC for transcription-only sessions. However, as mentioned earlier, I am still getting the error:

SDP response: {
  "error": {
    "message": "Incorrect API key provided: ek_67e7a***********************26ea. You can find your API key at https://platform.openai.com/account/api-keys.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

When retrieving an EphemeralToken like so:

"use server";

import { OpenAI } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function fetchSTTEphemeralToken(): Promise<OpenAI.Beta.Realtime.Sessions.SessionCreateResponse.ClientSecret> {
  const response = await openai.beta.realtime.transcriptionSessions.create({
    input_audio_format: "pcm16",
    input_audio_noise_reduction: { type: "near_field" },
    input_audio_transcription: {
      model: "gpt-4o-transcribe",
      prompt: "",
      language: "en",
    },
    turn_detection: { type: "semantic_vad", eagerness: "low" },
  });

  const ephemeralToken = response.client_secret;

  return ephemeralToken;
}

I then use it within my startRecording function (the same code as in my original post above).

Your code is almost correct. Just make this change:

  const baseUrl = "https://api.openai.com/v1/realtime"

See Realtime Transcription for a working example.
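
For completeness, here is a minimal sketch of the corrected SDP exchange based on this fix. The helper names are mine, and the status check is an addition the original snippet lacked:

```typescript
// Sketch of the corrected SDP exchange. The key change from the thread:
// POST the offer to /v1/realtime, not /v1/realtime/transcription_sessions.
function buildRealtimeSdpUrl(model: string): string {
  const baseUrl = "https://api.openai.com/v1/realtime"; // the corrected base URL
  return `${baseUrl}?model=${encodeURIComponent(model)}`;
}

async function exchangeSdp(
  offerSdp: string,
  ephemeralKey: string,
  model = "gpt-4o-transcribe"
): Promise<string> {
  const sdpResponse = await fetch(buildRealtimeSdpUrl(model), {
    method: "POST",
    body: offerSdp,
    headers: {
      Authorization: `Bearer ${ephemeralKey}`,
      "Content-Type": "application/sdp",
    },
  });
  // Check the status so a JSON error body is never handed to
  // setRemoteDescription as if it were SDP.
  if (!sdpResponse.ok) {
    throw new Error(`SDP exchange failed: ${await sdpResponse.text()}`);
  }
  return sdpResponse.text();
}
```

With this, the fetch block in startRecording becomes `const answerSdp = await exchangeSdp(offer.sdp!, ephemeralToken.value);`.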


@juberti Hi.

I was able to establish the connection by following your code, but how can I enable semantic_vad?

I encountered an error when I tried to specify it while fetching the ephemeral token, as shown in the code below (the turn_detection part is commented out):

export async function fetchSTTEphemeralToken(): Promise<OpenAI.Beta.Realtime.Sessions.SessionCreateResponse.ClientSecret> {
  const response = await openai.beta.realtime.transcriptionSessions.create({
    input_audio_noise_reduction: { type: "near_field" },
    input_audio_transcription: {
      model: "gpt-4o-transcribe",
      language: "ja",
    },
    // WebRTC connection creation will fail when specifying this
    // turn_detection: { type: "semantic_vad", eagerness: "low" },
  });

  const ephemeralToken = response.client_secret;

  return ephemeralToken;
}

Following the documentation (https://platform.openai.com/docs/guides/realtime-vad), I also tried sending a session.update message via the data channel after establishing the connection, but that resulted in an error as well. Here’s the code for sending the update:

enableSemanticVAD(eagerness: "low" | "medium" | "high" | "auto" = "auto") {
    if (!this.dc || this.dc.readyState !== "open") {
      console.error("Data channel not open, can't send session update");
      return false;
    }

    const updateMessage = {
      type: "session.update",
      session: {
        turn_detection: {
          type: "semantic_vad",
          eagerness: eagerness,
        },
      },
    };

    try {
      this.dc.send(JSON.stringify(updateMessage));
      console.log("Sent session update:", updateMessage);
      return true;
    } catch (error) {
      console.error("Error sending session update:", error);
      return false;
    }
  }

Below is the error message I received:

Data channel message:

{
  "error": {
    "code": null,
    "event_id": null,
    "message": "The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the session ID sess_BHMMGpeQUkLJmaBDV7XGZ in your message.)",
    "param": null,
    "type": "server_error"
  }
}

Confirmed, thanks for the clear explanation. We’ll look into this and report back.


When trying to set this via session.update, you’ll need to use transcription_session.update instead. We also just rolled out a fix to allow semantic_vad to be specified in the token creation request.
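
Putting that together, here is a small sketch of the corrected update event. The helper name is mine; the event shape follows the session.update example earlier in the thread with only the type swapped:

```typescript
// Hypothetical helper (buildVadUpdate is my name, not OpenAI's): build
// the update event for a transcription-only session. Note the event
// type is "transcription_session.update", not "session.update".
type Eagerness = "low" | "medium" | "high" | "auto";

function buildVadUpdate(eagerness: Eagerness = "auto") {
  return {
    type: "transcription_session.update",
    session: {
      turn_detection: {
        type: "semantic_vad",
        eagerness,
      },
    },
  };
}

// Usage over an open data channel:
//   dc.send(JSON.stringify(buildVadUpdate("low")));
```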


Thanks, I was able to connect with transcription_session.update and semantic_vad specified.

However, it does not seem to semantically judge when a turn has ended; it behaves the same as server_vad. I described the same issue in another topic as well.