And when I try to connect via WebRTC as described in this document, it returns a 401. Are any of my settings wrong?
const startRecording = async () => {
  try {
    setStatus("Initializing...");
    setError(null);

    // Get ephemeral token from server
    const ephemeralToken = await fetchSTTEphemeralToken();
    console.log("Ephemeral token:", ephemeralToken);

    // Create peer connection
    const pc = new RTCPeerConnection();
    peerConnectionRef.current = pc;

    // Set up audio element for remote audio
    const audioEl = new Audio();
    audioEl.autoplay = true;
    audioElementRef.current = audioEl;
    pc.ontrack = (e) => {
      audioEl.srcObject = e.streams[0];
    };

    // Get microphone access and add track
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    streamRef.current = stream;
    pc.addTrack(stream.getTracks()[0]);

    // Set up data channel
    const dc = pc.createDataChannel("oai-events");
    dataChannelRef.current = dc;
    dc.onmessage = (event) => {
      const message = JSON.parse(event.data);
      console.log("Data channel message:", message);
      // Handle transcription messages
      if (message.type === "transcription") {
        if (message.transcription && message.transcription.text) {
          setTranscription(message.transcription.text);
        }
      }
    };

    // Create and set local description
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);

    // Send offer to server and get answer
    const baseUrl =
      "https://api.openai.com/v1/realtime/transcription_sessions";
    const model = "gpt-4o-transcribe";
    const sdpResponse = await fetch(`${baseUrl}?model=${model}`, {
      method: "POST",
      body: offer.sdp,
      headers: {
        Authorization: `Bearer ${ephemeralToken.value}`,
        "Content-Type": "application/sdp",
      },
    });
    const sdpResponseText = await sdpResponse.text();
    console.log("SDP response:", sdpResponseText);

    // A failed request returns a JSON error body, not SDP; surface it here
    // instead of passing it to setRemoteDescription.
    if (!sdpResponse.ok) {
      throw new Error(`SDP exchange failed (${sdpResponse.status}): ${sdpResponseText}`);
    }

    const answer = {
      type: "answer" as RTCSdpType,
      sdp: sdpResponseText,
    };
    await pc.setRemoteDescription(answer);
    setIsRecording(true);
    setStatus("Recording");
  } catch (err) {
    console.error("Error starting recording:", err);
    setError(`Error: ${err instanceof Error ? err.message : String(err)}`);
    setStatus("Error");
  }
};
Error:
SDP response: {
  "error": {
    "message": "Incorrect API key provided: ek_***********************. You can find your API key at https://platform.openai.com/account/api-keys.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
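Separately from the 401, the `dc.onmessage` handler in the code above checks for a `"transcription"` event type, which, as far as I know, the Realtime API does not emit for transcription sessions. The sketch below assumes the event names `conversation.item.input_audio_transcription.delta` (text in `delta`) and `conversation.item.input_audio_transcription.completed` (text in `transcript`) from the Realtime transcription docs; please verify them against the current API reference. `TranscriptAccumulator` is a hypothetical helper, not part of any SDK.

```typescript
// Loose shape of a Realtime server event; field names are assumptions
// based on the transcription docs and should be checked against the API reference.
interface RealtimeServerEvent {
  type: string;
  item_id?: string;
  delta?: string;
  transcript?: string;
  error?: { message?: string };
}

// Hypothetical helper: accumulates partial (delta) transcripts per
// conversation item and keeps completed transcripts in arrival order.
class TranscriptAccumulator {
  private partial = new Map<string, string>();
  private finished: string[] = [];

  /** Feed one raw data-channel payload; returns the current display text. */
  handle(raw: string): string {
    const event = JSON.parse(raw) as RealtimeServerEvent;
    const id = event.item_id ?? "unknown";
    switch (event.type) {
      case "conversation.item.input_audio_transcription.delta":
        // Append the incremental text to this item's running transcript.
        this.partial.set(id, (this.partial.get(id) ?? "") + (event.delta ?? ""));
        break;
      case "conversation.item.input_audio_transcription.completed":
        // Replace the partial text with the final transcript for this item.
        this.partial.delete(id);
        this.finished.push(event.transcript ?? "");
        break;
      case "error":
        // Surface server errors (like the server_error shown later in this thread).
        throw new Error(event.error?.message ?? "Unknown realtime error");
    }
    // Other event types are ignored; the display text is unchanged.
    return [...this.finished, ...Array.from(this.partial.values())].join(" ").trim();
  }
}
```

In the component, this could be wired up as `dc.onmessage = (event) => setTranscription(acc.handle(event.data));`, wrapped in a `try`/`catch` that calls `setError` so thrown server errors reach the UI.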
Confirming that there is an issue and the OP is correct. The code given in the documentation for a WebRTC connection using ephemeral keys does not work (it returns error 401 "Incorrect API key provided: ek_…").
@dkundel, thank you for the update that WebRTC can now be used for transcription-only sessions. However, as mentioned earlier, I am still getting this error:
SDP response: {
  "error": {
    "message": "Incorrect API key provided: ek_67e7a***********************26ea. You can find your API key at https://platform.openai.com/account/api-keys.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
I was able to establish the connection by following your code, but how can I enable semantic_vad?
I encountered an error when I tried to specify it while fetching the ephemeral token, as shown in the code below (the turn_detection part is commented out):
Following the documentation (https://platform.openai.com/docs/guides/realtime-vad), I also tried sending a session.update message via the data channel after establishing the connection, but that resulted in an error as well. Here's the code for sending the update:
Data channel message:
error:
  code: null
  event_id: null
  message: "The server had an error while processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the session ID sess_BHMMGpeQUkLJmaBDV7XGZ in your message.)"
  param: null
  type: "server_error"
When trying to set this via session.update, you'll need to send transcription_session.update instead. We also just rolled out a fix that allows semantic_vad to be specified in the token creation request.
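A minimal sketch of the two routes mentioned above, assuming the transcription session fields `input_audio_transcription.model` and `turn_detection.{type, eagerness}` as described in the Realtime VAD guide. The same config object is used both as the body of the token-creation request (server side, authenticated with the standard API key) and inside a `transcription_session.update` event sent over the data channel. The helper names (`semanticVadSession`, `tokenRequest`, `sessionUpdateEvent`) are hypothetical, and the field names may have changed since this thread.

```typescript
// Assumed eagerness values for semantic_vad, per the Realtime VAD guide.
type Eagerness = "low" | "medium" | "high" | "auto";

// Assumed subset of a transcription session config.
interface TranscriptionSessionConfig {
  input_audio_transcription: { model: string; language?: string };
  turn_detection: { type: "semantic_vad"; eagerness: Eagerness } | { type: "server_vad" };
}

// Hypothetical helper: a transcription session using semantic_vad.
function semanticVadSession(
  model = "gpt-4o-transcribe",
  eagerness: Eagerness = "auto",
): TranscriptionSessionConfig {
  return {
    input_audio_transcription: { model },
    turn_detection: { type: "semantic_vad", eagerness },
  };
}

// Hypothetical helper (server side): fetch arguments for minting the ephemeral
// token. The standard API key is used here and must never reach the browser.
function tokenRequest(
  apiKey: string,
  session: TranscriptionSessionConfig,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: "https://api.openai.com/v1/realtime/transcription_sessions",
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(session),
    },
  };
}

// Hypothetical helper (client side): the update event to send over the
// "oai-events" data channel; note transcription_session.update, not session.update.
function sessionUpdateEvent(session: Partial<TranscriptionSessionConfig>): string {
  return JSON.stringify({ type: "transcription_session.update", session });
}
```

On the server, `const { url, init } = tokenRequest(apiKey, semanticVadSession());` can be passed straight to `fetch(url, init)`. In the browser, send the update only after the channel opens, for example `dc.addEventListener("open", () => dc.send(sessionUpdateEvent({ turn_detection: { type: "semantic_vad", eagerness: "auto" } })));`.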
Thanks, I was able to connect using transcription_session.update with semantic_vad specified.
However, it does not seem to judge the end of a turn semantically; it behaves the same as server_vad.
I described the same issue in another topic as well.