Realtime Voice Double Voice

We are using the Realtime voice API. If we use the code provided by OpenAI, it works fine. But if we use a different front end where a Start button initiates the API, we start to get two voice streams with similar (but not identical) content. Has anyone else experienced this?

Check the websockets and the requests sent in the Network tab of the Chrome devtools. You are probably starting two sessions. :hugs:
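If it is hard to spot in devtools, one development-only trick (my own suggestion, not something from your code) is to wrap the global WebSocket constructor so every new connection logs the stack trace that created it:

// Development-only sketch: log every WebSocket that gets opened, with a
// stack trace, to see which code path creates each realtime session.
const NativeWebSocket = window.WebSocket;
window.WebSocket = new Proxy(NativeWebSocket, {
    construct(target, args) {
        console.log('WebSocket opened:', args[0]);
        console.trace('opened from'); // the stack trace points at the caller
        return Reflect.construct(target, args);
    },
});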


Thanks. You are right. Fixed it.

Awesome! Feel free to mark my answer as the solution to help future users. :partying_face:

While your diagnosis is correct, the fix is not working: the websocket is getting triggered three times. We have tried various approaches, including the following, but the code is still not working:

  1. Call connectConversation inside a useEffect hook that runs once when the component mounts (a minimal sketch of this pattern follows the list).
    • Define connectConversation inside the useEffect hook: this ensures it has access to the latest props without causing unnecessary re-renders or re-executions.
    • Use an empty dependency array [] in useEffect: this ensures the effect runs only once when the component mounts.
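For reference, here is a minimal sketch of that connect-once pattern. Only RealtimeClient and its connect/disconnect/isConnected methods are taken from the code below; the hook name and everything else are illustrative assumptions. The symmetric cleanup is what keeps React 18 StrictMode, which runs mount → cleanup → mount in development, from leaving two live sessions behind:

import { useEffect } from 'react';
import { RealtimeClient } from '@openai/realtime-api-beta';

// Hypothetical hook: opens exactly one realtime session per mount.
function useRealtimeSession(client: RealtimeClient) {
    useEffect(() => {
        let cancelled = false;

        (async () => {
            if (client.isConnected()) return; // guard against a second connect
            await client.connect();
            if (cancelled) client.disconnect(); // component unmounted mid-connect
        })().catch(console.error);

        return () => {
            cancelled = true;
            if (client.isConnected()) client.disconnect();
        };
    }, [client]); // client lives in a ref, so this runs once per mount
}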

const LOCAL_RELAY_SERVER_URL: string = '';
// process.env.REACT_APP_LOCAL_RELAY_SERVER_URL || '';
import mic from '../../../../public/assets/calltype/mic.svg';
import linedotwave from '../../../../public/assets/calltype/linedotwave.svg';

import { useEffect, useRef, useCallback, useState } from 'react';
import {
    Container,
    Stack,
    Group,
    Box,
    Button,
    SegmentedControl,
    Card,
    Text, Paper, Center, Badge
} from '@mantine/core';
import Image from 'next/image';
import { X, Zap } from 'react-feather';
import { IconMicrophone, IconVolume2, IconMicrophoneOff } from '@tabler/icons-react';
import earIcon from '../../../../public/assets/calltype/earIcon.svg';
import zigzagwave from '../../../../public/assets/calltype/zigzagwave.svg';

import { RealtimeClient } from '@openai/realtime-api-beta';
import { ItemType } from '@openai/realtime-api-beta/dist/lib/client.js';
import { WavRecorder, WavStreamPlayer } from '../lib/wavtools/index.js';
import { instructions } from '../utils/conversation_config.js';
import { WavRenderer } from '../utils/wav_renderer';
import ListeningPanel from './LandingPage.js';

interface RealtimeEvent {
    time: string;
    source: 'client' | 'server';
    count?: number;
    event: { [key: string]: any };
}

function ConsolePage(props: any) {
    console.log(props.voiceName);
    const fircallref: any = useRef(null);
    const initref: any = useRef(null);
    const apiKey: any = 'API_KEY_HERE';
    if (apiKey !== '') {
        localStorage.setItem('tmp::voice_api_key', apiKey);
    }
    const wavRecorderRef = useRef(
        new WavRecorder({ sampleRate: 24000 })
    );
    const wavStreamPlayerRef = useRef(
        new WavStreamPlayer({ sampleRate: 24000 })
    );
    const clientRef = useRef(
        new RealtimeClient(
            LOCAL_RELAY_SERVER_URL
                ? { url: LOCAL_RELAY_SERVER_URL }
                : {
                    apiKey: apiKey,
                    dangerouslyAllowAPIKeyInBrowser: true,
                }
        )
    );
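    // NOTE: useRef keeps a single recorder/player/client instance across
    // re-renders, but a fresh RealtimeClient is still constructed on every
    // mount, and React 18 StrictMode mounts twice in development, so any
    // connect call must be guarded or cleaned up symmetrically.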

    const clientCanvasRef = useRef<HTMLCanvasElement | null>(null);
    const serverCanvasRef = useRef<HTMLCanvasElement | null>(null);
    const eventsScrollHeightRef = useRef(0);
    const eventsScrollRef = useRef(null);
    const startTimeRef = useRef(new Date().toISOString());
    const [items, setItems] = useState<ItemType[]>([]);
    const [realtimeEvents, setRealtimeEvents] = useState<RealtimeEvent[]>([]);
    const [expandedEvents, setExpandedEvents] = useState<{
        [key: string]: boolean;
    }>({});
    const [isConnected, setIsConnected] = useState(false);
    const [canPushToTalk, setCanPushToTalk] = useState(true);
    const [isRecording, setIsRecording] = useState(false);
    const [isAssistantSpeaking, setIsAssistantSpeaking] = useState(false);
    const [memoryKv, setMemoryKv] = useState<{ [key: string]: any }>({});
    const [turnEndType, setTurnEndType] = useState('none');

    const formatTime = useCallback((timestamp: string) => {
        const startTime = startTimeRef.current;
        const t0 = new Date(startTime).valueOf();
        const t1 = new Date(timestamp).valueOf();
        const delta = t1 - t0;
        const hs = Math.floor(delta / 10) % 100;
        const s = Math.floor(delta / 1000) % 60;
        const m = Math.floor(delta / 60_000) % 60;
        const pad = (n: number) => {
            let s = n + '';
            while (s.length < 2) {
                s = '0' + s;
            }
            return s;
        };
        return `${pad(m)}:${pad(s)}.${pad(hs)}`;
    }, []);

    const resetAPIKey = useCallback(() => {
        const apiKey = prompt('OpenAI API Key');
        if (apiKey !== null) {
            localStorage.clear();
            localStorage.setItem('tmp::voice_api_key', apiKey);
            window.location.reload();
        }
    }, []);

    // const connectConversation = useCallback(async () => {
    //     const client = clientRef.current;
    //     const wavRecorder = wavRecorderRef.current;
    //     const wavStreamPlayer = wavStreamPlayerRef.current;

    //     const initialPrompt = 'Hello!';

    //     startTimeRef.current = new Date().toISOString();
    //     setIsConnected(true);
    //     setRealtimeEvents([]);
    //     setItems(client.conversation.getItems());

    //     await wavRecorder.begin();
    //     await wavStreamPlayer.connect();
    //     await client.connect();
    //     if (props.type !== 'inbound') {
    //         client.sendUserMessageContent([
    //             {
    //                 type: 'input_text',
    //                 text: initialPrompt,
    //             },
    //         ]);
    //     }

    //     if (client.getTurnDetectionType() === 'server_vad' && client.isConnected()) {
    //         console.log(client);
    //         await wavRecorder.record((data) => client.appendInputAudio(data?.mono));
    //     }
    // }, []);
    const connectConversation = useCallback(async () => {
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        const wavStreamPlayer = wavStreamPlayerRef.current;

        // Guard: if a session is already live (e.g. the mount effect and the
        // Start button both fire), a second connect produces a second,
        // overlapping voice stream.
        if (client.isConnected()) {
            return;
        }

        // Set state variables
        startTimeRef.current = new Date().toISOString();
        setIsConnected(true);
        setRealtimeEvents([]);
        setItems(client.conversation.getItems());

        // Connect to microphone
        await wavRecorder.begin();

        // Connect to audio output
        await wavStreamPlayer.connect();

        // Connect to realtime API
        await client.connect();
        if (props?.type === 'outbound') {
            client.sendUserMessageContent([
                {
                    type: 'input_text',
                    text: props.firstOutBoundText,
                    // text: `For testing purposes, I want you to list ten car brands. Number each item, e.g. "one (or whatever number you are on): the item name".`,
                },
            ]);
        }

        if (client.getTurnDetectionType() === 'server_vad') {
            await wavRecorder.record((data) => client.appendInputAudio(data.mono));
        }
    }, [props?.type, props?.firstOutBoundText]);

    const disconnectConversation = useCallback(async () => {
        props.setStartStatus('generatingresult');
        const itemsdata = clientRef.current.conversation.getItems();
        console.log(itemsdata);
        itemsdata?.forEach((ex: any) => {
            delete ex?.formatted;
        });
        props.stopchat(itemsdata);
        setIsConnected(false);
        setRealtimeEvents([]);
        setItems([]);
        setMemoryKv({});

        const client = clientRef.current;
        client.disconnect();

        const wavRecorder = wavRecorderRef.current;
        await wavRecorder.end();

        const wavStreamPlayer = wavStreamPlayerRef.current;
        await wavStreamPlayer.interrupt();
    }, [props]);

    const deleteConversationItem = useCallback(async (id: string) => {
        const client = clientRef.current;
        client.deleteItem(id);
    }, []);

    const startRecording = async () => {
        console.log('startRecording called');
        setIsRecording(true);
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        const wavStreamPlayer = wavStreamPlayerRef.current;
        const trackSampleOffset = await wavStreamPlayer.interrupt();
        if (trackSampleOffset?.trackId) {
            const { trackId, offset } = trackSampleOffset;
            await client.cancelResponse(trackId, offset);
        }
        await wavRecorder.record((data) => client.appendInputAudio(data.mono));
    };

    const stopRecording = async () => {
        console.log('stopRecording called');
        setIsRecording(false);
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        await wavRecorder.pause();
        client.createResponse();
    };

    const changeTurnEndType = async (value: string) => {
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        if (value === 'none' && wavRecorder.getStatus() === 'recording') {
            await wavRecorder.pause();
        }
        client.updateSession({
            turn_detection: value === 'none' ? null : { type: 'server_vad' },
        });
        if (value === 'server_vad' && client.isConnected()) {
            await wavRecorder.record((data) => client.appendInputAudio(data.mono));
        }
        setCanPushToTalk(value === 'none');
        // connectConversation()
    };
    useEffect(() => {
        // This effect re-runs on every turnEndType change; the ref is the only
        // thing preventing a repeated connect here, and the ref does not
        // survive a remount, so each mount can still open its own session.
        if (!fircallref.current) {
            // setTurnEndType('server_vad')
            // changeTurnEndType('server_vad')
            connectConversation();
            fircallref.current = true;
        }
    }, [turnEndType]);
// useEffect(()=>{
// if(fircallref.current===true){
// setTimeout(()=>{
// connectConversation()
// },5000)
// }
// },[fircallref.current])

// useEffect(() => {
// if (eventsScrollRef.current) {
// const eventsEl = eventsScrollRef.current;
// const scrollHeight = eventsEl.scrollHeight;
// if (scrollHeight !== eventsScrollHeightRef.current) {
// eventsEl.scrollTop = scrollHeight;
// eventsScrollHeightRef.current = scrollHeight;
// }
// }
// }, [realtimeEvents]);

useEffect(() => {
let isLoaded = true;

   const wavRecorder = wavRecorderRef.current;
   const clientCanvas = clientCanvasRef.current;
   // console.log(clientCanvasRef);
   let clientCtx: CanvasRenderingContext2D | null = null;

   const wavStreamPlayer = wavStreamPlayerRef.current;
   const serverCanvas = serverCanvasRef.current;
   let serverCtx: CanvasRenderingContext2D | null = null;

   const render = () => {
       if (isLoaded) {
           if (clientCanvas) {
               if (!clientCanvas.width || !clientCanvas.height) {
                   clientCanvas.width = clientCanvas.offsetWidth;
                   clientCanvas.height = clientCanvas.offsetHeight;
               }
               clientCtx = clientCtx || clientCanvas.getContext('2d');
               if (clientCtx) {
                   clientCtx.clearRect(0, 0, clientCanvas.width, clientCanvas.height);
                   const result = wavRecorder.recording
                       ? wavRecorder.getFrequencies('voice')
                       : { values: new Float32Array([0]) };
                   WavRenderer.drawBars(
                       clientCanvas,
                       clientCtx,
                       result.values,
                       '#0099ff',
                       10,
                       0,
                       8
                   );
               }
           }
           if (serverCanvas) {
               if (!serverCanvas.width || !serverCanvas.height) {
                   serverCanvas.width = serverCanvas.offsetWidth;
                   serverCanvas.height = serverCanvas.offsetHeight;
               }
               serverCtx = serverCtx || serverCanvas.getContext('2d');
               if (serverCtx) {
                   serverCtx.clearRect(0, 0, serverCanvas.width, serverCanvas.height);
                   const result = wavStreamPlayer.analyser
                       ? wavStreamPlayer.getFrequencies('voice')
                       : { values: new Float32Array([0]) };
                   WavRenderer.drawBars(
                       serverCanvas,
                       serverCtx,
                       result.values,
                       '#009900',
                       10,
                       0,
                       8
                   );
               }
           }
           window.requestAnimationFrame(render);
       }
   };
   render();

   return () => {
       isLoaded = false;
   };

    }, []);

// Add a useEffect to monitor if the assistant is speaking
    useEffect(() => {
        const interval = setInterval(() => {
            const wavStreamPlayer = wavStreamPlayerRef.current;
            if (wavStreamPlayer && wavStreamPlayer.analyser) {
                const result: any = wavStreamPlayer.getFrequencies('voice');
                const maxAmplitude = Math.max(...result.values);
                const threshold = 0.01; // Adjust threshold as needed
                setIsAssistantSpeaking(maxAmplitude > threshold);
            } else {
                setIsAssistantSpeaking(false);
            }
        }, 100);

        return () => {
            clearInterval(interval);
        };
    }, []);
    const endapicall = () => {
        console.log('apicall end chat');
    };
    useEffect(() => {
        if (!initref.current) {
            const wavStreamPlayer = wavStreamPlayerRef.current;
            const client: any = clientRef.current;
            const customtool = [
                {
                    type: 'function',
                    name: 'endapicall',
                    description: 'Call this whenever the conversation has ended or the user or bot has said something like "bye".',
                },
            ];
            client.updateSession({ instructions: props.promptdata });
            client.updateSession({
                voice: props.voiceName,
                tools: customtool,
                input_audio_transcription: { model: 'whisper-1' },
                tool_choice: 'auto',
                temperature: 0.8,
            });

       client.on('realtime.event', (realtimeEvent: RealtimeEvent) => {
           setRealtimeEvents((realtimeEvents) => {
               const lastEvent = realtimeEvents[realtimeEvents.length - 1];
               if (lastEvent?.event.type === realtimeEvent.event.type) {
                   lastEvent.count = (lastEvent.count || 0) + 1;
                   return realtimeEvents.slice(0, -1).concat(lastEvent);
               } else {
                   return realtimeEvents.concat(realtimeEvent);
               }
           });
       });
       client.on('disconnect', async () => {
           console.log('Client disconnected');
       });
       client.on('error', (event: any) => console.log(event));
       client.on('conversation.interrupted', async () => {
           const trackSampleOffset = await wavStreamPlayer.interrupt();
           if (trackSampleOffset?.trackId) {
               const { trackId, offset } = trackSampleOffset;
               await client.cancelResponse(trackId, offset);
           }
       });
       client.on('conversation.updated', async ({ item, delta }: any) => {
           console.log('conversation.updated', item);
           const items = client.conversation.getItems();
           if (delta?.audio) {
               wavStreamPlayer.add16BitPCM(delta.audio, item.id);
           }
           if (item.status === 'completed' && item.formatted.audio?.length) {
               const wavFile = await WavRecorder.decode(
                   item.formatted.audio,
                   24000,
                   24000
               );
               item.formatted.file = wavFile;
           }
           setItems(items);
       });

       setItems(client.conversation.getItems());
        initref.current = true;
       return () => {
           client.reset();
       };
   }

    }, []);

    return (
        <>
            <Paper
                p={10}
                shadow="md"
                radius="md"
                style={{
                    height: "100%",
                    width: "100%",
                    // padding: '2rem',
                    background: "linear-gradient(18deg, #0175F7 6.39%, #16BDFA 64.4%)",
                    color: "white",
                }}
            >
           <Stack>
               <Text></Text>
               {isConnected && !canPushToTalk && (
                   <>
                       <Box h={40} mb={87}>
                           {isAssistantSpeaking && isConnected ?
                               <Center mb={15}>
                                   <Group gap={10}>
                                        <Image src={earIcon} alt="" style={{ height: "40px" }} />
                                       <Text c="#000" fz={14} fw={700}>The customer is listening...</Text>
                                   </Group>
                               </Center>
                               :
                               <Box h={40}>
                                    {isConnected ?
                                       <Center mb={15}>
                                           <Group gap={10}>
                                                <Image src={earIcon} alt="" />
                                               <Text c="#000" fz={14} fw={700}>The customer is speaking...</Text>
                                           </Group>
                                       </Center>
                                       : ""}
                               </Box>
                           }

                       </Box>
                       <Center h={14} mb={30}>
                            {isAssistantSpeaking ?
                                <Box style={{ opacity: isRecording ? 1 : 0.2 }}>
                                    <IconMicrophone color="#073041" size={62} />
                                </Box> :
                                <Box>
                                    <IconMicrophone color="#073041" size={62} />
                                </Box>
                            }

                           {/* <Badge
                               // variant="outline"
                               bg="#040E13"
                               c={!isRecording ? '#9C9C9C' : '#9C9C9C'}
                               size="lg"
                               fz={12}
                               leftSection=
                               {!isRecording ? <IconMicrophoneOff color="#FF6A56" size={16} /> : <IconMicrophone color="#98D283" size={16} />}
                           >
                               <Text fz={12} fw={400}>{!isRecording ? 'Mic Muted' : 'Mic On'}</Text>
                           </Badge> */}
                       </Center>
                       <Box h={52} pl={22} pr={22} mb={109}>
                            {isAssistantSpeaking ?
                                <Group mb={11} justify="center">
                                    <Image src={zigzagwave} alt="" />
                                </Group>
                                :
                                <Group mb={11} justify="center">
                                    <Image src={linedotwave} alt="" />
                                </Group>
                            }
                       </Box>


                     
                   </>
               )}
               {isConnected && canPushToTalk && (
                   <>
                       <Box h={40}>
                           {isAssistantSpeaking && isConnected ?
                               <Center mb={15}>
                                   <Group gap={10}>
                                        <Image src={earIcon} alt="" style={{ height: "40px" }} />
                                       <Text c="#000" fz={14} fw={700}>The customer is listening...</Text>
                                   </Group>
                               </Center>
                               :
                               <Box h={40}>
                                   {isConnected && isRecording ?
                                       <Center mb={15}>
                                           <Group gap={10}>
                                                <Image src={earIcon} alt="" />
                                               <Text c="#000" fz={14} fw={700}>The customer is speaking...</Text>
                                           </Group>
                                       </Center>
                                       : ""}
                               </Box>
                           }

                       </Box>
                       <Center h={14} mb={15}>
                           <Badge
                               // variant="outline"
                               bg="#040E13"
                                c="#9C9C9C"
                               size="lg"
                               fz={12}
                               leftSection=
                               {!isRecording ? <IconMicrophoneOff color="#FF6A56" size={16} /> : <IconMicrophone color="#98D283" size={16} />}
                           >
                                <Text fz={12} fw={400}>{!isRecording ? 'Mic Muted' : 'Mic On'}</Text>
                           </Badge>
                       </Center>
                       <Box h={52} pl={22} pr={22}>
                            {isAssistantSpeaking ?
                                <Group mb={11} justify="center">
                                    <Image src={linedotwave} alt="" />
                                </Group>
                                : ""
                            }
                       </Box>


                       <Center mb={109}>
                           <Button
                               p={0}
                               size="xl"
                               radius="100%"
                               style={{
                                   height: 144,
                                   width: 144,
                                   backgroundColor: "#073041",
                                    opacity: isRecording ? 0.1 : 1,
                                   // backgroundColor: micMuted ? '#004c6d' : '#0078a6',
                               }}
                                color={isRecording ? "#073041" : undefined}
                               disabled={!isConnected || !canPushToTalk}
                               onPointerDown={startRecording}
                               onPointerUp={stopRecording}
                           >
                               <Box>
                                   <Image src={mic} />
                                   <Text>
                                        {isRecording ? 'release to send' : 'push to talk'}
                                       {/* Hold-to-Talk */}
                                   </Text>

                               </Box>
                           </Button>

                       </Center>
                   </>
               )}
                <Group mt={isConnected ? 0 : "50%"}>
                   <Box h={60} bg="#040E13" p={10}
                       style={{ borderRadius: "30px" }}
                   >
                        <Group p={0} m={0} w="100%" justify="end" className="SegmentedControl_group">
                           <SegmentedControl
                               h="100%"
                               value={turnEndType}
                               onChange={(value) => {
                                   setTurnEndType(value);
                                   changeTurnEndType(value);
                               }}
                                data={[
                                    { label: 'Noisy Mode', value: 'none' },
                                    { label: 'Quiet Mode', value: 'server_vad' },
                                ]}
                               color="#19BEF8"
                               radius={0}
                                style={{ borderRadius: "30px" }}
                               size="md"
                               p={0}
                               // styles={(theme) => ({
                               //     root: {
                               //       border: '1px solid #ccc',
                               //     },
                               //     control: {
                               //       '&:hover': {
                               //         backgroundColor: 'red',
                               //       },
                               //     },
                               //     label: {
                               //       color: '#333',
                               //     },
                               //     active: {
                               //       backgroundColor: 'red',
                               //     },
                               //     labelActive: {
                               //       color: '#000000',
                               //     },
                               //     disabled: {
                               //       opacity: 0.6,
                               //       cursor: 'not-allowed',
                               //     },
                               //   })}
                           />
                           <Button
                               onClick={isConnected ? disconnectConversation : connectConversation}
                                fw={400} fz={12} radius={40} variant="filled" c="#FFFFFF" bg={isConnected ? "#E61B00" : "blue"} h={40}
                           >
                               {isConnected ? 'End' : 'Start'}
                           </Button>

                       </Group>

                   </Box>
               </Group>
           </Stack>
            </Paper>
        </>
    );
}

export default ConsolePage;

Hey!
I can't really spoonfeed you a solution, but make sure to try the following:

  1. Add logs everywhere you think the AI is being called.
  2. Check if there are logs twice or multiple times where there should only be one.

This will confirm where exactly you are calling code multiple times and will hopefully resolve the issue. :hugs:
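A tiny wrapper like this can make the duplicate calls easy to spot in the console (a sketch; countCalls is a name I made up, not part of any library):

// Debugging sketch: count how many times each suspect function runs.
const callCounts: Record<string, number> = {};

function countCalls<T extends (...args: any[]) => any>(name: string, fn: T): T {
    return ((...args: any[]) => {
        callCounts[name] = (callCounts[name] || 0) + 1;
        console.log(`${name} call #${callCounts[name]}`);
        console.trace(name); // the stack trace shows what triggered the call
        return fn(...args);
    }) as T;
}

// e.g. wrap the function you suspect:
// const connect = countCalls('connectConversation', connectConversation);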