Realtime Voice Double Voice

We are using the Realtime voice API. If we use the code provided by OpenAI, it works fine. But if we use a different front end where a Start button initiates the API, we start to get two voice streams with similar (but not identical) content. Has anyone else experienced this?

Check the websockets and the requests sent in the Network tab of the Chrome devtools. You are probably starting two sessions. :hugs:
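If it is hard to spot in devtools, one development-only trick (my own suggestion, not something from your code) is to wrap the global WebSocket constructor so every new connection logs the stack trace that created it:

// Development-only sketch: log every WebSocket that gets opened, with a
// stack trace, to see which code path creates each realtime session.
const NativeWebSocket = window.WebSocket;
window.WebSocket = new Proxy(NativeWebSocket, {
    construct(target, args) {
        console.log('WebSocket opened:', args[0]);
        console.trace('opened from'); // the stack trace points at the caller
        return Reflect.construct(target, args);
    },
});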


Thanks. You are right. Fixed it.

Awesome! Feel free to mark my answer as the solution to help future users. :partying_face:

While your diagnosis is correct, the fix is not working: the websocket is getting triggered three times. We have tried various approaches, including the following, but the code is still not working:

  1. Call connectConversation inside a useEffect hook that runs once when the component mounts (a minimal sketch of this pattern follows the list).
    • Define connectConversation inside the useEffect hook: this ensures it has access to the latest props without causing unnecessary re-renders or re-executions.
    • Use an empty dependency array [] in useEffect: this ensures the effect runs only once when the component mounts.
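For reference, here is a minimal sketch of that connect-once pattern. Only RealtimeClient and its connect/disconnect/isConnected methods are taken from the code below; the hook name and everything else are illustrative assumptions. The symmetric cleanup is what keeps React 18 StrictMode, which runs mount → cleanup → mount in development, from leaving two live sessions behind:

import { useEffect } from 'react';
import { RealtimeClient } from '@openai/realtime-api-beta';

// Hypothetical hook: opens exactly one realtime session per mount.
function useRealtimeSession(client: RealtimeClient) {
    useEffect(() => {
        let cancelled = false;

        (async () => {
            if (client.isConnected()) return; // guard against a second connect
            await client.connect();
            if (cancelled) client.disconnect(); // component unmounted mid-connect
        })().catch(console.error);

        return () => {
            cancelled = true;
            if (client.isConnected()) client.disconnect();
        };
    }, [client]); // client lives in a ref, so this runs once per mount
}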

const LOCAL_RELAY_SERVER_URL: string = '';
// process.env.REACT_APP_LOCAL_RELAY_SERVER_URL || '';
import mic from '../../../../public/assets/calltype/mic.svg';
import linedotwave from '../../../../public/assets/calltype/linedotwave.svg';

import { useEffect, useRef, useCallback, useState } from 'react';
import {
    Container,
    Stack,
    Group,
    Box,
    Button,
    SegmentedControl,
    Card,
    Text, Paper, Center, Badge
} from '@mantine/core';
import Image from 'next/image';
import { X, Zap } from 'react-feather';
import { IconMicrophone, IconVolume2, IconMicrophoneOff } from '@tabler/icons-react';
import earIcon from '../../../../public/assets/calltype/earIcon.svg';
import zigzagwave from '../../../../public/assets/calltype/zigzagwave.svg';

import { RealtimeClient } from '@openai/realtime-api-beta';
import { ItemType } from '@openai/realtime-api-beta/dist/lib/client.js';
import { WavRecorder, WavStreamPlayer } from '../lib/wavtools/index.js';
import { instructions } from '../utils/conversation_config.js';
import { WavRenderer } from '../utils/wav_renderer';
import ListeningPanel from './LandingPage.js';

interface RealtimeEvent {
    time: string;
    source: 'client' | 'server';
    count?: number;
    event: { [key: string]: any };
}

function ConsolePage(props: any) {
    console.log(props.voiceName);
    const fircallref: any = useRef(null);
    const initref: any = useRef(null);
    const apiKey: any = 'API_KEY_HERE';
    if (apiKey !== '') {
        localStorage.setItem('tmp::voice_api_key', apiKey);
    }
    const wavRecorderRef = useRef(
        new WavRecorder({ sampleRate: 24000 })
    );
    const wavStreamPlayerRef = useRef(
        new WavStreamPlayer({ sampleRate: 24000 })
    );
    const clientRef = useRef(
        new RealtimeClient(
            LOCAL_RELAY_SERVER_URL
                ? { url: LOCAL_RELAY_SERVER_URL }
                : {
                    apiKey: apiKey,
                    dangerouslyAllowAPIKeyInBrowser: true,
                }
        )
    );
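    // NOTE: useRef keeps a single recorder/player/client instance across
    // re-renders, but a fresh RealtimeClient is still constructed on every
    // mount, and React 18 StrictMode mounts twice in development, so any
    // connect call must be guarded or cleaned up symmetrically.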

    const clientCanvasRef = useRef<HTMLCanvasElement | null>(null);
    const serverCanvasRef = useRef<HTMLCanvasElement | null>(null);
    const eventsScrollHeightRef = useRef(0);
    const eventsScrollRef = useRef(null);
    const startTimeRef = useRef(new Date().toISOString());
    const [items, setItems] = useState<ItemType[]>([]);
    const [realtimeEvents, setRealtimeEvents] = useState<RealtimeEvent[]>([]);
    const [expandedEvents, setExpandedEvents] = useState<{
        [key: string]: boolean;
    }>({});
    const [isConnected, setIsConnected] = useState(false);
    const [canPushToTalk, setCanPushToTalk] = useState(true);
    const [isRecording, setIsRecording] = useState(false);
    const [isAssistantSpeaking, setIsAssistantSpeaking] = useState(false);
    const [memoryKv, setMemoryKv] = useState<{ [key: string]: any }>({});
    const [turnEndType, setTurnEndType] = useState('none');

    const formatTime = useCallback((timestamp: string) => {
        const startTime = startTimeRef.current;
        const t0 = new Date(startTime).valueOf();
        const t1 = new Date(timestamp).valueOf();
        const delta = t1 - t0;
        const hs = Math.floor(delta / 10) % 100;
        const s = Math.floor(delta / 1000) % 60;
        const m = Math.floor(delta / 60_000) % 60;
        const pad = (n: number) => {
            let s = n + '';
            while (s.length < 2) {
                s = '0' + s;
            }
            return s;
        };
        return `${pad(m)}:${pad(s)}.${pad(hs)}`;
    }, []);

    const resetAPIKey = useCallback(() => {
        const apiKey = prompt('OpenAI API Key');
        if (apiKey !== null) {
            localStorage.clear();
            localStorage.setItem('tmp::voice_api_key', apiKey);
            window.location.reload();
        }
    }, []);

    // const connectConversation = useCallback(async () => {
    //     const client = clientRef.current;
    //     const wavRecorder = wavRecorderRef.current;
    //     const wavStreamPlayer = wavStreamPlayerRef.current;

    //     const initialPrompt = 'Hello!';

    //     startTimeRef.current = new Date().toISOString();
    //     setIsConnected(true);
    //     setRealtimeEvents([]);
    //     setItems(client.conversation.getItems());

    //     await wavRecorder.begin();
    //     await wavStreamPlayer.connect();
    //     await client.connect();
    //     if (props.type !== 'inbound') {
    //         client.sendUserMessageContent([
    //             {
    //                 type: 'input_text',
    //                 text: initialPrompt,
    //             },
    //         ]);
    //     }

    //     if (client.getTurnDetectionType() === 'server_vad' && client.isConnected()) {
    //         console.log(client);
    //         await wavRecorder.record((data) => client.appendInputAudio(data?.mono));
    //     }
    // }, []);
    const connectConversation = useCallback(async () => {
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        const wavStreamPlayer = wavStreamPlayerRef.current;

        // Guard: if a session is already live (e.g. the mount effect and the
        // Start button both fire), a second connect produces a second,
        // overlapping voice stream.
        if (client.isConnected()) {
            return;
        }

        // Set state variables
        startTimeRef.current = new Date().toISOString();
        setIsConnected(true);
        setRealtimeEvents([]);
        setItems(client.conversation.getItems());

        // Connect to microphone
        await wavRecorder.begin();

        // Connect to audio output
        await wavStreamPlayer.connect();

        // Connect to realtime API
        await client.connect();
        if (props?.type === 'outbound') {
            client.sendUserMessageContent([
                {
                    type: 'input_text',
                    text: props.firstOutBoundText,
                    // text: `For testing purposes, I want you to list ten car brands. Number each item, e.g. "one (or whatever number you are on): the item name".`,
                },
            ]);
        }

        if (client.getTurnDetectionType() === 'server_vad') {
            await wavRecorder.record((data) => client.appendInputAudio(data.mono));
        }
    }, [props?.type, props?.firstOutBoundText]);

    const disconnectConversation = useCallback(async () => {
        props.setStartStatus('generatingresult');
        const itemsdata = clientRef.current.conversation.getItems();
        console.log(itemsdata);
        itemsdata?.forEach((ex: any) => {
            delete ex?.formatted;
        });
        props.stopchat(itemsdata);
        setIsConnected(false);
        setRealtimeEvents([]);
        setItems([]);
        setMemoryKv({});

        const client = clientRef.current;
        client.disconnect();

        const wavRecorder = wavRecorderRef.current;
        await wavRecorder.end();

        const wavStreamPlayer = wavStreamPlayerRef.current;
        await wavStreamPlayer.interrupt();
    }, [props]);

    const deleteConversationItem = useCallback(async (id: string) => {
        const client = clientRef.current;
        client.deleteItem(id);
    }, []);

    const startRecording = async () => {
        console.log('startRecording called');
        setIsRecording(true);
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        const wavStreamPlayer = wavStreamPlayerRef.current;
        const trackSampleOffset = await wavStreamPlayer.interrupt();
        if (trackSampleOffset?.trackId) {
            const { trackId, offset } = trackSampleOffset;
            await client.cancelResponse(trackId, offset);
        }
        await wavRecorder.record((data) => client.appendInputAudio(data.mono));
    };

    const stopRecording = async () => {
        console.log('stopRecording called');
        setIsRecording(false);
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        await wavRecorder.pause();
        client.createResponse();
    };

    const changeTurnEndType = async (value: string) => {
        const client = clientRef.current;
        const wavRecorder = wavRecorderRef.current;
        if (value === 'none' && wavRecorder.getStatus() === 'recording') {
            await wavRecorder.pause();
        }
        client.updateSession({
            turn_detection: value === 'none' ? null : { type: 'server_vad' },
        });
        if (value === 'server_vad' && client.isConnected()) {
            await wavRecorder.record((data) => client.appendInputAudio(data.mono));
        }
        setCanPushToTalk(value === 'none');
        // connectConversation()
    };
    useEffect(() => {
        // This effect re-runs on every turnEndType change; the ref is the only
        // thing preventing a repeated connect here, and the ref does not
        // survive a remount, so each mount can still open its own session.
        if (!fircallref.current) {
            // setTurnEndType('server_vad')
            // changeTurnEndType('server_vad')
            connectConversation();
            fircallref.current = true;
        }
    }, [turnEndType]);
// useEffect(()=>{
// if(fircallref.current===true){
// setTimeout(()=>{
// connectConversation()
// },5000)
// }
// },[fircallref.current])

// useEffect(() => {
// if (eventsScrollRef.current) {
// const eventsEl = eventsScrollRef.current;
// const scrollHeight = eventsEl.scrollHeight;
// if (scrollHeight !== eventsScrollHeightRef.current) {
// eventsEl.scrollTop = scrollHeight;
// eventsScrollHeightRef.current = scrollHeight;
// }
// }
// }, [realtimeEvents]);

useEffect(() => {
let isLoaded = true;

   const wavRecorder = wavRecorderRef.current;
   const clientCanvas = clientCanvasRef.current;
   // console.log(clientCanvasRef);
   let clientCtx: CanvasRenderingContext2D | null = null;

   const wavStreamPlayer = wavStreamPlayerRef.current;
   const serverCanvas = serverCanvasRef.current;
   let serverCtx: CanvasRenderingContext2D | null = null;

   const render = () => {
       if (isLoaded) {
           if (clientCanvas) {
               if (!clientCanvas.width || !clientCanvas.height) {
                   clientCanvas.width = clientCanvas.offsetWidth;
                   clientCanvas.height = clientCanvas.offsetHeight;
               }
               clientCtx = clientCtx || clientCanvas.getContext('2d');
               if (clientCtx) {
                   clientCtx.clearRect(0, 0, clientCanvas.width, clientCanvas.height);
                   const result = wavRecorder.recording
                       ? wavRecorder.getFrequencies('voice')
                       : { values: new Float32Array([0]) };
                   WavRenderer.drawBars(
                       clientCanvas,
                       clientCtx,
                       result.values,
                       '#0099ff',
                       10,
                       0,
                       8
                   );
               }
           }
           if (serverCanvas) {
               if (!serverCanvas.width || !serverCanvas.height) {
                   serverCanvas.width = serverCanvas.offsetWidth;
                   serverCanvas.height = serverCanvas.offsetHeight;
               }
               serverCtx = serverCtx || serverCanvas.getContext('2d');
               if (serverCtx) {
                   serverCtx.clearRect(0, 0, serverCanvas.width, serverCanvas.height);
                   const result = wavStreamPlayer.analyser
                       ? wavStreamPlayer.getFrequencies('voice')
                       : { values: new Float32Array([0]) };
                   WavRenderer.drawBars(
                       serverCanvas,
                       serverCtx,
                       result.values,
                       '#009900',
                       10,
                       0,
                       8
                   );
               }
           }
           window.requestAnimationFrame(render);
       }
   };
   render();

   return () => {
       isLoaded = false;
   };

    }, []);

// Add a useEffect to monitor if the assistant is speaking
    useEffect(() => {
        const interval = setInterval(() => {
            const wavStreamPlayer = wavStreamPlayerRef.current;
            if (wavStreamPlayer && wavStreamPlayer.analyser) {
                const result: any = wavStreamPlayer.getFrequencies('voice');
                const maxAmplitude = Math.max(...result.values);
                const threshold = 0.01; // Adjust threshold as needed
                setIsAssistantSpeaking(maxAmplitude > threshold);
            } else {
                setIsAssistantSpeaking(false);
            }
        }, 100);

        return () => {
            clearInterval(interval);
        };
    }, []);
    const endapicall = () => {
        console.log('apicall end chat');
    };
    useEffect(() => {
        if (!initref.current) {
            const wavStreamPlayer = wavStreamPlayerRef.current;
            const client: any = clientRef.current;
            const customtool = [
                {
                    type: 'function',
                    name: 'endapicall',
                    description: 'Call this whenever the conversation has ended or the user or bot has said something like "bye".',
                },
            ];
            client.updateSession({ instructions: props.promptdata });
            client.updateSession({
                voice: props.voiceName,
                tools: customtool,
                input_audio_transcription: { model: 'whisper-1' },
                tool_choice: 'auto',
                temperature: 0.8,
            });

       client.on('realtime.event', (realtimeEvent: RealtimeEvent) => {
           setRealtimeEvents((realtimeEvents) => {
               const lastEvent = realtimeEvents[realtimeEvents.length - 1];
               if (lastEvent?.event.type === realtimeEvent.event.type) {
                   lastEvent.count = (lastEvent.count || 0) + 1;
                   return realtimeEvents.slice(0, -1).concat(lastEvent);
               } else {
                   return realtimeEvents.concat(realtimeEvent);
               }
           });
       });
       client.on('disconnect', async () => {
           console.log('Client disconnected');
       });
       client.on('error', (event: any) => console.log(event));
       client.on('conversation.interrupted', async () => {
           const trackSampleOffset = await wavStreamPlayer.interrupt();
           if (trackSampleOffset?.trackId) {
               const { trackId, offset } = trackSampleOffset;
               await client.cancelResponse(trackId, offset);
           }
       });
       client.on('conversation.updated', async ({ item, delta }: any) => {
           console.log('conversation.updated', item);
           const items = client.conversation.getItems();
           if (delta?.audio) {
               wavStreamPlayer.add16BitPCM(delta.audio, item.id);
           }
           if (item.status === 'completed' && item.formatted.audio?.length) {
               const wavFile = await WavRecorder.decode(
                   item.formatted.audio,
                   24000,
                   24000
               );
               item.formatted.file = wavFile;
           }
           setItems(items);
       });

       setItems(client.conversation.getItems());
        initref.current = true;
       return () => {
           client.reset();
       };
   }

    }, []);

    return (
        <>
            <Paper
                p={10}
                shadow="md"
                radius="md"
                style={{
                    height: "100%",
                    width: "100%",
                    // padding: '2rem',
                    background: "linear-gradient(18deg, #0175F7 6.39%, #16BDFA 64.4%)",
                    color: "white",
                }}
            >
           <Stack>
               <Text></Text>
               {isConnected && !canPushToTalk && (
                   <>
                       <Box h={40} mb={87}>
                           {isAssistantSpeaking && isConnected ?
                               <Center mb={15}>
                                   <Group gap={10}>
                                        <Image src={earIcon} alt="" style={{ height: "40px" }} />
                                       <Text c="#000" fz={14} fw={700}>The customer is listening...</Text>
                                   </Group>
                               </Center>
                               :
                               <Box h={40}>
                                    {isConnected ?
                                       <Center mb={15}>
                                           <Group gap={10}>
                                                <Image src={earIcon} alt="" />
                                               <Text c="#000" fz={14} fw={700}>The customer is speaking...</Text>
                                           </Group>
                                       </Center>
                                       : ""}
                               </Box>
                           }

                       </Box>
                       <Center h={14} mb={30}>
                            {isAssistantSpeaking ?
                                <Box style={{ opacity: isRecording ? 1 : 0.2 }}>
                                    <IconMicrophone color="#073041" size={62} />
                                </Box> :
                                <Box>
                                    <IconMicrophone color="#073041" size={62} />
                                </Box>
                            }

                           {/* <Badge
                               // variant="outline"
                               bg="#040E13"
                               c={!isRecording ? '#9C9C9C' : '#9C9C9C'}
                               size="lg"
                               fz={12}
                               leftSection=
                               {!isRecording ? <IconMicrophoneOff color="#FF6A56" size={16} /> : <IconMicrophone color="#98D283" size={16} />}
                           >
                               <Text fz={12} fw={400}>{!isRecording ? 'Mic Muted' : 'Mic On'}</Text>
                           </Badge> */}
                       </Center>
                       <Box h={52} pl={22} pr={22} mb={109}>
                            {isAssistantSpeaking ?
                                <Group mb={11} justify="center">
                                    <Image src={zigzagwave} alt="" />
                                </Group>
                                :
                                <Group mb={11} justify="center">
                                    <Image src={linedotwave} alt="" />
                                </Group>
                            }
                       </Box>


                     
                   </>
               )}
               {isConnected && canPushToTalk && (
                   <>
                       <Box h={40}>
                           {isAssistantSpeaking && isConnected ?
                               <Center mb={15}>
                                   <Group gap={10}>
                                        <Image src={earIcon} alt="" style={{ height: "40px" }} />
                                       <Text c="#000" fz={14} fw={700}>The customer is listening...</Text>
                                   </Group>
                               </Center>
                               :
                               <Box h={40}>
                                   {isConnected && isRecording ?
                                       <Center mb={15}>
                                           <Group gap={10}>
                                                <Image src={earIcon} alt="" />
                                               <Text c="#000" fz={14} fw={700}>The customer is speaking...</Text>
                                           </Group>
                                       </Center>
                                       : ""}
                               </Box>
                           }

                       </Box>
                       <Center h={14} mb={15}>
                           <Badge
                               // variant="outline"
                               bg="#040E13"
                                c="#9C9C9C"
                               size="lg"
                               fz={12}
                               leftSection=
                               {!isRecording ? <IconMicrophoneOff color="#FF6A56" size={16} /> : <IconMicrophone color="#98D283" size={16} />}
                           >
                                <Text fz={12} fw={400}>{!isRecording ? 'Mic Muted' : 'Mic On'}</Text>
                           </Badge>
                       </Center>
                       <Box h={52} pl={22} pr={22}>
                            {isAssistantSpeaking ?
                                <Group mb={11} justify="center">
                                    <Image src={linedotwave} alt="" />
                                </Group>
                                : ""
                            }
                       </Box>


                       <Center mb={109}>
                           <Button
                               p={0}
                               size="xl"
                               radius="100%"
                               style={{
                                   height: 144,
                                   width: 144,
                                   backgroundColor: "#073041",
                                    opacity: isRecording ? 0.1 : 1,
                                   // backgroundColor: micMuted ? '#004c6d' : '#0078a6',
                               }}
                                color={isRecording ? "#073041" : undefined}
                               disabled={!isConnected || !canPushToTalk}
                               onPointerDown={startRecording}
                               onPointerUp={stopRecording}
                           >
                               <Box>
                                   <Image src={mic} />
                                   <Text>
                                        {isRecording ? 'release to send' : 'push to talk'}
                                       {/* Hold-to-Talk */}
                                   </Text>

                               </Box>
                           </Button>

                       </Center>
                   </>
               )}
                <Group mt={isConnected ? 0 : "50%"}>
                   <Box h={60} bg="#040E13" p={10}
                       style={{ borderRadius: "30px" }}
                   >
                        <Group p={0} m={0} w="100%" justify="end" className="SegmentedControl_group">
                           <SegmentedControl
                               h="100%"
                               value={turnEndType}
                               onChange={(value) => {
                                   setTurnEndType(value);
                                   changeTurnEndType(value);
                               }}
                                data={[
                                    { label: 'Noisy Mode', value: 'none' },
                                    { label: 'Quiet Mode', value: 'server_vad' },
                                ]}
                               color="#19BEF8"
                               radius={0}
                                style={{ borderRadius: "30px" }}
                               size="md"
                               p={0}
                               // styles={(theme) => ({
                               //     root: {
                               //       border: '1px solid #ccc',
                               //     },
                               //     control: {
                               //       '&:hover': {
                               //         backgroundColor: 'red',
                               //       },
                               //     },
                               //     label: {
                               //       color: '#333',
                               //     },
                               //     active: {
                               //       backgroundColor: 'red',
                               //     },
                               //     labelActive: {
                               //       color: '#000000',
                               //     },
                               //     disabled: {
                               //       opacity: 0.6,
                               //       cursor: 'not-allowed',
                               //     },
                               //   })}
                           />
                           <Button
                               onClick={isConnected ? disconnectConversation : connectConversation}
                                fw={400} fz={12} radius={40} variant="filled" c="#FFFFFF" bg={isConnected ? "#E61B00" : "blue"} h={40}
                           >
                               {isConnected ? 'End' : 'Start'}
                           </Button>

                       </Group>

                   </Box>
               </Group>
           </Stack>
            </Paper>
        </>
    );
}

export default ConsolePage;

Hey!
I can't really spoonfeed you a solution, but make sure to try the following:

  1. Add logs everywhere you think the AI is being called.
  2. Check if there are logs twice or multiple times where there should only be one.

This will confirm where exactly you are calling code multiple times and will hopefully resolve the issue. :hugs:
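A tiny wrapper like this can make the duplicate calls easy to spot in the console (a sketch; countCalls is a name I made up, not part of any library):

// Debugging sketch: count how many times each suspect function runs.
const callCounts: Record<string, number> = {};

function countCalls<T extends (...args: any[]) => any>(name: string, fn: T): T {
    return ((...args: any[]) => {
        callCounts[name] = (callCounts[name] || 0) + 1;
        console.log(`${name} call #${callCounts[name]}`);
        console.trace(name); // the stack trace shows what triggered the call
        return fn(...args);
    }) as T;
}

// e.g. wrap the function you suspect:
// const connect = countCalls('connectConversation', connectConversation);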