TTS voice and user microphone on the same stream

contact72 · January 30, 2024, 9:29pm

Hi,

I want to build a job interviewer chatbot.

The TTS asks the user the questions (defined by the employer).
The user answers the questions with their microphone (answers can be long).

The whole conversation is saved on a server (as a sound file).

I wonder what technical stack to use.

My main issue is getting the TTS voice and the user’s voice into the same stream (to record the conversation).

I don’t need to parse or analyze the user’s voice, just save it.

Do you have any ideas?

Thanks!

Franck.

supershaneski · January 31, 2024, 3:03am

What do you mean when you say you do not need to analyze the user’s voice? Will the interviewer bot just ask a question and, after the user answers, proceed to the next question?

My main issue is getting the TTS voice and the user’s voice into the same stream (to record the conversation).

You can keep separate files for the TTS and the user’s voice and just stitch them together.

In the front end, you can just use the Web Audio API to record and play audio. In the backend, you’ll need ffmpeg. You can use any framework for the front end.

contact72 · January 31, 2024, 5:30am

Thanks for your reply @supershaneski

Yes, exactly.

I have separate files for the TTS.

The final goal is to obtain one single audio file with the conversation between the interviewer (TTS) and the user (microphone).

I’m afraid that splitting the recording between the frontend and the backend and then rebuilding a single file will be very complicated.

So I’m looking for a simple solution for recording the conversation on the same stream.

Perhaps recording all the browser tab audio with WebRTC?

Thanks.

supershaneski · January 31, 2024, 6:06am

I’m afraid that splitting the recording between the frontend and the backend and then rebuilding a single file will be very complicated.

My idea is that, since you already have the TTS question files, you only need to play them in the frontend. After each question is played, the interviewee’s answer is recorded. Maybe there is a time limit and/or a button to end recording the answer. Afterwards, the recorded audio data is sent to the backend to be saved. Then the next question is played. After the interview session ends, you end up with answer_001.mp3, answer_002.mp3, etc. You can then stitch them together with the corresponding question_001.mp3, question_002.mp3, etc. that you already have on the server.
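As a rough illustration of the ordering step, here is a minimal Python sketch that puts the question and answer files back into conversation order. It assumes the question_NNN.mp3 / answer_NNN.mp3 naming used above and that every question has exactly one answer; the helper names turn_number and interleave_turns are hypothetical, not part of any library.

```python
import re


def turn_number(filename: str) -> int:
    """Extract the numeric suffix, e.g. 'answer_002.mp3' -> 2."""
    match = re.search(r"_(\d+)\.mp3$", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))


def interleave_turns(questions: list[str], answers: list[str]) -> list[str]:
    """Return the files in conversation order: question, answer, question, ...

    Assumption: each question number has exactly one matching answer number.
    """
    q_by_turn = {turn_number(q): q for q in questions}
    a_by_turn = {turn_number(a): a for a in answers}
    if q_by_turn.keys() != a_by_turn.keys():
        raise ValueError("questions and answers do not match up")
    ordered = []
    for turn in sorted(q_by_turn):
        ordered.extend([q_by_turn[turn], a_by_turn[turn]])
    return ordered
```

The resulting list is in exactly the order the files need to be joined, so a missing or extra answer is caught before any audio is processed.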

To concatenate the audio files using ffmpeg:

ffmpeg -f concat -safe 0 -i list.txt -c copy 202401311456_interview001.mp3

where list.txt is

file '/path/to/file/question_001.mp3'
file '/path/to/file/answer_001.mp3'
file '/path/to/file/question_002.mp3'
file '/path/to/file/answer_002.mp3'
file '/path/to/file/question_003.mp3'
file '/path/to/file/answer_003.mp3'
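If you would rather generate list.txt and the ffmpeg arguments from the backend than write them by hand, a minimal Python sketch could look like this. The function names are hypothetical; the ffmpeg flags are the same ones used in the command above.

```python
from pathlib import Path


def concat_list_text(files: list[str]) -> str:
    """Build the contents of list.txt for ffmpeg's concat demuxer."""
    lines = []
    for path in files:
        # A single quote inside a quoted path is written as '\'' in the list file.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def write_concat_list(files: list[str], list_path: str) -> None:
    """Write the list file to disk (hypothetical helper)."""
    Path(list_path).write_text(concat_list_text(files), encoding="utf-8")


def concat_command(list_path: str, output_path: str) -> list[str]:
    """Return the argument list for the ffmpeg command shown above."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path, "-c", "copy", output_path,
    ]
```

After calling write_concat_list, the command can be run with subprocess.run(concat_command("list.txt", "interview.mp3"), check=True). Note that -c copy joins the files without re-encoding, so this assumes all questions and answers already share the same codec and audio parameters; if the recorded answers arrive in a different format, they would need to be converted first.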
