Hi,
I want to build a job interviewer chatbot:
- TTS asks the user the questions (defined by the employer)
- The user answers each question through their microphone (answers can be long)
The whole conversation is saved on a server (as an audio file).
I'm wondering which technical stack to use.
My main issue is getting the TTS voice and the user's voice into the same stream, so that the whole conversation can be recorded.
I don't need to parse or analyze the user's voice, just save it.
Any ideas?
Thanks!
Franck.
What do you mean when you say you don't need to analyze the user's voice? The interviewer bot just asks a question, and after the user answers, it proceeds to the next question?
> My main issue is getting the TTS voice and the user's voice into the same stream, so that the whole conversation can be recorded.
Keep separate files for the TTS voice and the user's voice, and just stitch them together afterwards.
In the frontend, you can use the browser's built-in audio APIs: getUserMedia with MediaRecorder to record, and an audio element or the Web Audio API to play. In the backend, you'll need ffmpeg. Any frontend framework will do.
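For the backend step, a minimal sketch: the browser's MediaRecorder usually produces WebM/Opus (Chromium) or Ogg/Opus (Firefox) rather than MP3, so each uploaded answer is re-encoded to match the TTS question files. The function name `normalize_answer` and the 44.1 kHz, mono, 128 kbit/s settings are assumptions; check your TTS files with `ffprobe` and use the same values, otherwise the later concatenation step may fail or produce garbled audio.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-encode one uploaded browser recording to MP3.
# Assumes ffmpeg is built with libmp3lame (true for most distro packages).
# Sample rate, channel count and bitrate are assumptions: match the TTS files.
normalize_answer() {
  local input="$1"   # e.g. answer_001.webm as uploaded by the browser
  local output="$2"  # e.g. answer_001.mp3, same format as question_001.mp3
  ffmpeg -hide_banner -loglevel error -y \
    -i "$input" \
    -vn \
    -ar 44100 -ac 1 \
    -c:a libmp3lame -b:a 128k \
    "$output"
  # -vn drops any video track; -ar/-ac fix sample rate and channel count.
}
```

Usage: `normalize_answer answer_001.webm answer_001.mp3`, run once per upload before the answer is stored next to its question.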
Thanks for your reply @supershaneski
Yes, exactly.
I have separate files for the TTS questions.
The final goal is to obtain one single audio file with the conversation between the interviewer (TTS) and the user (microphone).
I'm afraid that splitting the recording between frontend and backend and then rebuilding a single file will be very complicated.
So I'm looking for a simple solution that records the whole conversation on the same stream.
Perhaps recording the whole browser tab's audio with WebRTC?
Thanks.
> I'm afraid that splitting the recording between frontend and backend and then rebuilding a single file will be very complicated.
My idea is this: since you already have the TTS question files, you only need to play them in the frontend. After a question is played, the interviewee's answer is recorded. There may be a time limit and/or a button to end the answer. The recorded audio is then sent to the backend to be saved, and the next question is played. When the interview session ends, you have answer_001.mp3, answer_002.mp3, etc. You can then stitch them together with the corresponding question_001.mp3, question_002.mp3, etc. that you already have on the server.
To concatenate the audio files with ffmpeg:
ffmpeg -f concat -safe 0 -i list.txt -c copy 202401311456_interview001.mp3
where list.txt contains:
file '/path/to/file/question_001.mp3'
file '/path/to/file/answer_001.mp3'
file '/path/to/file/question_002.mp3'
file '/path/to/file/answer_002.mp3'
file '/path/to/file/question_003.mp3'
file '/path/to/file/answer_003.mp3'
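The list file can also be generated instead of written by hand. A minimal sketch, assuming the question files live in one directory and each interview's answers in another, both using the zero-padded names above (so shell glob order matches question order); `write_concat_list` and `concat_interview` are hypothetical helper names. Note that ffmpeg's concat demuxer expects every file to share the same codec and stream parameters, which is why `-c copy` works here only after the browser recordings have been converted to the same format as the question files.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume question_NNN.mp3 / answer_NNN.mp3 naming
# with zero-padded numbers, so glob order equals question order.

# Print a concat list that interleaves each question with its answer.
# Assumes paths contain no single quotes (they would need escaping).
write_concat_list() {
  local qdir="$1" adir="$2" q num a
  for q in "$qdir"/question_*.mp3; do
    [ -e "$q" ] || continue            # glob matched nothing: skip literal pattern
    num="${q##*/question_}"            # e.g. "001.mp3"
    a="$adir/answer_$num"
    printf "file '%s'\n" "$q"
    if [ -e "$a" ]; then
      printf "file '%s'\n" "$a"
    else
      echo "warning: no answer for $q" >&2   # skipped question or failed upload
    fi
  done
}

# Join the listed files without re-encoding (all inputs must share codec params).
# -safe 0 allows absolute paths in the list file.
concat_interview() {
  ffmpeg -hide_banner -loglevel error -y \
    -f concat -safe 0 -i "$1" -c copy "$2"
}
```

Usage: `write_concat_list /path/to/questions /path/to/interview001 > list.txt`, then `concat_interview list.txt 202401311456_interview001.mp3`. Skipping a missing answer (with a warning) rather than aborting keeps a partial interview usable.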