I’m developing a voice assistant app that uses a real-time model to hold audio conversations with the user. I want ChatGPT to return two parts in each reply: one for the user to hear, and one that expresses ChatGPT’s emotion. For example:
User: Tell me a joke.
Assistant: <TEXT>I have a funny joke</TEXT><MOOD>Happy</MOOD>
Using a prompt, I can get ChatGPT to output text in the format above, but the audio I receive also reads out the MOOD part. How can I make ChatGPT generate audio for the TEXT part only?
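
To make the split I'm after concrete, here is a minimal sketch of how the tagged text could be separated on the client side. It assumes the reply is available as a plain string in the format shown above; `split_reply` and `ParsedReply` are hypothetical names of my own, not part of any API, and this only handles the text side, not the audio.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    text: str            # the part the user should hear
    mood: Optional[str]  # the assistant's emotion, or None if no MOOD tag was found


# Non-greedy matches so each pattern captures only the first tagged span.
_TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
_MOOD_RE = re.compile(r"<MOOD>(.*?)</MOOD>", re.DOTALL)


def split_reply(raw: str) -> ParsedReply:
    """Split a reply such as '<TEXT>...</TEXT><MOOD>...</MOOD>' into its two parts."""
    text_match = _TEXT_RE.search(raw)
    mood_match = _MOOD_RE.search(raw)

    if text_match:
        text = text_match.group(1).strip()
    else:
        # Hypothetical fallback: no TEXT tag, so treat everything except
        # the MOOD span as the spoken text.
        text = _MOOD_RE.sub("", raw).strip()

    mood = mood_match.group(1).strip() if mood_match else None
    return ParsedReply(text, mood)


reply = split_reply("<TEXT>I have a funny joke</TEXT><MOOD>Happy</MOOD>")
print(reply.text)  # I have a funny joke
print(reply.mood)  # Happy
```

What I can't work out is how to get the same separation in the generated audio, so that only `reply.text` is spoken.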