Hi guys, I am using the Whisper model to convert speech to text, then calling the GPT-4 model to get a response. After that, I convert the response back to speech with a TTS model. Is there a better way to improve performance? I think the delay is not acceptable. Any good ideas?
Yes, you can run the open-source Whisper model locally on your own computer. If your machine has good enough specs, that may reduce the delay. I suggest putting a flag in your code to toggle which one to use (API or local) so that you can compare.
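Here is a rough sketch of what such a toggle could look like. It assumes the `openai` Python package (v1 client) for the API path and the `openai-whisper` package (plus ffmpeg) for the local path. The `STT_BACKEND` environment variable, the function names, and the "base" model size are placeholders I made up for illustration, so adjust them to your setup.

```python
import os
import time
from functools import lru_cache


def transcribe_api(path: str) -> str:
    """Transcribe with the hosted Whisper API (assumes the `openai` v1 client)."""
    from openai import OpenAI  # imported lazily so the local path works without it

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text


@lru_cache(maxsize=1)
def _load_local_model(size: str = "base"):
    """Load the local model once; "base" is an assumed size, pick what fits your hardware."""
    import whisper  # from the `openai-whisper` package

    return whisper.load_model(size)


def transcribe_local(path: str) -> str:
    """Transcribe with the open-source Whisper model running on this machine."""
    return _load_local_model().transcribe(path)["text"]


BACKENDS = {"api": transcribe_api, "local": transcribe_local}


def pick_backend(name=None):
    """Return the transcribe function for the flag (hypothetical STT_BACKEND variable)."""
    name = (name or os.environ.get("STT_BACKEND", "api")).lower()
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}, expected one of {sorted(BACKENDS)}")
    return BACKENDS[name]


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

To compare, something like `text, seconds = timed(pick_backend("local"), "question.wav")` and the same call with `"api"` on the same audio file (the file name is just an example) should give you comparable numbers. The first local call also includes the model load time, so run it a couple of times before judging.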
Comparing the performance of API vs. local is a good idea. Thank you for your feedback.