Voice AI - understanding context before generating an answer for the user

Hey guys. I’m building a voice AI bot and want to understand the context of the conversation, such as the user’s name and the events that have occurred, before passing the prompt to GPT for response generation. I want to include this context as tokens in the GPT completion.

I know this can be done with JSON mode or function calling, but both increase response latency. Do you have other suggestions for extracting conversation context before generating a response with a GPT completion?
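For reference, a JSON-mode extraction step might look like the sketch below. It only builds the Chat Completions request body (no network call); the model name, the exact context fields (`user_name`, `events`), and the instruction wording are all assumptions for illustration.

```python
import json

def build_extraction_request(transcript: str) -> dict:
    """Build a Chat Completions request body that asks the model to
    return conversation context as strict JSON (JSON mode).
    The model name and the context fields are illustrative placeholders."""
    return {
        "model": "gpt-3.5-turbo",
        "response_format": {"type": "json_object"},  # enables JSON mode
        "messages": [
            {
                "role": "system",
                "content": (
                    "Extract the user's name and any mentioned events from "
                    "the conversation. Reply with a JSON object like "
                    '{"user_name": ..., "events": [...]}.'
                ),
            },
            {"role": "user", "content": transcript},
        ],
    }

body = build_extraction_request("Hi, I'm Ana. My flight was cancelled yesterday.")
print(json.dumps(body["response_format"]))  # {"type": "json_object"}
```

The latency cost comes from this being a full extra model round trip before the answer-generating call.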


Hi and welcome to the community!

I understand your goal is to infer additional information about the user’s request, and to enrich the prompt with that information, before passing it on for response generation.

If you use another LLM for this then, as you already wrote, latency from the user’s perspective will increase. You can run some tests to see whether the additional time needed for a GPT-3.5 Turbo request is prohibitively long for your use case.
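Measuring that overhead could be as simple as wrapping the extraction step in a timer. In this sketch, `extract_context` is a stub standing in for the real GPT-3.5 Turbo call; swap in your actual API call to get meaningful numbers.

```python
import time

def extract_context(transcript: str) -> dict:
    # Stub standing in for a real GPT-3.5 Turbo extraction call.
    return {"user_name": None, "events": []}

def timed(fn, *args):
    """Run one call and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

context, ms = timed(extract_context, "Hi, I'm Ana.")
print(f"extraction took {ms:.1f} ms")
```

Running this a few dozen times against the live API would give you a realistic latency distribution rather than a single sample.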

The other option I see is based on classic programming. The more you know about your data, the more assumptions you can make and the more performance you can get out of your processes. If you already have the additional information, from prior conversations or because you collected it in advance, you can keep it readily available and add it to your prompt to the model.
There will still be some latency increase, but if properly implemented this should be faster than getting a reply from GPT-3.5 Turbo. It should also be cheaper.
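A minimal sketch of that classic-programming option: keep facts you already collected in a per-session store and render them as a prompt preamble, avoiding the extra LLM round trip entirely. The class and key names here are illustrative, not part of any library.

```python
from collections import defaultdict

class ContextStore:
    """Keeps previously collected facts per session and renders them
    as a prompt preamble, so no extra LLM call is needed."""

    def __init__(self):
        self._facts = defaultdict(dict)

    def remember(self, session_id: str, key: str, value: str) -> None:
        # Record or update a fact for this session, e.g. the user's name.
        self._facts[session_id][key] = value

    def preamble(self, session_id: str) -> str:
        # Render all known facts as a block to prepend to the prompt.
        facts = self._facts.get(session_id, {})
        if not facts:
            return ""
        lines = [f"- {key}: {value}" for key, value in facts.items()]
        return "Known context:\n" + "\n".join(lines)

store = ContextStore()
store.remember("session-1", "user_name", "Ana")
store.remember("session-1", "last_event", "flight cancelled")
prompt = store.preamble("session-1") + "\n\nUser: What can I do now?"
```

A dictionary lookup like this costs microseconds, which is why it beats a second model call on both latency and price.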