How to use the Assistants API at conversational speeds

Hi everyone,

I have worked through this awesome blog post from Twilio:

It allows you to call a phone number and talk to ChatGPT in real time. The speeds are good enough for a natural conversation.
The magic part is this:

```js
const chatCompletion = await openai.chat.completions.create({
  messages: messages,
  model: 'gpt-4',
  // (further options truncated in the original post)
});
```

The chat completions call is quick enough to allow a smooth response time.
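
As an aside, the completion can also be streamed, so that text-to-speech can start before the full reply has finished generating. A minimal sketch, assuming the same `openai` client and `messages` array as above:

```js
// Stream the completion so the first words arrive as early as possible.
// Assumes the same `openai` client and `messages` array as above.
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  stream: true,
});

let reply = '';
for await (const chunk of stream) {
  // Each chunk carries a small delta of the assistant's reply.
  reply += chunk.choices[0]?.delta?.content ?? '';
}
```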

This has one big caveat, though: it works with the non-customized ChatGPT model. What I then did was update this code to use the Assistants API: reference my own assistant ID, create a thread, add a message, create a run, and wait for the response, polling in 100 millisecond intervals so as not to lose too much time.
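
Roughly, the flow looks like this (a simplified sketch; `ASSISTANT_ID` and `userUtterance` are placeholders, and exact method shapes may vary between SDK versions):

```js
// Simplified sketch of the Assistants API round trip described above.
const thread = await openai.beta.threads.create();

await openai.beta.threads.messages.create(thread.id, {
  role: 'user',
  content: userUtterance, // the transcribed caller speech (placeholder)
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: ASSISTANT_ID, // my custom assistant (placeholder)
});

// Poll every 100 ms until the run finishes; each of these extra steps
// adds latency on top of the model's own generation time.
while (run.status === 'queued' || run.status === 'in_progress') {
  await new Promise((resolve) => setTimeout(resolve, 100));
  run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
}

const threadMessages = await openai.beta.threads.messages.list(thread.id);
const reply = threadMessages.data[0]; // newest message comes first by default
```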

The problem is, the response times are in the 3000 to 4000 millisecond range. While I got it to work, and I am using my custom GPT (i.e. my assistant) via the phone, which is pretty amazing on its own, the response times are nowhere near good enough for a smooth conversation.

How can I combine the speed of chat completions with the custom knowledge and instructions offered by the Assistants API?

Thanks and best regards

Welcome @nils.lamb,

IMO, gpt-4o-mini is currently the fastest model available on the OpenAI API.

However, it’s important to note that calls to Assistants generally take longer than direct calls to the chat completion endpoint. This difference in response time is due to the multiple steps involved in processing Assistant calls.

Keep in mind that model speed is just one factor affecting the overall response time for Assistants.


Here’s a plot showing my observations from an experiment to measure the time to the first token for this model on the chat completions endpoint:
[plot: time to first token, gpt-4o-mini, chat completions endpoint]
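
The measurement itself is simple: time a streaming request until the first content token arrives. A sketch of how such a measurement can be done (not my exact script):

```js
// Sketch: measure time to first token with a streaming request.
const start = Date.now();

const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Say hello.' }],
  stream: true,
});

for await (const chunk of stream) {
  if (chunk.choices[0]?.delta?.content) {
    console.log(`Time to first token: ${Date.now() - start} ms`);
    break; // stop after the first content token arrives
  }
}
```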


True. I am using this very model in the hope that it would be the fastest, but the overhead of using Assistants generally outweighs the performance benefits of even the fastest model, to the degree where a real-time conversation no longer feels like a conversation.

I am hoping that someone has some ideas or maybe even an outlook on how to fix that.

Or is it possible to pass instructions / a knowledge base into the chat completion? Probably not functions, but I could live without those for the time being.
I would imagine that if I just passed the whole knowledge base as message history into the chat completion, it would blow up the token consumption exorbitantly, and while it might, strictly speaking, work, it would no longer be cost-efficient.
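
What I have in mind is something like the sketch below: keep the standing instructions in a system message and inject only the few knowledge-base snippets relevant to the caller's last utterance, instead of the whole knowledge base. Here `retrieveRelevantSnippets` is a placeholder for whatever lookup (keyword search, embeddings, etc.) ends up being used:

```js
// Sketch: instructions + only the relevant knowledge snippets go into
// the chat completions call, instead of the whole knowledge base.
// `retrieveRelevantSnippets` is a placeholder for any lookup mechanism.
const snippets = await retrieveRelevantSnippets(userUtterance);

const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      // standing instructions plus just the retrieved knowledge
      content: 'You are my phone assistant.\n' +
        'Use only the following knowledge:\n' + snippets.join('\n'),
    },
    ...conversationHistory, // turns from the current call only
    { role: 'user', content: userUtterance },
  ],
});
```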

If the context is limited to the current instance of the voice call, then yes, gpt-4o-mini on the chat completions endpoint is all you need.

It’s the cheapest model on the API with a 128k context length, costing $0.150 / 1M input tokens and $0.600 / 1M output tokens.
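
For a rough sense of scale at those prices: even a generous 5,000 input tokens plus 300 output tokens per conversational turn works out to 5,000 × $0.150/1M + 300 × $0.600/1M ≈ $0.0009, i.e. less than a tenth of a cent per turn.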
