Delay Issue with Call Function

I’m currently using the function calling feature, and every time a customer uses it, the function call correctly captures the information the customer provides, such as time, requirements, and name. However, there’s a delay of about 4 seconds between sending the function-calling API request and receiving this information. I’ve set request_timeout to 5 seconds, but even when the customer doesn’t trigger a function call, it still takes 5 seconds before returning to the normal QA flow. This delay is too slow for a seamless user experience. On the other hand, if I reduce request_timeout to 3 seconds or shorter, the request may time out before all the necessary information is captured.

Is there any way to enable customers to skip the call function quickly and return to the normal QA flow when they are not using it? I would appreciate any advice or solutions to improve the response time without compromising the data capture process.

Thank you for your help.

How many API calls are you doing?

For example, as in the function-call sample, there are at least two API calls. If the API response does not return function_call, do you call the Chat API again or not?

Basically, if I set request_timeout long enough, the response is correct. Here is an example: I have a function called “req_idle”, which requires the user to provide a time and a service, and it responds with whether that time slot is available. The problem arises when the user doesn’t want to ask about a time slot’s availability and instead asks other questions. In such cases, the request takes a long time to respond because it is waiting for function-call output that never comes.
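For reference, a minimal sketch of how such a “req_idle” function might be declared (the parameter names and descriptions here are assumptions for illustration). A precise description also helps the model skip the function when the question is unrelated:

```python
# Hypothetical schema for the "req_idle" function described above;
# the "time"/"service" parameter names are illustrative assumptions.
req_idle_schema = {
    "name": "req_idle",
    "description": "Check whether a given time slot is available for a service.",
    "parameters": {
        "type": "object",
        "properties": {
            "time": {"type": "string", "description": "Requested time slot"},
            "service": {"type": "string", "description": "Requested service"},
        },
        "required": ["time", "service"],
    },
}
```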

Yes, I think I might have experienced similar behavior before.

The answer here is: user feedback.

Before any “content” output is streamed and the response turns out to be a function call:
“thinking about that”
While running the function API and waiting again for the chatbot:
“researching…”
“typing your answer…”
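A minimal sketch of that idea, assuming the legacy ChatCompletion streaming format (`stream=True` delta dicts); this helper only classifies the first chunk, and the surrounding loop would print the later status lines:

```python
def first_chunk_feedback(delta):
    """Classify the first streamed delta of a chat completion.

    Returns an interim status line for the user, or None when normal
    content is already streaming. Illustrative sketch: `delta` stands
    for choices[0]["delta"] from a legacy stream=True chunk.
    """
    if "function_call" in delta:
        # The model chose a function; show this line now, print
        # "researching..." while the function runs, and
        # "typing your answer..." while awaiting the second completion.
        return "thinking about that"
    return None  # ordinary content is streaming; no status needed
```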

We have already implemented the user feedback with system messages like ‘thinking about that’, ‘researching…’, and ‘typing your answer…’ to provide better communication while waiting for the call function’s response. Not every user wants to use the ‘send mail’ or ‘ask weather’ function, and it can be frustrating to wait for the call function response when the user simply wants to ask a general question.

I’d like to explore a more dynamic approach where the function call is triggered only when necessary, based on the user’s input. For instance, if the user explicitly asks for a specific function like ‘send mail’, the function call is activated. However, if the user asks a regular question, the function call can be skipped, and we can respond to their query directly without delay.

You might put in a time log, and discover where the actual delay is coming from.
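As a sketch, a small timing helper can bracket each stage of the pipeline to see where the seconds actually go (the stage names in the usage comment are hypothetical):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped stage takes, to locate the real delay."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.3f}s")

# Usage (hypothetical stage names):
# with timed("chat completion"):
#     response = openai.ChatCompletion.create(...)
# with timed("local function"):
#     result = req_idle(arguments)
```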

For example, are you using an external module that is getting invoked each time, resetting all its variables and instantiating the same functions again?

One could consider turning your different API functions into class objects, making a function-holding object that stays in memory, and then just updating the input state and calling its method to retrieve data.
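A minimal sketch of that pattern, with a hypothetical `ask_weather` function: the object is instantiated once, the expensive setup lives in `__init__`, and later calls only change the input state:

```python
class AskWeather:
    """Function-holding object kept in memory between requests."""

    schema = {
        "name": "ask_weather",
        "description": "Get the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def __init__(self):
        # One-time setup: sessions, parsed schemas, caches, etc.
        self.cache = {}

    def __call__(self, city):
        # Only the input changes per call; the setup above is reused.
        if city not in self.cache:
            self.cache[city] = f"weather for {city}"  # stand-in for a real lookup
        return self.cache[city]

# Instantiated once at import time, not on every request.
FUNCTIONS = {"ask_weather": AskWeather()}
```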

Also, if the function is doing something like recording data to a database, which doesn’t need a return value or AI success status, you can send that task off to its own thread and return a function role message immediately, implying success.
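A sketch of that fire-and-forget pattern, with a hypothetical `record_data` function; the database write here is a stand-in:

```python
import json
import threading

def record_to_db(data):
    # Stand-in for a real database write; no return value is needed.
    pass

def handle_record_call(arguments_str):
    """Kick the write off to a background thread and reply immediately."""
    args = json.loads(arguments_str)
    threading.Thread(target=record_to_db, args=(args,), daemon=True).start()
    # Function role message implying success, returned without waiting.
    return {"role": "function", "name": "record_data", "content": '{"status": "ok"}'}
```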

If you don’t want the AI to be calling functions all the time for things it can answer itself, it helps to inform it of what it is able to answer. The AI can’t be aware of its own knowledge, because it hasn’t generated the tokens (with probabilities informed by training) to see if it knows.
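For example, a hypothetical system message along those lines (the wording and the `req_idle` reference are assumptions, not a documented recipe):

```python
# Tell the model what it can answer itself, so it only calls a
# function when it truly needs external data.
system_message = {
    "role": "system",
    "content": (
        "You are a booking assistant. Answer general questions about our "
        "services directly from this prompt. Only call req_idle when the "
        "user asks whether a specific time slot is free."
    ),
}
messages = [system_message, {"role": "user", "content": "What services do you offer?"}]
```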

I have identified the issue, and I apologize for the confusion. My intention is to obtain the following parameters using the provided code:

      response = openai.ChatCompletion.create(
          model="gpt-4-0613",
          messages=messages,
          functions=functions,
          function_call="auto",
          request_timeout=5,
      )

from the generated response, I extract the following parameters:

import json

message = response["choices"][0]["message"]
if "function_call" in message:  # only present when the model called a function
    arguments_str = message["function_call"]["arguments"]
    arguments = json.loads(arguments_str)
    A = arguments["A"]
    B = arguments["B"]
    C = arguments["C"]

However, there are instances where the functions are not used; instead, the response directly answers the user’s query, generating a response["choices"][0]["message"], and this incurs some processing time.

My goal is to have it generate only the ["function_call"]["arguments"] without any additional processing.
If functions are not used, there is no need to generate the [message], and it can be skipped to avoid unnecessary processing time.
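One option worth noting: if a given turn should only ever produce function arguments, the API lets you force a specific function via the `function_call` parameter (and conversely, `function_call="none"` skips functions entirely). A sketch against the legacy 0.x SDK shown above, with placeholder `messages` and `functions`:

```python
# Placeholders standing in for the real conversation and schema list.
messages = [{"role": "user", "content": "Book me a slot at 10am."}]
functions = [{"name": "req_idle", "parameters": {"type": "object", "properties": {}}}]

# Forcing function_call makes the model return only the function's
# arguments for this request, skipping a normal assistant answer.
request_kwargs = dict(
    model="gpt-4-0613",
    messages=messages,
    functions=functions,
    function_call={"name": "req_idle"},  # force this specific function
    request_timeout=5,
)
# response = openai.ChatCompletion.create(**request_kwargs)
```

A per-turn router could decide between forcing, `"auto"`, and `"none"` based on the user's input, which matches the dynamic approach discussed above.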