I have around 15 functions defined in my schema, and every time a user asks a question all 15 are sent to the gpt-3.5-turbo-1106 model.
Since this costs around 3,500 tokens per user query, I tried to implement hierarchical function calling: the root call is triggered with 5 functions, and each branch then has 3 functions. So now, when a user asks a query, the root call is triggered first, then the specific branch is triggered and the parameters are extracted.
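For clarity, here is a minimal sketch of the two-stage routing described above. All function names are illustrative assumptions, and `chat` stands in for whatever wrapper you use around the chat-completions call (it receives the query plus the tool names to expose, and returns the name of the tool the model picked):

```python
# Hypothetical two-stage (hierarchical) function calling: stage 1 sends
# only the 5 root schemas, stage 2 sends only the chosen branch's 3
# schemas, so each request carries far fewer tool tokens than sending
# all 15 at once. Tool names below are made up for illustration.

ROOT_TOOLS = ["orders", "billing", "support", "account", "shipping"]

BRANCH_TOOLS = {
    "orders":  ["track_order", "cancel_order", "return_order"],
    "billing": ["get_invoice", "update_card", "refund_status"],
    # ... one entry per root tool
}

def route(query, chat):
    """Run the root call, then the branch call, and return both choices."""
    # Stage 1: model sees only the 5 root schemas.
    root_choice = chat(query, ROOT_TOOLS)
    # Stage 2: model sees only that branch's 3 schemas and
    # extracts the parameters for the final function call.
    branch_choice = chat(query, BRANCH_TOOLS[root_choice])
    return root_choice, branch_choice
```

In production `chat` would wrap `client.chat.completions.create(...)` with the corresponding tool schemas; the downside, as described below, is that every user query now needs two sequential round trips to the API.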
But now I am facing a latency problem: a user can ask multiple queries in a session, each query now requires two sequential calls, and the OpenAI completions API takes about 10 seconds to respond each time. Does anyone know a workaround for this latency and non-responsiveness issue?
Note: I am not using the Assistants API since it is still in beta.