So I am building a function calling mechanism for 2000+ functions (think SQL) through Chain of Thought. The use case is an org that has 3000+ tables and different users with different skills in different departments. I have gone down the rabbit hole of generating dynamic SQL etc…
But for the moment, I am considering only these “hard-coded” functions. The question is whether function selection through tool calling is practical, assuming a taxonomy tree.
What I have done so far: created a POC for 100-odd functions at various levels of depth in the taxonomy tree (min_depth = 2, max_depth = 5). At each level, I employ tool calling to target a specific branch in the tree, then traverse all the way down until I hit the leaves.
So far it seems to be working OK. Of course, I make several tool calls to navigate through the hierarchy, so the response time depends on depth. I intend to track the logprobs through the sequence (currently it is only the best response at any given level).
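For concreteness, this is roughly the shape of the per-level selection loop I have in mind (just a sketch, not my production code; the Node structure, the build_tool helper, and the model name are placeholders):

from dataclasses import dataclass, field
from openai import OpenAI

client = OpenAI()

@dataclass
class Node:
    name: str
    description: str
    children: dict[str, "Node"] = field(default_factory=dict)

def build_tool(node: Node) -> dict:
    # Each child branch is exposed as a no-argument "function" whose only job
    # is to signal which branch the model picked.
    return {
        "type": "function",
        "function": {
            "name": node.name,
            "description": node.description,
            "parameters": {"type": "object", "properties": {}},
        },
    }

def select_leaf(user_query: str, root: Node) -> Node:
    # Walk the taxonomy: one tool call per level, each narrowing to a sub-tree.
    node = root
    while node.children:
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Pick the branch that best matches the request."},
                {"role": "user", "content": user_query},
            ],
            tools=[build_tool(child) for child in node.children.values()],
            tool_choice="required",  # force a selection at every level
        )
        node = node.children[resp.choices[0].message.tool_calls[0].function.name]
    return node  # leaf = the concrete function to actually call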
Any pointers from others who have implemented something similar would be much appreciated.
2000 functions is incredibly large and that’s a massive amount of context!
Remember for every Completion call you will have to inform the LLM afresh that you have all these functions and all their configuration.
That will be extremely expensive to run in Production even if you manage to fit all that context in (I’m too lazy to do the estimation on a Sunday evening).
"Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model’s context limit and are billed as input tokens. If you run into token limits, we suggest limiting the number of functions or the length of the descriptions you provide for function parameters.
It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined in your tools specification."
This does not appear to describe any pre-processing that limits the function-calling context. I’m not sure where you got that from?
Therefore, I repeat my assertion that 2000 functions will be extremely expensive to run in Production and will probably fail by pushing well past the model’s attention! You are also significantly eating into your context budget for any retrieved information.
I will define 200 fictitious functions (well beyond the documented limit of 128 functions) distributed amongst 10 departments, with varying levels of depth and differing function descriptions.
Then let’s have three user descriptions, each one targeting a specific function, to see if I can hit that specific function. Note that once I can hit a specific function, it is only one more tool call to actually invoke it.
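Roughly the harness I have in mind (a sketch only; the department names, filler descriptions, and the three target functions are all made up, and the request may simply be rejected if the API enforces the documented 128-tool limit):

from openai import OpenAI

client = OpenAI()

DEPARTMENTS = ["finance", "hr", "operations", "sales", "marketing",
               "legal", "it", "procurement", "logistics", "support"]

def make_tool(name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {"name": name, "description": description,
                     "parameters": {"type": "object", "properties": {}}},
    }

# 197 filler functions spread across the 10 departments ...
tools = [make_tool(f"{DEPARTMENTS[i % 10]}_function_{i:03d}",
                   f"Fictitious {DEPARTMENTS[i % 10]} function {i}, used only as a distractor.")
         for i in range(197)]

# ... plus three meaningful targets that the test queries should land on.
targets = {
    "Approve the invoice we received from Acme Corp": make_tool(
        "finance_approve_vendor_invoice", "Approve a pending vendor invoice."),
    "How many support tickets are still open?": make_tool(
        "support_count_open_tickets", "Count customer support tickets that are still open."),
    "Give me Q3 headcount per department": make_tool(
        "hr_headcount_by_department", "Report headcount per department for a given quarter."),
}
tools += list(targets.values())  # 200 tools total, above the documented 128-tool limit

for query, target in targets.items():
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
        tools=tools,
        tool_choice="required",
    )
    picked = resp.choices[0].message.tool_calls[0].function.name
    print(f"{query!r}: picked {picked}, hit={picked == target['function']['name']}")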
"Keep the number of functions low for higher accuracy
We recommend that you use no more than 20 functions in a single tool call. Developers typically see a reduction in the model’s ability to select the correct tool once they have between 10-20 tools.
If your use case requires the model to be able to pick between a large number of functions, you may want to explore fine-tuning (learn more) or break out the tools and group them logically to create a multi-agent system"
The approach somewhat makes sense. Any ambiguity is only in the actual implementation.
First, however: logprobs are turned off as soon as a tool is invoked.
You might see an application for them, such as using the top-3 enum values the AI was about to send; OpenAI evidently sees more application in denying you that transparency.
Additionally, even the probability that a function will be called is stripped out of the “fake logprobs” you get. You cannot see that there was a 43% chance of token 1002xx, the token that signals the output handler to capture a tool.
You do not get token numbers at all; at best, you get string bytes.
So your cleverness is denied.
Then we get to the implementation.
An AI could emit a tool call for “which type of database category will answer this?”. Then you do not need to return a tool call response; you could just swap in the more specialized tool functions and call again.
It is only in the minutiae that we need to think deeper:
Can the AI follow what it has been doing to get to that point, so it can back out and try another path?
If no fulfilling function is there, have you mandated a function anyway, so that the AI doesn’t respond to the user but pursues the chain?
So it is just a matter of having a clear design pattern that cannot fail and cannot go nuts with recursion, while still keeping an overall picture of why the function was called in the first place, even though the intense specialization that will eventually be reached cannot be shown up front; roughly the loop sketched below.
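A minimal shape of that pattern (names and helpers here are illustrative, nothing beyond the standard chat call): swap in the more specialized tool list after each selection instead of returning a tool result, mandate a tool call at every step, and cap the depth.

from openai import OpenAI

client = OpenAI()
MAX_DEPTH = 6  # hard stop so the chain cannot recurse forever

def narrow(user_query: str, tools_for, root: str = "root") -> str:
    # tools_for(node_name) -> list of tool schemas for that node's children, or [] at a leaf.
    node, path = root, []
    for _ in range(MAX_DEPTH):
        tools = tools_for(node)
        if not tools:  # leaf reached: this is the real function to run
            return node
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system",
                 "content": f"Narrowing so far: {' > '.join(path) or '(start)'}. "
                            "Choose exactly one of the available functions."},
                {"role": "user", "content": user_query},
            ],
            tools=tools,
            tool_choice="required",  # mandate a call so the AI never answers the user mid-chain
        )
        node = resp.choices[0].message.tool_calls[0].function.name
        path.append(node)  # keep the trail, so a caller can back out and try another branch
    raise RuntimeError(f"No leaf reached within {MAX_DEPTH} levels: {path}")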
or: embeddings…
3000 functions? 3000 descriptions for them. 1 AI taught how to write like the descriptions. Top-20 results presented.
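Roughly, in code (a sketch; the embedding model and helper names are just one way to do it):

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize

def build_index(function_specs: list[dict]) -> np.ndarray:
    # One-off: embed every function description (cache this; don't recompute per request).
    return embed([spec["function"]["description"] for spec in function_specs])

def top_k_tools(user_query: str, function_specs: list[dict],
                index: np.ndarray, k: int = 20) -> list[dict]:
    # Optionally have the AI first rewrite the request in the style of a description.
    qv = embed([user_query])[0]
    scores = index @ qv                            # cosine similarity on unit vectors
    best = np.argsort(scores)[::-1][:k]
    return [function_specs[int(i)] for i in best]  # only these k are sent to the chat call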
You don’t need to tell others meticulously why what you want to do is possible – you just have to make it possible.
Indeed. My original question was whether someone has accomplished something similar and, if so, whether they had some pointers. Then I got dragged into this challenge thing, which is not a bad thing.
I’m sure you might be able to run a preprocessing task locally to cut down the list of functions sent to the API, but I don’t see that OpenAI is doing this for us.
If any OpenAI preprocessing is happening, let’s see the docs, please.
Back to the original question:
Why the heck do you need 2000 functions?
Can’t you make some of them more generic, e.g. write a SQL statement?
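For instance, one generic tool along these lines (purely illustrative; the name and parameters are made up) could replace a whole family of narrow listing functions:

generic_sql_tool = {
    "type": "function",
    "function": {
        "name": "run_read_only_sql",
        "description": "Run a read-only SELECT against an approved table and return the rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {"type": "string", "description": "Name of an approved table"},
                "query": {"type": "string", "description": "A single SELECT statement over that table"},
            },
            "required": ["table", "query"],
        },
    },
}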
Since the goal is to have a managed and controlled set of functions (2000+ functions can get quite hairy to manage), it is better to give control to individual departments to manage their own sub-trees… at least that is the theory. We will see.
from typing import Annotated, Optional

registry = ManagedFunctionRegistry()

@registry.managed_function()
def operations_functions():
    """
    This is where all operations-related functions are listed. This is typically related to
    support of customers, monitoring of our production systems, and invoice approval of our
    vendors.
    """
    return None

@registry.managed_function("operations_functions", "customer_support", "customers")
def ops_cs_customers_list_by(
        criterion: Annotated[str, "The criterion to use for listing customers"],
        order: Annotated[str, "How to list the customers, i.e. ascending or descending?"] = "ASC",
        limit: Annotated[Optional[str], "How many customers to list?"] = None,
) -> Annotated[dict,
               """
               :return: Returns a dictionary with keys
                   - list_customers (List[str]): the JSON list of customers encoded as a string.
               """]:
    """
    This function returns a list of customers for a specific criterion in an ordered list of strings.
    """
    ...  # implementation omitted here
In the example above, the decorator incorporates the functions into the tree:
operations_functions() is incorporated at the root.
ops_cs_customers_list_by is incorporated four levels down: ROOT -> “operations_functions” -> “customer_support” -> “customers”.
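In essence, the registry just records each function under a path in a nested tree. A simplified sketch of the idea (not the production class, which has more plumbing around schema generation):

class ManagedFunctionRegistry:
    def __init__(self):
        self.tree: dict = {}  # nested dict of branches; "_fn" holds the callable at a node

    def managed_function(self, *path: str):
        def decorator(fn):
            node = self.tree
            for branch in path:  # walk/create the branch path, e.g.
                node = node.setdefault(branch, {})  # operations_functions -> customer_support -> customers
            node.setdefault(fn.__name__, {})["_fn"] = fn  # attach the callable at that position
            return fn
        return decorator

    def children(self, *path: str) -> list[str]:
        # Names available one level below a given path: the candidate tools for that step.
        node = self.tree
        for branch in path:
            node = node[branch]
        return [name for name in node if name != "_fn"]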
Originally I was looking to use LLMs to make the function calls at every level, methodically going down one level each time a choice is made.
That’s a great idea. Both the latency and the expense of making tool calls are avoided. I will experiment with this option, BUT still while navigating the tree; that way the algorithm maintains control over which branch to choose.
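Roughly what I have in mind for that experiment (a sketch, not implemented yet; the BranchNode structure and the pre-computed description embeddings are assumptions):

import numpy as np
from dataclasses import dataclass, field

@dataclass
class BranchNode:
    name: str
    vec: np.ndarray                      # unit-normalized embedding of this node's description
    children: dict[str, "BranchNode"] = field(default_factory=dict)
    tool_schema: dict | None = None      # populated on leaves only

def descend(query_vec: np.ndarray, node: BranchNode) -> BranchNode:
    # Walk the taxonomy locally: at each level pick the child whose description
    # embedding is closest to the query. No API call until the leaf is reached,
    # whose single tool_schema is then the only tool sent to the model.
    while node.children:
        node = max(node.children.values(), key=lambda c: float(c.vec @ query_vec))
    return node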
I believe that with the embeddings it is entirely possible to send ONLY the single function to the API.
Yes, that’s a similar use case for this strategy - but does it dynamically alter the tool population too? I didn’t think so?
But it’s moot for me: I’m exotic. I live in the Ruby on Rails ecosystem and usually code everything from first principles (and that’s good for both my understanding of the technology and my level of control).