I’m trying to create a system built around a bunch of function calls. There is a coordinator super-function that calls a bunch of other functions depending on my use case, and each of those sub-functions also sends its own queries to the LLM.
But I think this approach will quickly blast through my TPM limit. I’m trying to find out if there’s a way to spawn these sub-calls in different threads as separate queries, with each thread just appending an output string to my main query call.
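Something along these lines is what I have in mind (a minimal sketch, assuming the openai>=1.0 Python SDK; run_sub_query and the prompts are placeholders, and the parallel calls still count against the same TPM budget — they just keep each context small):

```python
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def run_sub_query(prompt: str) -> str:
    """One sub-call: its own request, its own small context."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Placeholder sub-tasks the coordinator wants answered.
sub_prompts = ["Summarize topic A", "Summarize topic B"]

# Run each sub-call in its own thread and collect the output strings.
with ThreadPoolExecutor(max_workers=4) as pool:
    sub_outputs = list(pool.map(run_sub_query, sub_prompts))

# Append the sub-outputs to the main query call.
final = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Combine these into one answer:\n" + "\n".join(sub_outputs),
    }],
)
print(final.choices[0].message.content)
```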
What is your thinking around the super function? It might be easier to spend that effort on specific instructions (like the ones you would put in that super function) in the assistant prompt, specifying which (sub-)functions to call and when?
Here’s an example case. Let’s say I am creating a lesson generator.
My lesson generator will have a super function (lesson_generator) that calls other functions, each generating a specific section of a lesson. I will define these functions to be things like a function that queries an index to generate a section (generate_lesson_section), a function that validates references via a scraper and the LLM (validate_reference), or other miscellaneous functions used for math calculations, and so on. I just want to know if there is a working example in which a “super function” coordinates all of these function calls to produce a single detailed and robust output.
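To make it concrete, the shape I mean is roughly this (purely hypothetical names, with the index/scraper/LLM calls stubbed out; not a working example from anywhere):

```python
def generate_lesson_section(topic: str, section: str) -> str:
    """Query the embeddings index + LLM for one lesson section (stubbed)."""
    return f"## {section}\n(Generated content about {topic} goes here.)"

def validate_reference(reference: str) -> bool:
    """Check a cited reference via a scraper + the LLM (stubbed)."""
    return reference.startswith("http")

def average_difficulty(scores: list[float]) -> float:
    """Example of a misc. math helper used while assembling the lesson."""
    return sum(scores) / len(scores)

def lesson_generator(topic: str, sections: list[str], references: list[str]) -> str:
    """The super function: coordinate the sub-functions into one lesson."""
    body = [generate_lesson_section(topic, s) for s in sections]
    valid_refs = [r for r in references if validate_reference(r)]
    footer = "References:\n" + "\n".join(valid_refs)
    return "\n\n".join(body + [footer])

print(lesson_generator("photosynthesis",
                       ["Overview", "Light reactions"],
                       ["https://example.org/paper", "not-a-link"]))
```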
This isn’t a direct answer to your question, but it should tell you that your idea is not far-fetched; you just need to find the right keywords and look for the related research papers.
This is very close to what I was thinking of. The only difference is that he uses an SQL db while I use an index of embeddings. Thanks for sharing! Will check this out in detail.
Time and our combined efforts will tell how these things are going to be done! I am wondering why, in that case, you would not define each of those functions as a function for your assistant and let the assistant be the super function? It seems much easier to ‘program’ the super function by tweaking the prompt.
I have something like that for an ‘email helper’. The Assistant has a prompt that is several pages long and covers the different types of tasks it knows how to handle. Most of the tasks are ‘sub-tasks’ that themselves rely on an assistant to do something. I am finding that programming a super function is harder than prompting. (And I do have a fully asynchronous handling system, so I don’t have to worry about how long things take, run time-outs, etc.)
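The idea, roughly, is that each sub-task becomes a tool definition plus a paragraph of instructions (a simplified sketch assuming the Assistants API in the openai Python SDK; the tool names and instructions here are made up):

```python
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    model="gpt-4",
    # The "super function" lives here, as prose: which tool to call and when.
    instructions=(
        "You are an email helper. For a summarization request call summarize_thread; "
        "to write a response call draft_reply. Combine results before answering."
    ),
    tools=[
        {"type": "function", "function": {
            "name": "summarize_thread",
            "description": "Summarize an email thread",
            "parameters": {"type": "object",
                           "properties": {"thread_id": {"type": "string"}},
                           "required": ["thread_id"]}}},
        {"type": "function", "function": {
            "name": "draft_reply",
            "description": "Draft a reply to an email",
            "parameters": {"type": "object",
                           "properties": {"thread_id": {"type": "string"},
                                          "tone": {"type": "string"}},
                           "required": ["thread_id"]}}},
    ],
)
```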
Yeah, what you mentioned worked. I defined my super function to spawn new contexts and it worked fairly well. Although I don’t think the super-context has any idea what my sub-contexts are outputting; it just knows that it spawned new contexts. I gave the spawning function the necessary arguments so that GPT-4 can supply them itself.
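That spawning function is basically a single tool definition whose arguments GPT-4 fills in on its own, something like this (hypothetical schema, not from any library):

```python
spawn_context_tool = {
    "type": "function",
    "function": {
        "name": "spawn_context",
        "description": "Start a fresh sub-context to handle one self-contained sub-task.",
        "parameters": {
            "type": "object",
            "properties": {
                "task": {
                    "type": "string",
                    "description": "What the sub-context should accomplish",
                },
                "relevant_details": {
                    "type": "string",
                    "description": "Everything the sub-context needs, since it "
                                   "cannot see the main chat history",
                },
            },
            "required": ["task", "relevant_details"],
        },
    },
}
```

Since the super-context only sees whatever string comes back as the function response, returning a short summary of each sub-context’s output in that response is one way to let it know what was actually produced.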
I have a router function. When it is called I get the function name and a “context” parameter that encapsulates everything in the current chat history that’s relevant to the tool request. That context is used in a detached thread as a replacement for chat history, and the final result of whatever happens in that new thread is returned as the function response of the router call. For the normal chat history, all messages with a thread_uuid are not included, so only the router call and the router call’s response remain.
When the router has a function response, I use the whole “thread” along with the user’s last message and the router call (and response) to generate a final “follow-up” response that gives the user the output of the router call with all those details in context.
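In condensed form the pattern looks something like this (all names made up for illustration, the detached-thread execution shown as a plain call, and the sub-chain’s LLM work stubbed out):

```python
import uuid

def run_sub_chain(function_name: str, context: str, thread_uuid: str,
                  all_messages: list[dict]) -> str:
    """The detached work: `context` replaces chat history, and every message
    produced here carries thread_uuid so it never reaches the main chat."""
    all_messages.append({"role": "assistant", "thread_uuid": thread_uuid,
                         "content": f"(intermediate work for {function_name})"})
    return f"result of {function_name} given: {context}"

def route_request(function_name: str, context: str, all_messages: list[dict]) -> str:
    thread_uuid = str(uuid.uuid4())
    result = run_sub_chain(function_name, context, thread_uuid, all_messages)
    # Only the router call and its response are stored without a thread_uuid,
    # so they are the only trace of the sub-chain in the normal history.
    all_messages.append({"role": "assistant", "thread_uuid": None,
                         "content": f"router({function_name}, ...)"})
    all_messages.append({"role": "tool", "thread_uuid": None, "content": result})
    return result

def visible_history(all_messages: list[dict]) -> list[dict]:
    """Normal chat history: drop anything tagged with a thread_uuid."""
    return [m for m in all_messages if m.get("thread_uuid") is None]
```

The follow-up response is then just one more completion over visible_history plus the user’s last message.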
Yeah, this one-way thing is tricky, but it also keeps things clean, which is nice.
You described pretty much exactly the same system I have set up. The models kind of want things one way, and the better you can figure that out and deliver, the better they work. It means we’re all working on the same problems, which is so cool.
For me this has been necessary because I have multiple chains of function calls, retry mechanics at different stages, and system messages for different errors, and all that JSON data ends up being very intrusive on tokens in the overall chat history.
For the AI to perform well on complex tasks with function calling, I’ve found it best to break things down into stages, and this is the system I’ve ended up with.
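The per-stage retry with an error-specific system message is the part that generates most of that JSON noise; a rough sketch of the idea (hypothetical names, with call_llm standing in for whatever model call a given stage makes):

```python
import json

def call_llm(messages: list[dict]) -> str:
    """Stub: whatever model call this stage makes; returns the raw text."""
    return '{"ok": true}'

ERROR_HINTS = {
    "json": "Your last reply was not valid JSON. Reply with JSON only.",
}

def run_stage(messages: list[dict], max_retries: int = 2) -> dict:
    """One stage of the chain, with its own retry loop kept out of the main history."""
    for _ in range(max_retries + 1):
        raw = call_llm(messages)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Feed a targeted system message back in and retry; these turns stay
            # inside the stage instead of piling up in the overall chat history.
            messages = messages + [{"role": "system", "content": ERROR_HINTS["json"]}]
    raise RuntimeError("stage failed after retries")
```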
I think it makes sense; even if costs go down and context windows grow, the general idea and pattern still seem like the most sensible and scalable approach.
Even at a fundamental level, I see open-source “mixture of experts” models popping up that are essentially built around a “router” model that routes messages to a model fine-tuned for a specific purpose (though that’s not exactly the same thing).