Hypothetically, an LLM could sort through a list of tools and, with trial and error, find the best tool for any task, whether that task is an observation or an action. As context windows grow and models get better trained, this will only get easier.
We are already seeing such things working with agents and multimodal LLMs.
We are also getting better at prompting these models.
Imagine this: in a conversation, take the latest user input.
Let’s say the user wants to send an email to their boss saying they will be late because of traffic.
Ask GPT-4 whether the user is making a request, providing an observation, or just chatting casually.
Based on the response, refine further: if it is a request, what kind of request is it? Information, action, or something else?
If an action, what kind of action: personal tools, creative, business, coding, or communication?
If communication, is it via email, Twitter, text message, or something else?
If email, which email account should be used? Once the selection is made, the LLM is given the details of how to make the actual plugin call.
Ok, email sent.
The point is that with this kind of hierarchical system, even a limited-context model can use any kind of tool, for any purpose. Of course, with a larger context window you need fewer steps along the way, as the sketch below suggests.
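To make that concrete, here is a minimal sketch of how such a hierarchical router could look. The `ask_llm` helper, the prompts, the category lists, and the `send_email` plugin are all hypothetical placeholders rather than a real API; the point is only that each step puts a handful of options in front of the model, and the full tool details are loaded only at the final step.

```python
# Hypothetical sketch of hierarchical tool routing. Nothing here is a real API:
# `ask_llm` stands in for any chat-completion call, and `send_email` stands in
# for an actual plugin invocation.

def ask_llm(question: str, options: list[str]) -> str:
    """Ask the model to pick exactly one of `options`. Replace with a real LLM call."""
    raise NotImplementedError("wire this to your LLM of choice")

def send_email(to: str, subject: str, body: str) -> None:
    """Hypothetical email plugin call."""
    print(f"Sending to {to}: {subject}\n{body}")

def route(user_input: str) -> None:
    # Level 1: is the user making a request, observing, or chatting?
    intent = ask_llm(
        f"Is this a request, an observation, or casual chat?\n{user_input}",
        ["request", "observation", "chat"],
    )
    if intent != "request":
        return  # just respond conversationally

    # Level 2: what kind of request?
    kind = ask_llm(
        f"Is this a request for information, an action, or something else?\n{user_input}",
        ["information", "action", "other"],
    )
    if kind != "action":
        return

    # Level 3: what category of action?
    category = ask_llm(
        f"Which category fits best?\n{user_input}",
        ["personal tools", "creative", "business", "coding", "communication"],
    )
    if category != "communication":
        return

    # Level 4: which channel?
    channel = ask_llm(
        f"Email, Twitter, text message, or something else?\n{user_input}",
        ["email", "twitter", "text", "other"],
    )
    if channel != "email":
        return

    # Only now is the actual tool schema put in front of the model.
    # In practice you would ask it to fill the plugin's arguments as JSON.
    send_email(
        to="boss@example.com",
        subject="Running late",
        body="Stuck in traffic, will be in as soon as I can.",
    )
```

Each classification call needs only a few tokens of context, which is why a limited-context model can still reach any tool in a large catalogue: the catalogue is traversed one small decision at a time instead of being dumped into the prompt all at once.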
And the possibilities are endless.