So, I’m getting started actually building some of my own programs instead of just tinkering with prompts. I’ve been poking around LangChain and of course Python, and I’m even considering building my own .NET framework since I’m most comfortable in C#, tbh.
Here’s my problem: I cannot stand LangChain. The only reason I’m considering it is its simple structure for tool use, but even then I don’t find it satisfactory, nor is it built the way I would design it. My other problem, though, is understanding how specifically the MRKL framework is structured to make it easier for LLMs to find and execute the various chains/functions/what have you in LangChain.
Personally, I’d rather build my own framework for tool-using LLMs, and I’m about ready to do so because I don’t really like anything that’s currently out there. The problem is understanding how an LLM can retrieve the right tool (or function) from an indefinitely sized list or dictionary of tools or functions. Without LangChain, how is this typically achieved, if it’s been achieved at all? Does it use embeddings to search for the proper tool or function in a given program? Is this how people are using RAG?
I’m not even talking about executing the tool; it’s as simple as trying to understand how people have succeeded in getting an LLM to find the tool or function it thinks it’s supposed to use. Once I understand where things stand with this, I’ll be golden. I can figure out the rest code-wise once I know what works and what doesn’t.
If anyone could help me with some pointers, tips, or tricks on this, that would be greatly appreciated! Thank you!
Not sure where you’re starting from, but it sounds like you might already be familiar with function calling. If not, this cookbook is a good hands-on start, and the docs are also helpful.
You can actually feed multiple functions to a model at once (at the cost of tokens and context window consumption) and let the model choose which tool to use.
If you have more details on your use case or what you’re trying to accomplish, we may be able to offer more specific direction.
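For example, here’s a rough sketch of what providing multiple functions looks like (the function names and schemas below are made up for illustration, not from any real app; the commented-out request uses the openai-python Chat Completions style):

```python
# Two made-up function schemas the model can choose between.
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
    {
        "name": "open_vscode",
        "description": "Launch Visual Studio Code, optionally at a folder.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Folder to open"},
            },
            "required": [],
        },
    },
]

# The request would look roughly like this (requires an API key, so not run here):
# response = openai.ChatCompletion.create(
#     model="gpt-3.5-turbo-0613",
#     messages=[{"role": "user", "content": "Can you pull up VSCode for me?"}],
#     functions=functions,
#     function_call="auto",  # let the model decide which tool, if any, to call
# )
```

Every function you include costs tokens, so there’s a practical ceiling on how many you can send per request.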
I think it’s more an issue of struggling to find the right words to explain my use case and what I’d like to do. Maybe I’m overcomplicating things too.
Basically I just want to turn natural language queries into script executions (or at least retrieval of a specific script or function that can then be confirmed and run after confirmation).
I’m thinking of an example that could go like this:
Q: “Can you pull up VScode for me?”
*model looks for the custom-built Python script (or the function that calls that script) that pulls up VScode*
A: “You want me to do this, right?” *shows the function call or script*
*script gets executed*
The premise is that I put the heavy-lifting logic into the script or tool itself; the AI doesn’t need to know what’s in it or call something like an API directly. It just needs to know what the script does and when to pull it. The rest is handled within the app it’s programmed in and the script it pulled.
The best way I can explain it: instead of turning a natural language prompt into a direct, formatted function call per se (like turning “what’s the weather like outside?” into a parameterized function call that interacts with an API), when I ask “what’s the weather like outside?” the model merely points to a Python script that calls the API itself. Does that make sense?
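To make the routing idea concrete, here’s a toy sketch of what I mean (the script names and descriptions are made up; a trivial bag-of-words vector stands in for a real embedding model, which is where an embeddings API would actually go):

```python
import math
from collections import Counter

def embed(text):
    # Stand-in "embedding": a word-count vector.
    # In practice, swap in a real embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each script is indexed by a plain-language description of what it does.
scripts = {
    "open_vscode.py": "launch open visual studio code editor vscode",
    "get_weather.py": "fetch current weather forecast outside temperature",
}
index = {name: embed(desc) for name, desc in scripts.items()}

def retrieve(query):
    # Point the query at the script whose description is nearest.
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))

print(retrieve("can you pull up vscode for me?"))    # -> open_vscode.py
print(retrieve("what's the weather like outside?"))  # -> get_weather.py
```

The model never sees the script internals, only the descriptions; execution stays on my side after confirmation.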
Looking at your example, there are a couple of common ways to approach this today:
- You, as the user, provide the tool (function) you want the model to use.
- You, as the user, provide several possible tools (functions) the model may use and you expect the model to choose the tool.
In either case, one approach is to provide one or more functions to the model; the model may then reply with a function_call.
In the first case, using your example, you could provide the VScode tool (function) to the model, and you could even force the model to use that tool by setting the function_call property (the default is “auto”).
In the second case you might provide several tools (functions) (e.g. get_weather, do_math, vscode, …) with your chat completion request and let the model decide which, if any, tool to use.
When it does decide to use a tool, it returns a message with function_call and a JSON document indicating which function (e.g. vscode) and which parameters to use. It’s then up to you to actually call that function with those parameters in your code and (potentially) return the results to the model.
The cookbook linked above does a pretty good job of demonstrating how this works.
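To illustrate just the dispatch step (the model’s reply is simulated here as a plain dict shaped like a Chat Completions function_call message; the registry and functions are made up):

```python
import json

def open_vscode(path="."):
    # In a real app this would actually launch the editor; stubbed for illustration.
    return f"launched VSCode at {path}"

# Map function names the model knows about to real callables in your code.
registry = {"open_vscode": open_vscode}

# Simulated assistant reply: content is None, function_call names the tool.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "open_vscode",
        "arguments": '{"path": "~/projects"}',  # arguments arrive as a JSON string
    },
}

call = message["function_call"]
args = json.loads(call["arguments"])  # parse the JSON string into kwargs
result = registry[call["name"]](**args)
print(result)  # -> launched VSCode at ~/projects
```

This is also the natural spot to insert the “You want me to do this, right?” confirmation before the actual call.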
Function calling is a great tool, I love it a lot, but it can also lead to some frustrations… Sometimes ChatGPT makes up values that were not provided by the user (I have been struggling to get ChatGPT not to invent an email parameter but to ask the user for it instead).
The target API / Swagger must be designed specifically to interact with ChatGPT… each operation and parameter well described, and error response messages crafted for the AI. To cut a long story short, it’s not the usual kind of REST API one would implement.
I am working on a PoC here (and it is in C#)… You should appreciate how I hooked into Swashbuckle to extract the function JSON schema from the Swagger document on the fly to provide to ChatGPT.
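The same extraction idea, sketched in Python rather than C# (the spec fragment below is a made-up minimal OpenAPI document, not from my PoC): walk the operations and emit function schemas the model can consume.

```python
# A minimal, made-up OpenAPI/Swagger fragment with one operation.
spec = {
    "paths": {
        "/weather": {
            "get": {
                "operationId": "get_weather",
                "summary": "Get the current weather for a city.",
                "parameters": [
                    {
                        "name": "city",
                        "in": "query",
                        "required": True,
                        "schema": {"type": "string"},
                        "description": "City name",
                    },
                ],
            }
        }
    }
}

def to_function_schemas(spec):
    # Convert each operation into a function schema for the model.
    functions = []
    for path, ops in spec["paths"].items():
        for method, op in ops.items():
            props, required = {}, []
            for p in op.get("parameters", []):
                props[p["name"]] = {
                    "type": p["schema"]["type"],
                    "description": p.get("description", ""),
                }
                if p.get("required"):
                    required.append(p["name"])
            functions.append({
                "name": op["operationId"],
                "description": op.get("summary", ""),
                "parameters": {
                    "type": "object",
                    "properties": props,
                    "required": required,
                },
            })
    return functions

print(to_function_schemas(spec)[0]["name"])  # -> get_weather
```

Because the schemas come straight from the spec, well-written operation summaries and parameter descriptions double as the model’s documentation, which is exactly why the API has to be designed with the AI in mind.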