I don't know how or why, but ever since custom GPTs became available I have spent two weeks building something extremely useful but complex, without any coding knowledge.
I want to know: is it really 'complex', even now with AI, to 'dynamically' interpret a natural language query from a user and interact with 3rd party APIs such as Google's, or those of other large companies whose APIs have tons of different parameters?
Complex is admittedly relative. For some developers, it would not necessarily be considered complex, just buggy lol. Everyone is at different levels in their development journey, and we respect all levels.
The complexity with 3rd party APIs specifically can depend on the API being called. The more parameters and calls necessary to perform a basic action, the more opportunities the AI has to fumble one of the steps along the way. KISS (keep it simple) is usually the way to go when it comes to developing your own API endpoint for the LLM.
Why would I want to develop my own API endpoint for the LLM in order to make 3rd party API calls? Please explain. I would really appreciate your input to clear up the confusion I have below.
Scenario:
**User Input**: The flow starts with me sending the AI a natural language instruction, e.g. asking how many clicks and views I got last Saturday. Regardless of the date format I use (numeric or spelled out), what could cause the AI to stumble and fail to perform the action, and theoretically how do I tackle these scenarios when I could ask the same question a hundred different ways?
**Parse Input**: Which AI type should be used here to parse the user input? Is it a built-in AI agent that parses the input and extracts the campaign parameters (e.g., clicks, views, and date)? To my knowledge there are two types of AI endpoints: conversational and custom assistant.
`/v1/assistants` is for assistants and `/v1/chat/completions` is for conversational chat. Please advise which endpoint should be used for my use case and why, and when to choose the assistants endpoint vs. chat completions. Either way, both can take instructions to perform a certain action.
**Executing the Action**:
Do I need a custom function/tool within my app that takes the parameters from the AI and then calls the Google Analytics API to get the desired result, or is it the AI that will call Google Analytics?
Any input and further tips I should know about, please tell me.
A-ha, now I see where the discrepancy is, I think.
When I mentioned that, I was not referring to API calls to a model, but to API calls that you build for the model to execute itself.
This exemplifies what the communication breakdown is. That is not the beginning of your flow, that is the entire flow of the whole program.
The AI does not parse anything. It may be able to do some basic stuff with prompting, but this should be what your program handles. Whatever this is, it is unique to your scenario.
You need a custom function and tool to handle all of what you’re asking for here.
The LLM by default has no tools or extra functions it can just call on its own. You have to provide them yourself (or call the ones OAI has already prebuilt, like code interpreter).
If you want the AI to call any third-party API (like Google Analytics), you need to build your own custom function/tool for that.
Both assistants and chat completion can work for this, although I’d probably start with Assistants to get a good feel for how the API works.
In essence you would be looking at a flow like this:
“How many clicks and views did I get last Saturday?” → Assistant API → Google Analytics Function → Results of Function fed back into Assistant API (this should still be part of the same call iirc) → “Here is what you did last Saturday: __”
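To make that concrete, here's a rough sketch of what a function spec might look like in JavaScript for the clicks/views question (the name `get_campaign_stats` and its parameters are made up for illustration; the same spec shape is accepted by both Chat Completions and the Assistants API). Whether you write the date numerically or in words, the model's job is to resolve it into these structured fields, so your code never parses the sentence itself:

```js
// Hypothetical function spec you send along with the user's message.
// Tip: include today's date in your system/instructions message so the model
// can resolve relative dates like "last Saturday" into concrete ones.
const tools = [
  {
    type: "function",
    function: {
      name: "get_campaign_stats", // made-up name for illustration
      description: "Fetch campaign metrics from Google Analytics for a date range",
      parameters: {
        type: "object",
        properties: {
          metrics: {
            type: "array",
            items: { type: "string", enum: ["clicks", "views"] },
            description: "Which metrics the user asked for",
          },
          startDate: { type: "string", description: "ISO date, e.g. 2024-01-06" },
          endDate: { type: "string", description: "ISO date, e.g. 2024-01-06" },
        },
        required: ["metrics", "startDate", "endDate"],
      },
    },
  },
];
```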
I see. But I thought the AI could parse, because it can understand natural language queries by extracting the parameters from the queries themselves.
For example, with "How many clicks and views did I get last Saturday?", the AI can understand this query and return a JSON response to my app with the parameters relevant to the query; in this case, something like 'clicks, views, date'. Those parameters are then passed to a custom function within my app that calls the Google Analytics API, and the response is passed back to the AI, which extracts it and presents the information to me in natural language. Am I missing something here?
I have built a custom web app and can chat successfully with the OpenAI API; now I just need to figure out the RIGHT/WISE app flow and functions.
No, that's the right approach and exactly how it works in my open-source repo:
You send OpenAI the spec of your functions, and the LLM interprets the user's query and sends back a response with a request to perform a function call with specific parameters based on its interpretation.
You then carry out that function locally (on your server), which may include a call to a remote API, and respond to the LLM with a packaged answer from the API.
In my case I do exactly this to retrieve stock prices from a dedicated stock price API, for example.
So if a user asks for the current price of Apple stock, the LLM, aware of a stock price function which takes a ticker, sends the relevant JSON back to the app, and the app then sends those attributes to the stock price function, which itself calls the remote API. Once the API responds, the function returns the answer in a natural language format in a prompt sent to the LLM, and the LLM formulates an answer for the user.
Note also the concept of "internal thoughts", which is a conversation between the server and the LLM that the user doesn't see. They only see the final answer, which will be something like "The last closing price of Apple stock was $150.24".
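A rough sketch of that round trip in Node (Chat Completions shown for brevity; the Assistants API wraps the same idea in threads and runs). The `fetchClosingPrice` helper, model name, and hard-coded price are stand-ins for illustration, not the actual repo code:

```js
// Minimal sketch (Node, ESM module so top-level await works).
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Stand-in for the call to the dedicated stock price API.
async function fetchClosingPrice(ticker) {
  return 150.24; // a real version would call the remote API here
}

const tools = [{
  type: "function",
  function: {
    name: "get_stock_price", // illustrative name
    description: "Get the latest closing price for a stock ticker",
    parameters: {
      type: "object",
      properties: { ticker: { type: "string", description: "e.g. AAPL" } },
      required: ["ticker"],
    },
  },
}];

const messages = [{ role: "user", content: "What is the current price of Apple stock?" }];

// First call: the model replies with a request to run the function.
const first = await openai.chat.completions.create({
  model: "gpt-3.5-turbo", // assumption: any tool-capable model
  messages,
  tools,
});

const reply = first.choices[0].message;
if (reply.tool_calls) {
  const call = reply.tool_calls[0];
  const { ticker } = JSON.parse(call.function.arguments);
  const price = await fetchClosingPrice(ticker); // your server does the real work

  // Second call: feed the result back ("internal thoughts" the user never sees).
  const second = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [
      ...messages,
      reply,
      { role: "tool", tool_call_id: call.id, content: JSON.stringify({ ticker, price }) },
    ],
  });
  console.log(second.choices[0].message.content);
  // e.g. "The last closing price of Apple stock was $150.24"
}
```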
I suggest not over-thinking this and just starting to try it out in code.
In answer to your original question, this is a pretty tried and tested approach and it’s fairly straightforward and not particularly challenging.
I suspect the only challenging part will be when your list of functions exceeds the practical attention span of the LLM, but that will be addressed to a great extent by improvements to the models over time.
Another consideration, I suppose, is the possibility that some user will spam queries that trigger your potentially costly API calls. You will need some mechanism to limit this if your query interface is going to be public. I address this with a trust and quota system. If the only users of your system are trusted staff, then this is unlikely to be an issue.
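Purely to illustrate the quota idea (the limit, the in-memory storage, and the function name are all made up; a real version would persist usage and hook into your authentication):

```js
// Toy per-user daily quota check, in-memory only.
const DAILY_LIMIT = 50; // arbitrary number for illustration
const usage = new Map(); // userId -> { day, count }

function allowRequest(userId) {
  const today = new Date().toISOString().slice(0, 10);
  const entry = usage.get(userId);
  if (!entry || entry.day !== today) {
    usage.set(userId, { day: today, count: 1 });
    return true;
  }
  if (entry.count >= DAILY_LIMIT) return false; // over quota: reject before the costly API call
  entry.count += 1;
  return true;
}
```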
At times like these it's easier to try it out, experiment, see how it works, and build and iterate from there. Don't get caught in analysis paralysis. Get a feel for how it works, see how the model responds, and then decide how to create a more sustainable flow. Testing out tools costs literally pennies at most.
I am not really a coder haha, I just started learning JavaScript a few days ago.
I was just able to get a JSON response back from the AI.
The problem at the moment is that I always have to say 'Return a JSON object' when I send a message as the user from the frontend interface, even though I have already stated in the system message that 'You are an assistant designed to return a JSON object'.
Do I need to make a function that always injects the JSON-object phrase as a string after the user's message?
For the record, I am using Chat Completions, not the Assistants API.
In production, you can't be calling OpenAI from the client, as that would mean including your secret key in the JavaScript code, which is an absolute no-no.
So make sure you implement your final code on a server. Node is fine, but I’d argue you might want to consider Ruby or Python at this point.
Ruby is particularly friendly and very much like JavaScript in some ways. The Ruby on Rails server framework is a little bit more involved to learn, but it's an industry standard.
Ruby has a really nice API gem:
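If you'd rather stay with Node/JavaScript for now, a minimal sketch of the server-side idea looks something like this (Express assumed; the route name and port are arbitrary). The point is simply that the secret key lives in an environment variable on the server, and the browser only ever talks to your own endpoint:

```js
// Minimal server-side proxy sketch (Node + Express assumed).
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());

// The secret key never leaves the server.
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// The browser posts { messages: [...] } here instead of calling OpenAI directly.
app.post("/chat", async (req, res) => {
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo", // assumption: whichever model you have access to
    messages: req.body.messages,
  });
  res.json(completion.choices[0].message);
});

app.listen(3000);
```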
No, the LLM decides this based on your packaged function definitions; you do NOT need to ask it.
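A small sketch of what that looks like from your app's side (reusing the made-up `get_campaign_stats` spec from earlier; the model name and example arguments are illustrative):

```js
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo", // assumption: any tool-capable model
  messages: [{ role: "user", content: "How many clicks did I get last Saturday?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_campaign_stats", // hypothetical, as in the earlier sketch
      description: "Fetch a campaign metric for a date",
      parameters: {
        type: "object",
        properties: {
          metric: { type: "string", enum: ["clicks", "views"] },
          date: { type: "string", description: "ISO date" },
        },
        required: ["metric", "date"],
      },
    },
  }],
});

// No "return a JSON object" instruction anywhere above, yet the arguments
// already come back as a JSON string the model filled in on its own.
const msg = completion.choices[0].message;
if (msg.tool_calls) {
  const args = JSON.parse(msg.tool_calls[0].function.arguments);
  // e.g. { metric: "clicks", date: "2024-01-06" }
}
```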