In the current syntax for generating a model response, we use the chat.completions.create method and send our input through messages, functions, and other arguments in a structured format.
My understanding is that all of this structured input gets converted into a single string, which is then passed to the LLM for prediction. How can you access the final string input that gets passed to the model?
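For context, here's a minimal sketch of the kind of call I mean (using the v1 Python client; the model name and the get_weather tool are placeholders I made up for illustration):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical tool definition, for illustration only
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
```

Everything here is sent as structured JSON in the request body; what I'd like to see is the single string it presumably becomes on the model's side.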
So, if everything goes according to plan, it should be sending a serialized JSON string in the API call.
May I ask why you need the final input? Oftentimes, you do your work before you serialize, and you typically don't mess with the serialized form afterwards. I thought the client calls handle this for you automatically, which makes the question seem unnecessary.
You cannot access that "final string"; it is created by the internals of the API endpoint after you pass a list of messages (each with a role and content) in an API request along with the other parameters.
You can get a hint of the special tokens that are used to enclose messages in the model's actual context by looking at the GPT-4 template here.
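As a rough illustration only (my own approximation based on the publicly documented ChatML format, not anything the API exposes), a messages list gets flattened into something like this before the model sees it:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

def to_chatml(msgs):
    # Approximate ChatML rendering; the real serialization is internal to the
    # endpoint and may differ between models.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in msgs]
    return "\n".join(parts) + "\n<|im_start|>assistant\n"  # open the assistant turn

print(to_chatml(messages))
```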
As others have said, you can't access the final input string, but you can get the total number of tokens it consumed by looking at the response object.
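For example (a sketch with the v1 Python client; the model name is arbitrary):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# usage reflects whatever the endpoint actually fed the model, including any
# serialized tool/function definitions you passed along.
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.usage.total_tokens)
```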
This was more for my understanding of how the LLM handles such inputs, specifically the function calls.
I'm assuming the models with function-calling capability were fine-tuned on a very specific prompt template to handle function calls, and a different prompt structure would probably not give the same level of results.
I don't know if this info has been shared elsewhere in a paper, or if people have been able to fine-tune models to work with function calls.
Thanks. This was pointed out to me recently as well. For chat.completions.create, including additional arguments like functions, tools, etc. increases the number of prompt tokens, so these arguments are definitely getting converted into a string as part of the prompt.
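A quick way to see this (a sketch, assuming the v1 Python client and a made-up get_weather tool) is to compare prompt_tokens for the same messages with and without tools:

```python
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": "Hello!"}]
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

bare = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
with_tools = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools
)

# The difference is roughly the token cost of the serialized tool definitions.
print(with_tools.usage.prompt_tokens - bare.usage.prompt_tokens)
```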
Early on, someone got the model to dump its internal prompt for functions, but they've since plugged that hole, and the last time I searched for the dumped inner prompt text I couldn't find it.
It's not rocket science what they're doing, though. They basically show the model a trimmed-down version of the JSON schema you passed in.
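For what it's worth, community reverse-engineering suggests the injected text is a TypeScript-style declaration built from your schema. The sketch below is my approximation for a single hypothetical get_weather function; the exact wording is internal to OpenAI, not verbatim, and may vary by model:

```python
# Approximate shape of the text added to the first system message (assumption,
# reconstructed from community reports; not official and not verbatim):
APPROXIMATE_FUNCTION_PROMPT = """\
## functions

namespace functions {

// Get the current weather for a city
type get_weather = (_: {
// City name
city: string,
}) => any;

} // namespace functions
"""

print(APPROXIMATE_FUNCTION_PROMPT)
```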
There is no "hole" to be patched. The AI can be made to do whatever a determined person wants, including repeating anything in its context.
Here's the function language as it is received by the AI, as part of the first system message.
Parallel tool calls are handled by an additional tool called multi_tool_use, which wastes more tokens and tells the AI to place multiple function calls inside that wrapper; later models are trained to use it.
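On the client side you never see that wrapper; parallel calls just come back as a list of tool_calls on the returned message. A sketch (v1 Python client, made-up get_weather tool):

```python
import json

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris and in Tokyo?"}],
    tools=tools,
)

# If the model calls the function for both cities in parallel, the endpoint
# surfaces them as separate entries here; the internal wrapper is not exposed.
for call in resp.choices[0].message.tool_calls or []:
    print(call.id, call.function.name, json.loads(call.function.arguments))
```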