Getting a function call + textual response in the same call

High-level problem: getting a text message in “Content” and the result of function calling in the same API response doesn’t seem possible in a single call (example at the bottom of this post).

Use Case:
I am using OpenAI to navigate a website using tools like click/scroll/type.

Present State:
My user prompt contains the following (a rough sketch follows the list):

  1. The annotated element dictionary for the web page (each element is numbered); I also provide the annotated image, since I use gpt-4o
  2. The schemas of my tools
  3. The task I am trying to achieve (like go into the Help Section of the Website)
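For illustration, the assembled user message looks roughly like this. The element dictionary, tool schemas, and task are made-up placeholders, and the image part uses the standard chat-completions vision payload:

import json

# hypothetical inputs for one iteration
annotated_elements = {"1": "link: Home", "2": "button: Search", "3": "link: Help"}
tool_schemas = [
    {"name": "click", "args": {"element_id": "int"}},
    {"name": "type", "args": {"element_id": "int", "text": "str"}},
]
task = "Go into the Help Section of the Website"

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": (
            f"Elements: {json.dumps(annotated_elements)}\n"
            f"Tools: {json.dumps(tool_schemas)}\n"
            f"Task: {task}"
        )},
        # the annotated screenshot, since gpt-4o is multimodal
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
    ],
}]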

The functions param is left empty, so I am not really using function calling in its true sense.

I am asking the LLM to give me two things:

  1. Thought (For example - I need to find the help section on this web page)
  2. The function to be used (click/type) and its args
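Concretely, I ask the model to answer in a shape like this (the JSON format is my own convention, not an API feature):

{
    "thought": "I need to find the help section on this web page",
    "action": {"name": "click", "args": {"element_id": 3}}
}

I json.loads this, execute the action, and carry the thought into the next iteration’s user prompt.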

Why is the thought important for me?
I need to capture this thought in my chat history. It goes into the user prompt for the next iteration, so that the LLM can figure out the next steps after my functions execute the steps from the previous iteration. The LLM needs to be told what it thought in the previous instance, so that it does not repeat the same step/mistake again.

In this present state it fails sometimes, which pushes me in the true function calling direction…

I know how/where to attach the tool/function schemas for function calling…

So I can get Part 2 of my required response (the function and its args).
Getting the “Thought” itself seems difficult/impossible, though.

Now that you understand why “Thought” is important: I am asking how to get “Thought” in the Content part of my response and the function + args in the function_call part of my response, in the same LLM call. I can’t afford two calls due to the scale of operations.

I have already tried a very simple dummy example like the one below, and the concurrent content and function call I am asking for is not happening:

  1. Create a get_current_weather function
  2. Have the prompt as: “What’s the weather like in Boston? Also tell a joke”

I am only getting the function call; I was also expecting a joke.

My dummy code, which doesn’t achieve what I’m asking for:

import os
import json

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read the local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    weather_info = {
        "location": location,
        "temperature": "72",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

# schema for the function, passed via the functions param
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]


messages = [
    {
        "role": "user",
        "content": "What's the weather like in Boston? also tell me a joke"
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions
)

response_message = response["choices"][0]["message"]
print(response_message["content"]) #there will be no content at all, but i expected a joke
print(response_message["function_call"])  #correct function call

I don’t believe it is possible to handle both in one call.

You must keep a visible chat history and an inner thought history (for the current response) that you send to the LLM together, while only showing the visible chat to the user.

You loop the inner discussion with the bot, accumulating function results, until all function calls and their answers have been processed by the LLM and you have a response you can send to the user.
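For what it’s worth, that loop can look roughly like this with the same pre-1.0 openai SDK and the weather example from the post above (the helper name and the stop-on-plain-text condition are my own):

def run_until_text(messages, functions, available):
    """Call the model, execute any function call it makes, feed the
    result back, and stop once it returns a plain-text message."""
    while True:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
            functions=functions,
        )
        message = response["choices"][0]["message"]
        messages.append(message)  # the inner (hidden) history grows here
        if not message.get("function_call"):
            return message["content"]  # final user-facing answer
        call = message["function_call"]
        result = available[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "function", "name": call["name"], "content": result})

answer = run_until_text(messages, functions, {"get_current_weather": get_current_weather})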

You can look at my algorithm here as an example:

  1. Prompt the model to provide both the thought and the function call separately.
  2. Parse the response to extract the thought and the function call.
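A rough sketch of step 2, assuming the prompt instructs the model to emit a "Thought:" line followed by a JSON function call (both conventions are my own assumptions, not API features):

import json
import re

def parse_reply(content):
    """Split an assumed 'Thought: ...' line from an assumed JSON
    function-call object in a plain-text model reply."""
    thought_match = re.search(r"Thought:\s*(.*)", content)
    json_match = re.search(r"\{.*\}", content, re.DOTALL)
    thought = thought_match.group(1).strip() if thought_match else None
    call = json.loads(json_match.group(0)) if json_match else None
    return thought, call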

It’s by design IMO. The idea is that the LLM assumes it needs the results of one or more tools in order to generate its response.

There is a workaround: in the system prompt, instruct the LLM to explain what tools it will use and why before actually calling them.

It will then do exactly that and wait for user confirmation before issuing the tool calls.

You might even instruct it to call them right away by adding a custom reserved word to the end of such messages; parse for it and automatically reply with a confirmation message, so it requires no user intervention.
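A minimal sketch of that auto-confirmation, assuming the reserved word is [EXECUTE] (both the word and the prompt wording are assumptions):

SYSTEM_PROMPT = (
    "Before calling any tool, explain which tools you will use and why. "
    "End such explanation messages with the word [EXECUTE]."
)

def maybe_autoconfirm(messages, assistant_text):
    """If the reply ends with the reserved word, confirm on the
    user's behalf so the model proceeds straight to the tool calls."""
    if assistant_text and assistant_text.rstrip().endswith("[EXECUTE]"):
        messages.append({"role": "user", "content": "Confirmed, go ahead."})
        return True  # caller should re-invoke the model
    return False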

You can also check Microsoft’s AutoGen framework. It can be used for multi-agent conversation, including mixing classic text-response agents with tool-only agents in the same conversation.
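An untested sketch of what that can look like (pyautogen 0.2-era API; the model and config are placeholders):

import os
from autogen import AssistantAgent, UserProxyAgent

config_list = [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}]

# a text-response agent plus a proxy that runs without human input
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", human_input_mode="NEVER",
                            code_execution_config=False)

user_proxy.initiate_chat(assistant, message="What's the weather like in Boston?")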

FYI, I don’t know how Custom GPTs do it, but they can absolutely mix function calls and actual answers. They might use the trick above, or a multi-agent implementation.

Beware of consequential function calls, though. You might end up with the LLM unleashed on your tools and running amok. It happened to me more than once using a Custom GPT: the LLM kept talking to itself for like 5 minutes straight, doing dozens of function calls without any user interaction.