How can I ensure every LLM reply includes exactly one message and one tool call?

I am using GPT-4.1 (gpt-4.1-2025-04-14) with tools, and I would like to make sure that every call to the LLM yields an output containing exactly one ResponseOutputMessage and one ResponseFunctionToolCall together.

In some cases, I noticed that the LLM could output the tool call (JSON) alone without the message (in natural language), or vice versa. The tool_choice parameter is always left at auto when running client.responses.create(...).

To solve this issue, I applied a hint suggested by GPT-4.1 itself: I explicitly stated the expected output structure and content at the end of my prompt.

**[Output Structure Example]:**
```python
response.output = [
    ResponseOutputMessage(content="your explanation..."),
    ResponseFunctionToolCall(name="tool_name", arguments={...}),
]
```

Among those of you using tools, have you tried such a method? Or did you find a way to “force” the LLM to output both a message and a tool call together in a more canonical way?

If you expect every single output to include text and a tool call, you could probably replace that entire setup with JSON output. By passing a json_schema and setting strict to true, you can force the model to follow the format you need. You’ll still need to account for refusals, which are supplied in a refusal parameter. But I think this is the approach you’re looking for.

As a disclaimer, this is all curl and I don’t know how to use the Python SDK.
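
In the Python SDK, my understanding is the same request would look roughly like this (an untested sketch; the schema is only an illustration):

```python
from openai import OpenAI

client = OpenAI()

# Untested sketch: instead of a message plus a separate tool call, a
# strict json_schema forces one JSON object holding both the
# natural-language part and the structured part.
response = client.responses.create(
    model="gpt-4.1-2025-04-14",
    input="Explain your reasoning, then answer: is 97 prime?",
    text={
        "format": {
            "type": "json_schema",
            "name": "message_and_result",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "message": {"type": "string", "description": "Natural-language explanation."},
                    "result": {"type": "string", "description": "The structured answer."},
                },
                "required": ["message", "result"],
                "additionalProperties": False,
            },
        }
    },
)

# Refusals come back as a refusal content part instead of output_text.
for item in response.output:
    if item.type == "message":
        for part in item.content:
            if part.type == "refusal":
                print("refused:", part.refusal)
            elif part.type == "output_text":
                print(part.text)  # the JSON object with both fields
```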

That probably isn’t supposed to happen, because tool calls are meant to generate the context necessary to formulate a response.

How can it answer “The weather in Paris is X degrees” and at the same time return a tool call asking you to provide the temperature in Paris?

Not having a tool call is possible as long as the model doesn’t see a need for additional information to answer a given prompt.

How functions work

It’s intended behavior for models to output text before calling a function. Reasoning models especially like doing this in their chain-of-thought. Try it: ask o4-mini or o3 a question in ChatGPT that you know requires it to perform web searches. Then, check the summary at the top of its response.

Or, a simple playground example (screenshot omitted).

That said, I still think OP is looking for structured outputs. Would help to know the exact use case though.

Well, it’s not that you can’t.

I just don’t see much meaning in generating a response in between issuing the tool call and receiving the function’s return value.

But perhaps OP could share a bit more of his thoughts: is he just being playful, or is there a practical need?

@aprendendo.next @OnceAndTwice Thanks for your replies.
The task I want to carry out is the generation of boolean queries. In that context, the message’s purpose is to make visible the rationale behind the generated query. The query is then passed to a tool that assesses its quality.

If a reasoning model like o3 is not an option for you, you can add an additional argument to the tool definition asking for the reasoning. This way you can handle multiple function calls in a single request.

But the reasoning provided this way is pretty shallow; I’m not sure it adds any real value.

Example
```json
{
  "name": "check_weather",
  "description": "Checks the weather for a given location",
  "strict": true,
  "parameters": {
    "type": "object",
    "required": [
      "reason",
      "location"
    ],
    "properties": {
      "reason": {
        "type": "string",
        "description": "The reasoning for checking the weather"
      },
      "location": {
        "type": "string",
        "description": "The location to check the weather for"
      }
    },
    "additionalProperties": false
  }
}
```
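
Wiring that tool into a request might look roughly like this (a sketch; the flat function-tool shape and field names follow the Responses API reference):

```python
from openai import OpenAI

client = OpenAI()

# Sketch: the definition above, wrapped in the Responses API's flat
# function-tool shape (no nested "function" key, unlike chat completions).
tools = [{
    "type": "function",
    "name": "check_weather",
    "description": "Checks the weather for a given location",
    "strict": True,
    "parameters": {
        "type": "object",
        "required": ["reason", "location"],
        "properties": {
            "reason": {"type": "string", "description": "The reasoning for checking the weather"},
            "location": {"type": "string", "description": "The location to check the weather for"},
        },
        "additionalProperties": False,
    },
}]

response = client.responses.create(
    model="gpt-4.1-2025-04-14",
    input="Should I bring an umbrella in Paris today?",
    tools=tools,
)

# Each call arrives as a function_call output item; the model's
# (often shallow) reasoning rides along in the "reason" argument.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```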

You gave what YOU expect, not what the AI actually DOES: internally, it either decides to emit to a tool recipient by name, or decides to close the assistant turn without addressing any function and to produce an output for the user.

A “message” before a function call therefore requires tool_choice left at “auto”, so the model retains that decision ability, plus explicit instruction from you, which often runs against its post-training tuning to emit directly to a useful function before producing any text.

You need to have the AI produce the user-facing text before the function call. The function definition itself is the best place to describe this need, so the AI announces its intention and the reasoning behind the call as normal output first, and only then calls the function.
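
For example, a definition along these lines (a hypothetical sketch shaped to OP’s query-checking use case, not code from this thread) states the requirement right where the model reads it:

```python
# Hypothetical sketch: the tool description itself instructs the model
# to write its user-facing rationale before emitting the call.
assess_query_tool = {
    "type": "function",
    "name": "assess_query",
    "description": (
        "Assesses the quality of a boolean query. IMPORTANT: before "
        "calling this function, always first write a message to the "
        "user explaining the rationale behind the query you are about "
        "to submit."
    ),
    "strict": True,
    "parameters": {
        "type": "object",
        "required": ["query"],
        "properties": {
            "query": {
                "type": "string",
                "description": "The boolean query to assess.",
            }
        },
        "additionalProperties": False,
    },
}
```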


I’ll let you refine for yourself how to produce AI applications this fluent (screenshot omitted).

If you don’t need the actual reasoning that was used, you can use o3 or o4-mini which will perform reasoning in the background before coming to its answer. Reasoning models are a popular choice in classifiers that need high accuracy.

Otherwise, you can use a structured output to force the model to provide its reasoning first, and then its verdict as the next field in its output.

System message

Determine if the provided word has a positive, neutral, or negative connotation. Use chain-of-thought reasoning to come to your conclusion.

JSON schema
```json
{
  "name": "word_classification",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "reasoning": {
        "type": "string",
        "description": "The reasoning behind the classification of the word."
      },
      "verdict": {
        "type": "string",
        "description": "The classification of the word, which can be positive, neutral, or negative.",
        "enum": [
          "positive",
          "neutral",
          "negative"
        ]
      }
    },
    "required": [
      "reasoning",
      "verdict"
    ],
    "additionalProperties": false
  }
}
```
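
Calling it from Python might look roughly like this (a sketch; with strict enabled, the model fills in reasoning before verdict because the schema lists it first):

```python
import json
from openai import OpenAI

client = OpenAI()

# The schema from above, inlined for a self-contained example.
schema = {
    "type": "object",
    "properties": {
        "reasoning": {
            "type": "string",
            "description": "The reasoning behind the classification of the word.",
        },
        "verdict": {
            "type": "string",
            "description": "The classification of the word, which can be positive, neutral, or negative.",
            "enum": ["positive", "neutral", "negative"],
        },
    },
    "required": ["reasoning", "verdict"],
    "additionalProperties": False,
}

# Sketch: strict structured output via the Responses API.
response = client.responses.create(
    model="gpt-4.1-2025-04-14",
    instructions=(
        "Determine if the provided word has a positive, neutral, or "
        "negative connotation. Use chain-of-thought reasoning to come "
        "to your conclusion."
    ),
    input="serendipity",
    text={
        "format": {
            "type": "json_schema",
            "name": "word_classification",
            "strict": True,
            "schema": schema,
        }
    },
)

result = json.loads(response.output_text)
print(result["reasoning"], "->", result["verdict"])
```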


Judging by the replies above me, there still seems to be some confusion as to what exactly you’re building. It looks to me like this is for a classifier, and if that’s the case, function calling is a distraction that will leave you without the guarantees you need. Specifically, you won’t be able to force a model to emit text prior to a function call.