Inconsistent tool calling on GPT-4o & GPT-4.1

Hi,

I have question about tool calling and consistency when using different models. I noticed some models like GPT-4o almost always call the provided (single) tool multiple times depending on the user prompt and other models like GPT-4.1 or GPT-5 Mini often times calls it once, but depending on the user/system prompt, it can call the same tool twice in the same response.. Also tried parallel_tool_calls but this flag then unfairly prevents calls for other legitimate functions..

Some examples using GPT-4o
Notice even though the tool properties are informed as an array GPT-4o decides to make two separate function call to the same function. My expectation was to receive a single function tool call with all items in the arguments.

Request:

{
  "model": "gpt-4o",
  "temperature": 0,
  "store": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful general purpose assistant. Use the jiraSearchTool to provide comprehensive answers."
    },
    {
      "role": "user",
      "content": "I want to learn about TEST-123 and I have some questions about TEST-456"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "jiraSearchTool",
        "description": "Performs search through Atlassian Jira API to retrieve information for the provided Jira issue IDs.",
        "parameters": {
          "$schema": "https://json-schema.org/draft/2020-12/schema",
          "type": "object",
          "properties": {
            "jiraIssueIds": {
              "description": "List of Jira issue IDs provided in the user prompt",
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": [
            "jiraIssueIds"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  ],
  "tool_choice": "auto"
}

Response:

{
  "id": "chatcmpl-Cbnj4tXGBL99YQN0y9ABgPxdFRKEl",
  "object": "chat.completion",
  "created": 1763125318,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_qNNJA3iqSIO89p7XADV9c90o",
            "type": "function",
            "function": {
              "name": "jiraSearchTool",
              "arguments": "{\"jiraIssueIds\": [\"TEST-123\"]}"
            }
          },
          {
            "id": "call_q1cjrAfIMCqtEup5r0U80ffi",
            "type": "function",
            "function": {
              "name": "jiraSearchTool",
              "arguments": "{\"jiraIssueIds\": [\"TEST-456\"]}"
            }
          }
        ],
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ]
}

This might be expected in some cases, but it can quickly get tricky on subsequent requests the the order of messages from assistant > tool > assistant > tool > needs to be preserved by putting messages with correct tool id one after another..

Here is another response, sometimes model decides to make a single call. This is what I expect, but GPT-4o is quite stubborn and does not consistently return a single tool call, even when add new instructions to make only one tool call for any given function and consolidate arguments in that..

{
  "id": "chatcmpl-CbnqhxrMJF4IyMQIOOxCrFe2Qe1O8",
  "object": "chat.completion",
  "created": 1763125791,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_WH0SX8p8Ta4nAIRZttzb64Hs",
            "type": "function",
            "function": {
              "name": "jiraSearchTool",
              "arguments": "{\"jiraIssueIds\": [\"TEST-123\", \"TEST-456\"]}"
            }
          }
        ],
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 122,
    "completion_tokens": 41,
    "total_tokens": 163,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_b1442291a8"
}

Now trying GPT-4.1

Request: I had to still instruct to make single tool call for any given function.

{
  "model": "gpt-4.1",
  "temperature": 0,
  "store": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful general purpose assistant. Use the jiraSearchTool to provide comprehensive answers. You have to make only one tool call for any given function and consolidate arguments in that call."
    },
    {
      "role": "user",
      "content": "I want to learn about the Jira issue TEST-123 and I have some questions about TEST-456"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "jiraSearchTool",
        "description": "Performs search through Atlassian Jira API to retrieve information for the provided Jira issue IDs.",
        "parameters": {
          "$schema": "https://json-schema.org/draft/2020-12/schema",
          "type": "object",
          "properties": {
            "jiraIssueIds": {
              "description": "List of Jira issue IDs provided in the user prompt",
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          },
          "required": [
            "jiraIssueIds"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  ],
  "tool_choice": "auto"
}

Response:

{
  "id": "chatcmpl-Cbo0K2F8ZsmpW9i9JuFFM899wUU9N",
  "object": "chat.completion",
  "created": 1763126388,
  "model": "gpt-4.1-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_CIemwNhEPDlXMb6NiafCFuIt",
            "type": "function",
            "function": {
              "name": "jiraSearchTool",
              "arguments": "{\"jiraIssueIds\":[\"TEST-123\",\"TEST-456\"]}"
            }
          }
        ],
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls"
    }
  ]
}

My questions are:

_ What do you think of this situation?

_ What is your go-to model when making tool calls?

_ Is this a prompting issue and should I enforce/instruct more in the system prompt?

_ Does it make sense to use another, more consistent model to make the tool decisions and then use any other model to produce the final answer?