GPT-4 asks for tool calls but never tells which one

Hi,

I’m using the OpenAI API for chat completions and function calling.

Using the OpenAI Python SDK (> 1.x), I request a chat completion and pass tools to the call.
I’m in streamed mode.

The script (sanitised):

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    stream=True,  # streamed mode, as described above
)

for chunk in response:
    print(chunk)

Here is the output:

ChatCompletionChunk(id='', choices=[], created=0, model='', object='', system_fingerprint=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], usage=None)
ChatCompletionChunk(id='chatcmpl-8fVOkHKSPBG1RgDbHq4y2v5TF2Cex', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={})], created=1704902834, model='gpt-4', object='chat.completion.chunk', system_fingerprint=None, usage=None)
ChatCompletionChunk(id='chatcmpl-8fVOkHKSPBG1RgDbHq4y2v5TF2Cex', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={})], created=1704902834, model='gpt-4', object='chat.completion.chunk', system_fingerprint=None, usage=None)
… (thirteen further identical chunks with empty deltas, omitted for brevity) …
ChatCompletionChunk(id='chatcmpl-8fVOkHKSPBG1RgDbHq4y2v5TF2Cex', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='tool_calls', index=0, logprobs=None, content_filter_results={})], created=1704902834, model='gpt-4', object='chat.completion.chunk', system_fingerprint=None, usage=None)

Why is the finish_reason in the last chunk tool_calls? The API never returned any tool_calls in the chunks, and the content doesn’t look to be filtered either.
In this case, the model expects me to call a tool but never tells me which one?
Also, this is not consistent: with the same list of messages, I sometimes get a completion or a valid tool call.
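For reference, the way I accumulate tool-call deltas from the stream looks roughly like this (a minimal sketch with chunks modelled as plain dicts; `get_weather` and the sample payloads are made up for illustration, the SDK's ChatCompletionChunk objects expose the same fields as attributes). When the bug occurs, the accumulated calls list stays empty even though finish_reason says 'tool_calls':

```python
def accumulate_tool_calls(chunks):
    """Merge tool_call fragments from streamed chunks into full calls."""
    calls = {}           # index -> {"id", "name", "arguments"}
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            for tc in delta.get("tool_calls") or []:
                slot = calls.setdefault(
                    tc["index"], {"id": "", "name": "", "arguments": ""})
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function", {})
                # name arrives once, arguments arrive in fragments
                slot["name"] += fn.get("name") or ""
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return finish_reason, list(calls.values())

# A healthy stream delivers the call id and name first,
# then the arguments in fragments:
healthy = [
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": "Paris"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
reason, calls = accumulate_tool_calls(healthy)
print(reason, calls)
```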

What are your thoughts on this issue?


I’m having the same issue:

chat_completion: {
  "id": "chatcmpl-8fbCgAWrLQHtvXAe9FmiNQ2Rg2K9U",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": null,
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {}
    }
  ],
  "created": 1704925150,
  "model": "gpt-4",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 136,
    "prompt_tokens": 91,
    "total_tokens": 227
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

I think this means it’s trying to call a function, but tool_calls is None.
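That’s how I read it too. In the meantime I guard against it before dispatching anything (a sketch against the JSON shape above; `has_valid_tool_calls` is just an illustrative helper name, not SDK API):

```python
def has_valid_tool_calls(completion: dict) -> bool:
    """Treat finish_reason == 'tool_calls' as valid only when the
    message actually carries a tool_calls payload."""
    choice = completion["choices"][0]
    return (choice["finish_reason"] == "tool_calls"
            and bool(choice["message"].get("tool_calls")))

# The broken response above: finish_reason says tool_calls,
# but message.tool_calls is null.
buggy = {"choices": [{"finish_reason": "tool_calls",
                      "message": {"content": None, "role": "assistant",
                                  "function_call": None,
                                  "tool_calls": None}}]}
print(has_valid_tool_calls(buggy))
```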

I’m experiencing the same thing with Azure OpenAI for gpt-4 1106-Preview and gpt-35-turbo 1106, using the Azure.AI.OpenAI package in C#. It suddenly started happening yesterday.

The behaviour is consistent for all tool_calls responses; stop responses work as normal.


This is happening to me on Azure as well; it broke yesterday. Consider this simple chat completion request:

{
   "tool_choice":"auto"
  ,"tools":[
      {
        "type":"function"
      ,"function": {
        "name":"659eb9def985b66941c2c3c3",
        "description": "Tool to answer question 659eb9def985b66941c2c3c3",
        "parameters": {
            "properties":{
                "659eb9def985b66941c2c3c3":{
                    "type": "object",
                    "properties":{ 
                        "answer": {"type": "string"}
                      }
                    }
            }
           ,"required":["659eb9def985b66941c2c3c3"]
           ,"title":"Answers"
           ,"type":"object"
           }
      }
    }
  ]
  ,"temperature":0
  ,"stream":false
  ,"response_format":{"type":"json_object"}
  ,"model":"gpt-4"
  ,"messages":[
      {"content":"Use all available tools, reply in JSON.","role":"system"}
    ,{
        "content":"Answer question '659eb9def985b66941c2c3c3': Who is Leo Fender?"
      ,"role":"user"
    }
  ]
}

Using API version 2023-12-01-preview, this yields:

{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1704967194,
    "model": "gpt-4",
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "choices": [
        {
            "index": 0,
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant"
            },
            "content_filter_results": {}
        }
    ],
    "usage": {
        "prompt_tokens": 126,
        "completion_tokens": 127,
        "total_tokens": 253
    }
}

Unless I’m being completely dumb, the request looks fine to me, and requests like these worked just fine before yesterday. Note that if I replace the above with the old function_call API, it works as intended:

{
   "function_call": {"name": "659eb9def985b66941c2c3c3" }
  ,"functions": [{
        "name":"659eb9def985b66941c2c3c3",
        "description": "Function for question 659eb9def985b66941c2c3c3",
        "parameters": {
            "properties":{
                "659eb9def985b66941c2c3c3":{
                    "type": "object",
                    "properties":{ 
                        "answer": {"type": "string"}
                      }
                    }
            }
           ,"required":["659eb9def985b66941c2c3c3"]
           ,"type":"object"
           }
      }
  ]
  ,"temperature":0
  ,"stream":false
  ,"response_format":{"type":"json_object"}
  ,"model":"gpt-4"
  ,"messages":[
      {"content":"Use function '659eb9def985b66941c2c3c3', reply in JSON.","role":"system"}
    ,{
        "content":"Who is Leo Fender?"
      ,"role":"user"
    }
  ]
}

Yielding:

{
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "created": 1704967409,
    "model": "gpt-4",
    "prompt_filter_results": [
        {
            "prompt_index": 0,
            "content_filter_results": {
                "hate": {
                    "filtered": false,
                    "severity": "safe"
                },
                "self_harm": {
                    "filtered": false,
                    "severity": "safe"
                },
                "sexual": {
                    "filtered": false,
                    "severity": "safe"
                },
                "violence": {
                    "filtered": false,
                    "severity": "safe"
                }
            }
        }
    ],
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "function_call": {
                    "name": "659eb9def985b66941c2c3c3",
                    "arguments": "{\"659eb9def985b66941c2c3c3\":{\"answer\":\"Leo Fender was an American inventor and entrepreneur who founded Fender Electric Instrument Manufacturing Company, now known as Fender Musical Instruments Corporation. He is widely recognized for creating some of the most iconic electric guitars and amplifiers, including the Telecaster and Stratocaster guitars, and the Bassman amplifier. His innovations significantly influenced the development of music, particularly rock and roll, and his instruments continue to be highly regarded by musicians worldwide.\"}}"
                }
            },
            "content_filter_results": {}
        }
    ],
    "usage": {
        "prompt_tokens": 138,
        "completion_tokens": 105,
        "total_tokens": 243
    }
}
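Until this is fixed, my plan is to fall back to the legacy shape when the tools request comes back empty. A small sketch of the conversion (`to_legacy_functions` is a hypothetical helper; the field names are taken from the two request bodies above):

```python
def to_legacy_functions(request: dict) -> dict:
    """Derive a legacy functions/function_call body from a
    tools/tool_choice body, keeping all other fields as-is."""
    legacy = {k: v for k, v in request.items()
              if k not in ("tools", "tool_choice")}
    legacy["functions"] = [t["function"] for t in request.get("tools", [])]
    if len(legacy["functions"]) == 1:
        # Pin the single function, as in the working example above.
        legacy["function_call"] = {"name": legacy["functions"][0]["name"]}
    return legacy

tools_request = {
    "model": "gpt-4",
    "temperature": 0,
    "tool_choice": "auto",
    "tools": [{"type": "function",
               "function": {"name": "659eb9def985b66941c2c3c3",
                            "parameters": {"type": "object"}}}],
}
legacy_request = to_legacy_functions(tools_request)
print(legacy_request["function_call"])
```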

From some comments I have read elsewhere, I suspect this is a deliberate choice by OpenAI, because tools are no longer meant to be used this way but rather via their (now beta) Assistants and Threads API.

It would be nice to get some semi-official confirmation of this, because I switched my software away from the deprecated function calls thinking it was a no-brainer, but it seems like the behaviour is starting to diverge.

It’s a bit of an upsetting situation: on one side we have some deprecated fields (which work as intended), on the other some beta APIs that are not yet meant for production, and in the middle a grey area of things that currently… do not work :slight_smile:


Hello,

I have tested the gpt-4 1106-preview model with the 2023-12-01-preview API version in multiple regions.

It is not working in:

  • France Central
  • UK South

It is working in:

  • Norway East
  • Sweden Central

I confirm that in Sweden Central things are working as expected (that’s the first environment I had at hand to test this with)!

What’s going on here? Why wouldn’t it work across all regions?

I can confirm this to be true using Azure OpenAI gpt-4 1106, API version 2023-12-01-preview.