Gpt-4-1106-preview messes up function call parameters encoding

lukasnevosad · November 7, 2023, 1:12pm

If I request a function to be called, gpt-4-1106-preview calls the function with parameters that have wrong UTF-8 encoding for non-ascii characters (international alphabets). Sometimes the characters are left out, sometimes they are just random and sometimes it uses HTML encoding.

The same call works perfectly fine with gpt-3.5-turbo and gpt-4.

P.S. I actually reported this twice, the first post includes a sample request and response, but was marked as SPAM (???)

onurmatik · November 8, 2023, 7:53am

I’m also encountering a similar problem with gpt-4-1106-preview. Whenever I receive responses that include non-ASCII characters, they’re being substituted with ASCII equivalents (e.g. ü => u) or are appearing as the � symbol.

With gpt-3.5-turbo-1106, occasionally non-ascii characters are replaced by their unicode encoded versions (e.g. ç => \u00e7)

atty-openai · November 8, 2023, 8:57am

Thank you for reporting! We can repro on 1106 models and are actively investigating. Will post here once it’s fixed.

onurmatik · November 15, 2023, 7:20pm

Is there any update regarding this? It appears, despite the recent update to the new models, the issue persists. Any insights or updates on this would be greatly appreciated. Thank you!

enoch · November 16, 2023, 5:33pm

Hi!

We have rolled out a fix to this today! Most forms of this issue should be addressed by this.

Please let us know if the issue persists for you.

enoch · November 16, 2023, 10:21pm

Apologies, there are some additional issues we are currently looking into regarding this behavior. We’ll keep this thread updated.

lvieira · November 17, 2023, 11:32am

Im experiencing this same issue.

With gpt-3.5-turbo-1106: {‘classifications’: ‘Peti{\u00e7}\’}
With gpt-4-1106-preview: {‘classifications’: ‘Peti\u00e7\u00e3o Inicial’}
With gpt-3.5-turbo-0613 (legacy): {‘classifications’: [‘Petição Inicial’]}

The only correct one is the legacy 0613 model.
Not only is it messing the unicode characters it also doesn’t seem to be properly recognizing my function call, as it should be an array of strings and not just a single string as gpt-3.5 turbo 0613 correctly recognizes.

josemgmz · November 19, 2023, 8:46pm

Hi do we have any update about this? im getting response some time with HTML enconding and some times a response with this � symbol.

heldersato · November 21, 2023, 3:13pm

Same problem. Any solution?

ChatCompletion(id=‘’, choices=[Choice(finish_reason=‘stop’, index=0, message=ChatCompletionMessage(content=None, role=‘assistant’, function_call=FunctionCall(arguments=‘{“churn_signals”:[{“signal”:“Problemas t cnicos de acesso ao aplicativo e pagamentos”,“details”:“O cliente menciona que ofertou lances para seu cliente e que at agora ele n o recebeu nenhum SMS, indicando uma instabilidade no sistema.”}]}’, name=‘get_churn_signals’), tool_calls=None))], created=, model=‘gpt-4-1106-preview’, object=‘chat.completion’, system_fingerprint=‘’, usage=CompletionUsage(completion_tokens=63, prompt_tokens=2155, total_tokens=2218))

lvieira · November 24, 2023, 12:21pm

@enoch are there any updates on this issue? Last time you said something you mentioned there were some additional issues that needed to be fixed.
Are you guys still working through it?
We wanted to switch to the new/cheaper model but have been constrained because of this.

enoch · November 26, 2023, 1:19am

Hi, we have a fix that should be available mid next week, which we expect to address most instances of this issue.

However, note that we expect most JSON handling libraries to already automatically handle the problematic output, as it is technically a valid representation of the intended underlying data. Therefore, if you are experiencing issues with your application, there might be other factors at play, that we may not be able to address completely in the upcoming fix.

We have posted a small notice in our docs characterizing the known issue: https://platform.openai.com/docs/guides/function-calling

We appreciate your patience as we address this.

Foxalabs · November 26, 2023, 1:24am

For any users who have switched from gpt-3.5-turbo to one of the newer models,
and are making use of the “json mode” the json output will typically be in markdown format and have top and tail markdown markers, i.e. ```json with a closing triple backtick at the end.

You can remove this with a regex parser like this :

response.content = response.content.replace(/```json\n?|```/g, '');

b0zal · November 26, 2023, 1:34am

more safe for typescript :

response.content = response.content.replace(/```json\n?|```/g, '');

// Add regex test
const regex = /```json\n?|```/g;
if (regex.test(response.content)) {
  // Code block delimiters found in the response content
  // Perform additional actions if needed
}

lvieira · November 26, 2023, 2:01am

At least in my case, the encoding itself is not the main issue.
The main issue is that it actually breaks when it hits the different encoding character and it seems to mess up the answer completely.

As I stated on my initial message these are the outputs of the 3 models:
With gpt-3.5-turbo-1106: {‘classifications’: ‘Peti{\u00e7}\’}
With gpt-4-1106-preview: {‘classifications’: ‘Peti\u00e7\u00e3o Inicial’}
With gpt-3.5-turbo-0613 (legacy): {‘classifications’: [‘Petição Inicial’]}

gpt-3.5-turbo-0613 and also the legacy gpt-4 get everything right, such as the correct type of the argument, while the new ones completely mess up not only the encoding but the argument of type of the argument too.
This is with testing the calls with the exact same inputs, only changing the model name.

lukasnevosad · November 26, 2023, 3:49pm

~~I tried hard to replicate the bug (my original post) today and cannot replicate it anymore. So maybe it is finally fixed.~~

Edit: Not true, I accidentally used another model.

zzh1996 · November 26, 2023, 7:05pm

This problem isn’t fixed at all. I have a chatbot using OpenAI’s API that uses function calling to search through Google and Bing, heavily utilized by my Chinese-speaking friends on a daily basis. But this bug makes the bot’s search almost unusable.

It’s not about the hassle of decoding Unicode in JSON. The real issue is the GPT model not creating the correct Unicode escape sequence for less common Chinese characters, leading to completely wrong characters. Here’s a full demo that you can test to see for yourself.

Here is the reproduce example:

import openai
import os
import json

client = openai.OpenAI(api_key = os.getenv('OPENAI_API_KEY'))

models = [
    'gpt-3.5-turbo-16k',
    'gpt-4-1106-preview',
]
query = '邓紫棋'

for model in models:
    for i in range(10):
        result = client.chat.completions.create(
            model=model,
            messages=[
                {'role': 'system', 'content': 'You are a helpful assistant with searching capabilities'},
                {'role': 'user', 'content': f'Please search for "{query}"'}
            ],
            tools=[{
                "type": "function",
                "function": {
                    "name": "search",
                    "description": "Search on Google",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query",
                            },
                        },
                        "required": ["query"],
                    }
                }
            }],
        )
        arguments = result.choices[0].message.tool_calls[0].function.arguments
        decoded_arguments = json.loads(arguments)
        print(model, i + 1, repr(arguments), '-->', decoded_arguments)

And this is what the program’s output looks like (note that the old model works fine, but the new model often spits out the wrong Chinese characters):

gpt-3.5-turbo-16k 1 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 2 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 3 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 4 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 5 '{\n"query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 6 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 7 '{\n"query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 8 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 9 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 10 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-4-1106-preview 1 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 2 '{"query":"\\u9093\\u7d2b\\u68a8"}' --> {'query': '邓紫梨'}
gpt-4-1106-preview 3 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 4 '{"query":"\\u9093\\u7d2b\\u68cb"}' --> {'query': '邓紫棋'}
gpt-4-1106-preview 5 '{"query":"\\u90a2\\u7d2b\\u68a8"}' --> {'query': '邢紫梨'}
gpt-4-1106-preview 6 '{"query":"\\u9093\\u7d2b\\u68a8"}' --> {'query': '邓紫梨'}
gpt-4-1106-preview 7 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 8 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 9 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 10 '{"query":"\\u90a2\\u7d2b\\u68cb"}' --> {'query': '邢紫棋'}

Another example:

gpt-3.5-turbo-16k 1 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 2 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 3 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 4 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 5 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 6 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 7 '{\n"query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 8 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 9 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 10 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 1 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 2 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 3 '{"query":"\\u65b0\\u578b\\u51b7\\u51fb\\u75c5\\u6bdb\\u75ab\\u60c5"}' --> {'query': '新型冷击病毛疫情'}
gpt-4-1106-preview 4 '{"query":"\\u65b0\\u578b\\u519b\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型军状病毒疫情'}
gpt-4-1106-preview 5 '{"query":"\\u65b0\\u578b\\u5185\\u51b7\\u83ab\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型内冷莫病毒疫情'}
gpt-4-1106-preview 6 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 7 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 8 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 9 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 10 '{"query":"\\u65b0\\u578b\\u519b\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型军状病毒疫情'}

For those not familiar with Chinese, note that the first example is the name of a singer, and the second example is the Simplified Chinese term for COVID-19, which are very common words.

b0zal · November 27, 2023, 2:40am

@zzh1996

instad of using package/lib openai in python

try another method
example:

Use endpoint of api https://platform.openai.com/docs/guides/text-generation/chat-completions-api
Json Mode https://platform.openai.com/docs/guides/text-generation/json-mode
Other guides https://platform.openai.com/docs/guides/text-generation

zzh1996 · November 27, 2023, 3:44am

@b0zal

I need the function calling feature, so I have to use the completion API only. This issue isn’t related to Python libraries. Json mode doesn’t fix it either. If you’re confident you have a solution, please modify the demo code I provided above to be bug-free.

lukasnevosad · November 27, 2023, 7:37am

My bad, I confirm the bug is still there. I accidentally left gpt-4 model in the test.

fernando7 · November 28, 2023, 5:04pm

For the brazilians over here, something that worked for me is adding this to the prompt:
“- utilize utf-8 para caracteres especiais”

Use utf-8 for special characters.

Topic		Replies	Views
The GPT-4-1106-preview model keeps generating "\\n\\n\\n\\n\\n\\n\\n\\n" for an hour when using functions API chatgpt , api	9	2474	December 31, 2023
Bad results when using fine-tuned model with function calling API fine-tuning , function-calling , fine-tuning-problems	15	4493	November 23, 2023
Support of unicode in gpt4-1106-preview Bugs gpt-4 , api	10	2149	November 15, 2024
1106 tool use, function parameters encounter Chinese garbled characters API api	2	1219	December 2, 2023
Function calling looping uncontrollably and calling unnecessarily Bugs function-calling , gpt-4o , gpt-4o-mini	27	892	September 19, 2024

Gpt-4-1106-preview messes up function call parameters encoding

Related topics