gpt-4-1106-preview messes up the encoding of function call parameters

If I request a function call, gpt-4-1106-preview calls the function with parameters in which non-ASCII characters (international alphabets) are wrongly encoded. Sometimes the characters are left out, sometimes they are replaced by random characters, and sometimes they are HTML-encoded.

The same call works perfectly fine with gpt-3.5-turbo and gpt-4.

P.S. I actually reported this twice, the first post includes a sample request and response, but was marked as SPAM (???)

I’m also encountering a similar problem with gpt-4-1106-preview. Whenever I receive responses that include non-ASCII characters, they’re being substituted with ASCII equivalents (e.g. ü => u) or are appearing as the � symbol.

With gpt-3.5-turbo-1106, non-ASCII characters are occasionally replaced by their Unicode escape sequences (e.g. ç => \u00e7).

Thank you for reporting! We can repro on 1106 models and are actively investigating. Will post here once it’s fixed.

Is there any update regarding this? It appears, despite the recent update to the new models, the issue persists. Any insights or updates on this would be greatly appreciated. Thank you!

Hi!

We have rolled out a fix to this today! Most forms of this issue should be addressed by this.

Please let us know if the issue persists for you.

Apologies, there are some additional issues we are currently looking into regarding this behavior. We’ll keep this thread updated.

I’m experiencing this same issue.

With gpt-3.5-turbo-1106: {‘classifications’: ‘Peti{\u00e7}\’}
With gpt-4-1106-preview: {‘classifications’: ‘Peti\u00e7\u00e3o Inicial’}
With gpt-3.5-turbo-0613 (legacy): {‘classifications’: [‘Petição Inicial’]}

The only correct one is the legacy 0613 model.
Not only does it mess up the Unicode characters, it also doesn’t seem to follow my function definition: the argument should be an array of strings, as gpt-3.5-turbo-0613 correctly returns, and not a single string.
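Until the newer models respect the schema, a defensive check on the client side can paper over the type problem described above. This is only a minimal sketch, assuming the function expects `classifications` to be an array of strings; `normalize_classifications` is a hypothetical helper, not part of the OpenAI library.

```python
import json
from typing import List


def normalize_classifications(raw_arguments: str) -> List[str]:
    """Parse the arguments string and coerce `classifications` to a list of strings.

    Hypothetical helper: the key name comes from the outputs quoted above.
    """
    arguments = json.loads(raw_arguments)
    value = arguments.get("classifications", [])
    # Newer models sometimes return a bare string instead of an array
    if isinstance(value, str):
        value = [value]
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"unexpected type for classifications: {value!r}")
    return value


# A bare string is wrapped; a proper array passes through unchanged
print(normalize_classifications('{"classifications": "Peti\\u00e7\\u00e3o Inicial"}'))
print(normalize_classifications('{"classifications": ["Peti\\u00e7\\u00e3o Inicial"]}'))
```

This does not repair a truncated value such as the gpt-3.5-turbo-1106 output; it only normalizes the shape when the content itself is intact.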

Hi, do we have any update on this? I’m sometimes getting responses with HTML encoding and sometimes responses containing this � symbol.
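For the HTML-encoded variant, a possible client-side workaround is to unescape the entities after parsing. A minimal sketch using Python’s standard `html` module; the sample strings are made up for illustration, and `unescape_html_entities` is a hypothetical helper name.

```python
import html


def unescape_html_entities(text: str) -> str:
    """Convert HTML entities (named or numeric) back to plain characters."""
    return html.unescape(text)


# Named and numeric entities both decode to the same text
print(unescape_html_entities("Peti&ccedil;&atilde;o Inicial"))  # Petição Inicial
print(unescape_html_entities("Peti&#231;&#227;o Inicial"))      # Petição Inicial
```

Note that this cannot help with the � case: once a character has been replaced or dropped, the original is not recoverable on the client.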

Same problem. Any solution?

ChatCompletion(id=‘’, choices=[Choice(finish_reason=‘stop’, index=0, message=ChatCompletionMessage(content=None, role=‘assistant’, function_call=FunctionCall(arguments=‘{“churn_signals”:[{“signal”:“Problemas t cnicos de acesso ao aplicativo e pagamentos”,“details”:“O cliente menciona que ofertou lances para seu cliente e que at agora ele n o recebeu nenhum SMS, indicando uma instabilidade no sistema.”}]}’, name=‘get_churn_signals’), tool_calls=None))], created=, model=‘gpt-4-1106-preview’, object=‘chat.completion’, system_fingerprint=‘’, usage=CompletionUsage(completion_tokens=63, prompt_tokens=2155, total_tokens=2218))

@enoch are there any updates on this issue? Last time you said something you mentioned there were some additional issues that needed to be fixed.
Are you guys still working through it?
We wanted to switch to the new/cheaper model but have been constrained because of this.

Hi, we have a fix that should be available mid next week, which we expect to address most instances of this issue.

However, note that we expect most JSON handling libraries to already automatically handle the problematic output, as it is technically a valid representation of the intended underlying data. Therefore, if you are experiencing issues with your application, there might be other factors at play, that we may not be able to address completely in the upcoming fix.
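To illustrate the point about JSON libraries: a `\uXXXX` escape is a valid JSON representation of the character, so a standard parser decodes it transparently. A minimal sketch with Python’s built-in `json` module; the sample string reuses the value quoted earlier in this thread, placed in the array shape the function expects.

```python
import json

# Arguments string containing \uXXXX escapes, as a model might return it
raw_arguments = '{"classifications": ["Peti\\u00e7\\u00e3o Inicial"]}'

# json.loads turns the escapes back into the real characters
decoded = json.loads(raw_arguments)
print(decoded)  # {'classifications': ['Petição Inicial']}

# To re-serialize without escaping, pass ensure_ascii=False
readable = json.dumps(decoded, ensure_ascii=False)
print(readable)  # {"classifications": ["Petição Inicial"]}
```

This only covers the case where the escape sequences are correct. It does nothing when characters are dropped or the wrong code point is emitted.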

We have posted a small notice in our docs characterizing the known issue: https://platform.openai.com/docs/guides/function-calling

We appreciate your patience as we address this.

For anyone who has switched from gpt-3.5-turbo to one of the newer models and is using “JSON mode”: the JSON output is typically wrapped in Markdown code-fence markers, i.e. it starts with ```json and ends with a closing triple backtick.

You can strip these markers with a regex replace like this:

response.content = response.content.replace(/```json\n?|```/g, '');

A safer variant for TypeScript checks for the markers before stripping them:

// No `g` flag on the test regex: test() on a global regex is stateful (it advances lastIndex)
const hasFence = /```/.test(response.content);
if (hasFence) {
  // Code block delimiters found in the response content: strip them before parsing
  response.content = response.content.replace(/```json\n?|```/g, '').trim();
}
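For those using Python rather than TypeScript, the same idea could look roughly like this. It is only a sketch: `parse_json_content` is a hypothetical helper, and it assumes the fences, if present, sit at the very start and end of the content.

```python
import json
import re

# Matches an opening fence (with optional "json" tag) or a closing fence
FENCE_RE = re.compile(r"^\s*```(?:json)?\s*\n?|\n?\s*```\s*$")


def parse_json_content(content: str):
    """Strip Markdown code-fence markers if present, then parse the JSON."""
    cleaned = FENCE_RE.sub("", content).strip()
    return json.loads(cleaned)


wrapped = '```json\n{"query": "邓紫棋"}\n```'
print(parse_json_content(wrapped))                 # {'query': '邓紫棋'}
print(parse_json_content('{"query": "邓紫棋"}'))   # {'query': '邓紫棋'}
```

Content without fences passes through unchanged, so the helper can be applied unconditionally.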

At least in my case, the encoding itself is not the main issue.
The main issue is that the model actually breaks when it hits a non-ASCII character, and that seems to mess up the rest of the answer completely.

As I stated in my initial message, these are the outputs of the 3 models:
With gpt-3.5-turbo-1106: {‘classifications’: ‘Peti{\u00e7}\’}
With gpt-4-1106-preview: {‘classifications’: ‘Peti\u00e7\u00e3o Inicial’}
With gpt-3.5-turbo-0613 (legacy): {‘classifications’: [‘Petição Inicial’]}

gpt-3.5-turbo-0613 and the legacy gpt-4 get everything right, including the type of the argument, while the new models mess up not only the encoding but also the argument type.
This is from testing the calls with the exact same inputs, changing only the model name.

I tried hard to replicate the bug (my original post) today and cannot replicate it anymore. So maybe it is finally fixed.

Edit: Not true, I accidentally used another model.

This problem isn’t fixed at all. I have a chatbot using OpenAI’s API that uses function calling to search through Google and Bing, heavily utilized by my Chinese-speaking friends on a daily basis. But this bug makes the bot’s search almost unusable.

It’s not about the hassle of decoding Unicode in JSON. The real issue is the GPT model not creating the correct Unicode escape sequence for less common Chinese characters, leading to completely wrong characters. Here’s a full demo that you can test to see for yourself.

Here is an example that reproduces the issue:

import openai
import os
import json

client = openai.OpenAI(api_key = os.getenv('OPENAI_API_KEY'))

models = [
    'gpt-3.5-turbo-16k',
    'gpt-4-1106-preview',
]
query = '邓紫棋'

for model in models:
    for i in range(10):
        result = client.chat.completions.create(
            model=model,
            messages=[
                {'role': 'system', 'content': 'You are a helpful assistant with searching capabilities'},
                {'role': 'user', 'content': f'Please search for "{query}"'}
            ],
            tools=[{
                "type": "function",
                "function": {
                    "name": "search",
                    "description": "Search on Google",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query",
                            },
                        },
                        "required": ["query"],
                    }
                }
            }],
        )
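        # The arguments arrive as a JSON string; json.loads decodes any \uXXXX escapes
        arguments = result.choices[0].message.tool_calls[0].function.arguments
        decoded_arguments = json.loads(arguments)
        # Print the raw string next to the decoded value to compare them
        print(model, i + 1, repr(arguments), '-->', decoded_arguments)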

And this is what the program’s output looks like (note that the old model works fine, but the new model often spits out the wrong Chinese characters):

gpt-3.5-turbo-16k 1 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 2 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 3 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 4 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 5 '{\n"query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 6 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 7 '{\n"query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 8 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 9 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-3.5-turbo-16k 10 '{\n  "query": "邓紫棋"\n}' --> {'query': '邓紫棋'}
gpt-4-1106-preview 1 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 2 '{"query":"\\u9093\\u7d2b\\u68a8"}' --> {'query': '邓紫梨'}
gpt-4-1106-preview 3 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 4 '{"query":"\\u9093\\u7d2b\\u68cb"}' --> {'query': '邓紫棋'}
gpt-4-1106-preview 5 '{"query":"\\u90a2\\u7d2b\\u68a8"}' --> {'query': '邢紫梨'}
gpt-4-1106-preview 6 '{"query":"\\u9093\\u7d2b\\u68a8"}' --> {'query': '邓紫梨'}
gpt-4-1106-preview 7 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 8 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 9 '{"query":"\\u90a3\\u7d2b\\u68a8"}' --> {'query': '那紫梨'}
gpt-4-1106-preview 10 '{"query":"\\u90a2\\u7d2b\\u68cb"}' --> {'query': '邢紫棋'}

Another example:

gpt-3.5-turbo-16k 1 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 2 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 3 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 4 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 5 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 6 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 7 '{\n"query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 8 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 9 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-3.5-turbo-16k 10 '{\n  "query": "新型冠状病毒疫情"\n}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 1 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 2 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 3 '{"query":"\\u65b0\\u578b\\u51b7\\u51fb\\u75c5\\u6bdb\\u75ab\\u60c5"}' --> {'query': '新型冷击病毛疫情'}
gpt-4-1106-preview 4 '{"query":"\\u65b0\\u578b\\u519b\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型军状病毒疫情'}
gpt-4-1106-preview 5 '{"query":"\\u65b0\\u578b\\u5185\\u51b7\\u83ab\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型内冷莫病毒疫情'}
gpt-4-1106-preview 6 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 7 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 8 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 9 '{"query":"\\u65b0\\u578b\\u51a0\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型冠状病毒疫情'}
gpt-4-1106-preview 10 '{"query":"\\u65b0\\u578b\\u519b\\u72b6\\u75c5\\u6bd2\\u75ab\\u60c5"}' --> {'query': '新型军状病毒疫情'}

For those not familiar with Chinese, note that the first example is the name of a singer, and the second example is the Simplified Chinese term for COVID-19, which are very common words.
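To see exactly which characters go wrong in the first example, the decoded argument can be compared with the intended query position by position. A small sketch using two raw argument strings copied from the gpt-4-1106-preview runs above (runs 4 and 1); `code_points` and `mismatches` are hypothetical helper names, and the comparison assumes both strings have the same length.

```python
import json

expected = "邓紫棋"
# Raw arguments copied from the gpt-4-1106-preview output above
good_raw = '{"query":"\\u9093\\u7d2b\\u68cb"}'  # run 4, correct
bad_raw = '{"query":"\\u90a3\\u7d2b\\u68a8"}'   # run 1, wrong characters


def code_points(text):
    """Return the Unicode code point of each character, e.g. 'U+9093'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def mismatches(expected_text, raw_arguments):
    """Pair up expected and actual characters and keep the ones that differ."""
    actual = json.loads(raw_arguments)["query"]
    return [(e, a) for e, a in zip(expected_text, actual) if e != a]


print(code_points(expected))           # ['U+9093', 'U+7D2B', 'U+68CB']
print(mismatches(expected, good_raw))  # []
print(mismatches(expected, bad_raw))   # [('邓', '那'), ('棋', '梨')]
```

The wrong escapes are off by only a few hex digits (9093 vs 90a3, 68cb vs 68a8), which matches the description of the model emitting an incorrect escape sequence rather than the JSON decoder mangling a correct one.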

@zzh1996

Instead of using the openai package in Python, try another method.

Example:

@b0zal

I need the function calling feature, so I have to use the completion API. This issue isn’t related to Python libraries, and JSON mode doesn’t fix it either. If you’re confident you have a solution, please modify the demo code I provided above so that it is bug-free.

My bad, I confirm the bug is still there. I accidentally left gpt-4 model in the test.

For the Brazilians here, something that worked for me was adding this to the prompt:
“- utilize utf-8 para caracteres especiais”

In English: “Use UTF-8 for special characters.”
