Gpt-4-1106-preview messes up function call parameters encoding

Hello guys, does using functions have any limits or side effects?

Hello guys,

The bug is still there. Similar issue for the Spanish language. Is any solution in sight?

Hi there,
Same bug for the French language (which is full of accented characters).

Just to add some feedback: the model rarely send us the good ascii data (not a valid representation of the underlying data): we often get “\n” or even “\ufffd” - which is the replacement char (ïżœ) in ascii - instead of accentuated chars. So json handling librairies do not change anything.
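To illustrate the point above, here is a minimal Python sketch. The payload and the helper name `has_lost_characters` are made up for illustration and are not actual API output. It assumes the arguments arrive as a JSON string: parsing succeeds, but once U+FFFD is in the data the original character cannot be recovered, so a client can only detect the loss.

```python
import json

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted by decoders for undecodable bytes

def has_lost_characters(arguments_json: str) -> bool:
    """Return True if any string in the parsed arguments contains U+FFFD."""
    def walk(value) -> bool:
        if isinstance(value, str):
            return REPLACEMENT_CHAR in value
        if isinstance(value, dict):
            return any(walk(k) or walk(v) for k, v in value.items())
        if isinstance(value, list):
            return any(walk(item) for item in value)
        return False
    return walk(json.loads(arguments_json))

# Hypothetical function-call arguments, as escaped JSON text.
sample = '{"city": "Montr\\ufffdal"}'
print(has_lost_characters(sample))  # True: parsing works, the accent is gone
```

A check like this can at least trigger a retry instead of passing damaged text downstream.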

Hope we get a fix soon!
Thanks

Thank you for all the additional details everyone shared regarding this issue.

We’ve rolled out a fix that addresses escaped unicode characters, to bring output formatting more in-line with our previous models.

However, we understand that there are additional issues raised in this discussion regarding incorrect unicode characters or invalid unicode characters being produced by the model. Unfortunately, we do not anticipate this new fix to fully address these issues. We are still working on it, alongside other model improvements, but can’t provide a timeline at this time.

Same here. I am using the function call API with model gpt-3.5-turbo-1106, and the result contains a lot of corrupted characters.

@enoch Thank you for your efforts. However, you have fundamentally failed to address the actual issue at hand, for the following reasons:

The central concern of everyone in this thread has always been the issue of incorrect Unicode characters being produced. In reality, nobody gets stuck just because of Unicode escape decoding. Any JSON parsing library can handle the \uXXXX escape sequences without any problem. It seems your fix merely adds a middleware layer that transforms the GPT model’s output into raw UTF-8 format instead of the \uXXXX format, which doesn’t solve the core issue at all. We could do this translation step on our own. What matters is that the characters output by the model haven’t changed.
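As a small illustration of why the escape format alone is not the problem, this Python sketch (with a made-up payload) shows that the escaped and raw forms of the same JSON parse to identical data:

```python
import json

escaped = '{"plat": "b\\u0153uf"}'  # \uXXXX escape form
raw = '{"plat": "bœuf"}'            # raw UTF-8 form of the same value

parsed_escaped = json.loads(escaped)
parsed_raw = json.loads(raw)

# Both spellings decode to the same Python string.
print(parsed_escaped == parsed_raw)  # True
```

If the model emits the wrong code points, both forms carry the same wrong characters.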

Indeed, my previous post includes a complete script to reproduce the issue, and the results of running it now are the same as before, replete with erroneous outputs. If this issue remains unsolved, the function/tool calling feature is virtually unusable for non-English languages that include non-ASCII Unicode characters. This has been the same issue from the beginning and the one that people care about, not two separate problems of which you’ve only addressed one.

This is a serious issue, and it deserves your full attention. I suggest that the banner alerting users to this bug be reinstated on the function calling documentation page to ensure that people are aware there’s still an unignorable issue at hand, and then work to resolve it as promptly as possible.

The fix seems to have made it worse: the 1106 model now produces non-UTF-8 output, such as “ĂÂœĂÂ”ĂÂ¶…”. The issues started around 18:00 UTC on 12/1.

+1, it was messing up UTF-8 output on emojis every now and again, but since yesterday it’s producing stuff like this consistently: ð\x9F\x94\x84.
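The sequence quoted above looks like the four UTF-8 bytes of an emoji shown as one character per byte. A quick Python check, assuming the string arrived exactly as written in the post:

```python
garbled = "ð\x9f\x94\x84"  # four code points, one per original UTF-8 byte

# Map each code point back to a single byte, then decode those bytes as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(garbled), len(repaired))  # 4 1
print(f"U+{ord(repaired):04X}")     # U+1F504
```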

Now it is even worse! Now both gpt-3.5-turbo-1106 and gpt-4-1106-preview models generate incorrect results.

Chiming in to report that I’m also experiencing server-side encoding issues which started recently (less than 7 days ago) on gpt-3.5-turbo-1106 with the function tool.

I’m using the chat completion endpoint to ask the model to call a function with a single value argument from an enum. One of the values in the enum is “Procédure”; note the accent on the “é”.

Debug logs confirm that the request JSON is properly formatted.

The model replies with a function call with the following argument: “ProcÃ©dure”. Note that the “é” has been mangled and is now expressed as two consecutive Unicode characters, “Ã” and “©”.

This behavior is consistent with the “é” character, encoded as 2 bytes in UTF-8, namely 0xC3 and 0xA9, being interpreted as two single-byte characters by the model, as these two bytes correspond to “Ã” and “©” in the extended ASCII table (ISO 8859-1).
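The byte-level explanation above can be reproduced locally in a few lines of Python. This only simulates the suspected misinterpretation; it says nothing about where in the pipeline it happens:

```python
original = "é"

utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))             # c3 a9

# Reading those two bytes as ISO 8859-1 yields two separate characters.
misread = utf8_bytes.decode("latin-1")
print(misread)                         # Ã©
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa9']

# The same transformation applied to the whole enum value.
mangled_word = "Procédure".encode("utf-8").decode("latin-1")
print(mangled_word)                    # ProcÃ©dure
```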

Hi, as @juria rightly pointed out, the problem also affects the gpt-3.5-turbo-1106 model with the function tool, and has done so for at least 2 days (my project stopped working properly 2 days ago). Since then, 70-80% of queries end with a 500 error (and, what’s worse, the query is charged nonetheless; I don’t know whether this is due to the encoding, so I mention it as a curiosity). From my perspective, using functions with this model has become pointless (at least for non-English languages, although note that emoji are also encoded badly). I wrote a simple Python function that corrects the incorrectly returned values; maybe it will be useful to someone until OpenAI fixes the problem:

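def fix_bad_encoding(text: str) -> str:
    """Repair UTF-8 text that was wrongly decoded as Latin-1, word by word."""
    words = text.split(' ')
    for i in range(len(words)):
        # Loop because some words are mis-encoded more than once.
        while True:
            try:
                new_word = words[i].encode('latin1').decode('utf8')
                if new_word == words[i]:
                    break  # nothing changed (plain ASCII word)
                words[i] = new_word
            except UnicodeError:
                break  # word is already correct or cannot be repaired this way
    return ' '.join(words)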

# [...]
function_args = json.loads(fix_bad_encoding(tool_call.function.arguments))

For “ProcÃ©dure” it returns “Procédure”,
for “CzyÃ…Â¼byÅ› planowaÅ‚ podjÄ…Ä‡ takie dziaÅ‚ania?” (example taken from the returned function argument) it returns “Czyżbyś planował podjąć takie działania?”.

Some notes: I don’t know for which languages this is able to correct the problem. The function splits the text and iterates over words (separated by spaces) because some words are already correct, and decoding such sentences as a whole triggered an error in my case. Each word is also processed in a loop, because it happened that a word was “double wrongly encoded” and a single pass didn’t give a correct result.
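A variant of the same idea, sketched under the assumption that the arguments are valid JSON: parse first, then repair each string inside the parsed structure, with a bounded number of passes instead of an open-ended loop. The names `repair_text` and `repair_parsed` are hypothetical, and the damaged input is simulated locally rather than taken from the API.

```python
import json

def repair_text(text: str, max_passes: int = 3) -> str:
    """Reverse UTF-8-read-as-Latin-1 damage word by word, up to max_passes times."""
    def repair_word(word: str) -> str:
        for _ in range(max_passes):
            try:
                fixed = word.encode("latin-1").decode("utf-8")
            except UnicodeError:
                break  # already correct, or not this kind of damage
            if fixed == word:
                break  # pure ASCII, nothing to do
            word = fixed
        return word
    return " ".join(repair_word(w) for w in text.split(" "))

def repair_parsed(value):
    """Apply repair_text to every string inside parsed JSON data."""
    if isinstance(value, str):
        return repair_text(value)
    if isinstance(value, list):
        return [repair_parsed(item) for item in value]
    if isinstance(value, dict):
        return {repair_parsed(k): repair_parsed(v) for k, v in value.items()}
    return value

# Simulated damage: one word mis-decoded once, one twice, one left intact.
once = "Procédure".encode("utf-8").decode("latin-1")
twice = "bœuf".encode("utf-8").decode("latin-1").encode("utf-8").decode("latin-1")
arguments = json.dumps({"note": f"{once} {twice} déjà"})

result = repair_parsed(json.loads(arguments))
print(result)  # {'note': 'Procédure bœuf déjà'}
```

This is a heuristic: a legitimate value that really contains a sequence such as “Ã©” would be altered too, so it is only reasonable as a stopgap.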

I hope the latest defective update will be fixed soon.

Same observation here for any -1106 model; French characters with diacritics end up corrupted by the model: bÅ“uf / pÃ©rÃ©grination / dÃ©saccord.

Between the fix being a total failure and the warning being removed without any good reason, it’s now worse than before.

This situation is seriously worrying with regard to the issue-fixing process there. Did you actually ship the fix without any testing, be it manual or automated?!

I mean, in the current situation I’m not even able to make the model produce ANY non-corrupted character with diacritics; it’s not as if this were hard to reproduce.

“Please, tell me a random word in French with an accent in it”

Oo

I experienced the same issue with the function-calling tools.

I defined function arguments like :

{
    "name":"ScalpProperties",
    "description":"Propriétés du cuir chevelu",
    "parameters":{
        "type":"object",
        "properties":{
            "cuir_chevelu":{
                "type":"array",
                "items":{
                    "type":"string",
                    "enum":[
                        "normal",
                        "irrité"
                    ]
                }
            }
        }
    }
}

And what I got:

{"cuir_chevelu": "irritÃ©"}
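For enum arguments like this one, a client-side guard can map a damaged value back to a declared member and reject anything else. This is a minimal sketch; `match_enum` is a hypothetical helper, not part of any SDK, and the damaged value is simulated locally.

```python
from typing import Optional, Sequence

def match_enum(value: str, allowed: Sequence[str], max_passes: int = 3) -> Optional[str]:
    """Return the allowed member that value decodes to, or None if there is none."""
    candidate = value
    for _ in range(max_passes + 1):
        if candidate in allowed:
            return candidate
        try:
            # Undo one round of UTF-8 bytes being read as Latin-1.
            candidate = candidate.encode("latin-1").decode("utf-8")
        except UnicodeError:
            return None
    return None

allowed = ["normal", "irrité"]
garbled = "irrité".encode("utf-8").decode("latin-1")  # simulated damaged value

print(match_enum(garbled, allowed))        # irrité
print(match_enum("normal", allowed))       # normal
print(match_enum("irrit\ufffd", allowed))  # None
```

Because the result is checked against the declared enum, a wrong repair cannot slip through silently.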

The fix indeed made it worse, and made it impossible for us here in the Nordics to use function calls.

Has it been fixed? gpt-3.5-turbo-1106

This is a complete mess! We will have to shut down our entire service if this can’t be fixed soon. If possible, roll back that previous bug fix.

Same issues here, now also with gpt-3.5-turbo-1106, not only gpt-4-1106-preview.

Hope this isn’t the model itself (shouldn’t be since it works fine without function calling).

We need a real fix here, impossible to do ourselves!

I am also waiting for them to fix this error. It makes it very difficult to move software to the production level.

Same here :frowning:
I use the regular chat API. Can someone tell me whether the problem also occurs with the Assistants API?

I understand OpenAI is a lab but still :stuck_out_tongue: