GPT API Failed to create completion as the model generated invalid Unicode output

marovich_albertha · March 24, 2024, 2:29pm

I got a very strange OpenAI error message that I cannot find anywhere on the internet. Here is what I got when I tried to call the API:

Something went wrong when call GPT. Detail Error(code=invalid_model_output, message=Failed to create completion as the model generated invalid Unicode output. Unfortunately, this can happen in rare situations. Consider reviewing your prompt or reducing the temperature of your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID req_a90b4e95a18fcab8e3d188e5d11b919f in your message.

Could any developers help me get rid of this?

_j · March 24, 2024, 3:09pm

The OpenAI tokenizer produces bytes, and has some autonomy to produce novel UTF-8 character encodings, which may range from one to four bytes in length. There are about a million valid Unicode code points, and over 100k actual characters.

That also means that there are about 16 million invalid byte sequences once you go beyond single-byte characters into multi-byte UTF-8.

The AI output something that couldn’t be decoded. This is a model problem, or a problem with the inputs, or as the error message guides you, the random chance of an unlikely and invalid output being predicted and used.

If you can reproduce the error deterministically at top_p=0.0001, that would be even more remarkable than a random happenstance output.
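The error message itself suggests retrying or reducing the temperature. A rough sketch of that idea in Python follows. Note that `InvalidModelOutput`, `call_model`, and `complete_with_retry` are hypothetical names, not part of the OpenAI SDK; you would wrap your real API call in a function that raises something similar when the response carries `code=invalid_model_output`.

```python
from typing import Callable, Optional


class InvalidModelOutput(Exception):
    """Hypothetical stand-in for the API's invalid_model_output error."""


def complete_with_retry(
    call_model: Callable[[float], str],  # hypothetical: takes a temperature, returns text
    temperature: float = 1.0,
    max_attempts: int = 3,
) -> str:
    """Retry a completion, halving the temperature after each invalid output."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[InvalidModelOutput] = None
    for _ in range(max_attempts):
        try:
            return call_model(temperature)
        except InvalidModelOutput as exc:
            last_error = exc
            # Lower temperature makes unlikely tokens less likely to be sampled.
            temperature /= 2
    assert last_error is not None
    raise last_error
```

Halving the temperature is an arbitrary choice here. If the failure still reproduces at very low temperature, retrying will not help and the prompt or model is the more likely culprit.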

I would guess that this is most likely when using gpt-4-1106-preview or vision, as those models were trained with bad Unicode when employing functions.

marovich_albertha · March 25, 2024, 4:46am

Oh, I see. The weird thing is that sometimes it reports an error and sometimes it does not.
I ask GPT to extract the body content from a markdown document. The content language is Vietnamese. Do you have any way to detect where it's going wrong?

Update 1:

I found a pretty fun workaround. I changed the model from gpt-3.5-turbo to gpt-3.5-turbo-16k, and it seems to be working well.

iizukanao · April 1, 2024, 4:09pm

Since gpt-3.5-turbo-1106, I am also experiencing a similar Unicode problem with parsing Japanese strings containing Kanji characters. In some cases, the second and subsequent bytes of UTF-8 characters are unintentionally altered when the prompt is longer. For example, where the character 田 (e7 94 b0) is expected, 町 (e7 94 ba) is returned, resulting in a response that reads strangely in Japanese. This was happening with gpt-3.5-turbo-0125 and gpt-3.5-turbo-1106, but not with gpt-3.5-turbo-16k.

In my case, including the following statement in the prompt has alleviated this phenomenon, but has not eliminated it completely.

[placeholder] are written in Japanese, and byte sequence of each UTF-8 multibyte character must be retained and not modified.

It is a good thing that I don’t have to determine the error myself, since the API has very recently started returning error responses instead of wrong characters.
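For anyone who wants to check the 田 / 町 example above, a small Python sketch shows that the two characters differ only in their last UTF-8 byte. The variable names are just for illustration.

```python
expected = "田".encode("utf-8")  # the character that should have come back
returned = "町".encode("utf-8")  # the character the model actually produced

# Collect the positions where the two encodings disagree.
diff = [
    (i, f"{a:02x}", f"{b:02x}")
    for i, (a, b) in enumerate(zip(expected, returned))
    if a != b
]

print(expected.hex(" "))  # e7 94 b0
print(returned.hex(" "))  # e7 94 ba
print(diff)               # [(2, 'b0', 'ba')]
```

Both byte sequences are valid UTF-8, which is why this kind of substitution decodes without an error and is harder to notice than an outright invalid sequence.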
