Character encoding (black diamond output)

rankoneads · March 25, 2023, 2:23am

Does anyone know how to make OpenAI output in the proper encoding? I get the black diamonds in a lot of my non-English outputs and I can’t figure out why.

For example,
platnosnos��

What I’ve tried:

Adding charset=UTF-8 to the cURL header: Content-Type: application/json; charset=UTF-8
Adding the proper encoding to the prompt(s), i.e. at the end “Character encoding: X”.

I checked the console, and the outputs with black diamonds are coming directly from OpenAI (i.e. it’s not my website/database that’s the problem; I have the proper encoding in the META tag).

Has anyone figured out a way to solve this?

PaulBellow · March 25, 2023, 5:21am

The models aren’t primarily trained on other languages, but they do okay occasionally.

I’ve not seen black diamonds for missing characters, though. What system are you on? What does the prompt look like? Maybe OpenAI is sending the right characters and you can’t see them on your system?
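
One way to tell "the payload already contains the black diamonds" apart from "my client decoded it wrong" is to inspect the raw response bytes before anything decodes them. Here is a minimal sketch, assuming you can capture the HTTP response body as raw bytes (for instance by writing it to a file). `diagnose` is a hypothetical helper name, not part of any OpenAI library.

```python
# U+FFFD (the "black diamond") as it appears when encoded in UTF-8.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def diagnose(raw: bytes) -> str:
    """Classify raw response bytes before any client-side decoding."""
    try:
        raw.decode("utf-8")  # strict decoding: raises on malformed bytes
    except UnicodeDecodeError:
        # The bytes themselves are not valid UTF-8.
        return "invalid-utf8"
    # Valid UTF-8, but U+FFFD may already be present, either as literal
    # bytes or as a JSON escape sequence (\ufffd, case-insensitive).
    if REPLACEMENT_UTF8 in raw or b"\\ufffd" in raw.lower():
        return "replacement-char-in-payload"
    return "clean"
```

If this reports "replacement-char-in-payload", the black diamonds were already in the data that arrived, and changing the page charset or the request header on your side would not be expected to remove them.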

rankoneads · March 26, 2023, 7:49am

I’d say the models perform quite well in other languages, not just occasionally. The translations/outputs are almost as good as, if not as good as, Google Translate (using davinci-003).

I’m pulling the black diamonds directly from console, so this is what I’m receiving directly from OpenAI.

For example:
je dôle��ité nájsť takého poskytovateĺa

You can tell it to write anything in any language and eventually, you’ll get this type of response.

PaulBellow · March 26, 2023, 8:30am

What’s your temp and frequency_penalty? Which model?

rankoneads · March 26, 2023, 9:23am

Davinci-003
Temp: 0.8
FP/PP: 0.5

rankoneads · April 6, 2023, 10:31am

I have found a correlation between this happening and letters like these:

ž
ň

Letters that have a “v” shape over them seem to sometimes not be encoded properly by the AI. I’ve tested 50+ examples, and 75% of the time it’s a letter like that. The other 25% is just OpenAI making a complete mess of the text.
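
For context on why letters like these might be fragile: in UTF-8, ž and ň each take two bytes, and a decoder shows U+FFFD (the black diamond) whenever such a sequence is incomplete or malformed. The sketch below only demonstrates that mechanism. The `BROKEN` bytes are hand-made for illustration, and the thread does not confirm that this is what happens on OpenAI's side.

```python
def decode_lossy(raw: bytes) -> str:
    """Decode UTF-8, replacing each malformed byte sequence with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# Both letters are two-byte sequences sharing the lead byte 0xC5.
for ch in ("ž", "ň"):
    print(ch, ch.encode("utf-8").hex(" "))

# Hypothetical damage: "dôležité" with the two bytes of "ž" (c5 be)
# swapped for two lead bytes (c5 c5), so neither forms a valid character.
BROKEN = b"d\xc3\xb4le\xc5\xc5it\xc3\xa9"

print(decode_lossy(BROKEN))
```

Each malformed byte is replaced separately, so a single damaged two-byte letter can show up as two black diamonds in a row, which matches the shape of the examples above.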
