Character encoding (black diamond output)

Does anyone know how to make OpenAI output in the proper encoding? I get the black diamonds in a lot of my non-English outputs and I can’t figure out why.

For example,
platnosnos��

What I’ve tried:

  1. Adding charset=UTF-8 in the CURL header. Content-Type: application/json; charset=UTF-8

  2. Adding the proper encoding to the prompt(s), i.e. at the end “Character encoding: X”.

I check console and the outputs with black diamonds are coming directly from OpenAI (i.e. it’s not my website/database that’s the problem, I have the proper encoding in the META).

Has anyone figured out a way to solve this?

The models aren’t primarily trained on other languages, but they do okay occasionally.

I’ve not seen black diamonds for missing, though. What system are you on? What’s the prompt look like? Maybe OpenAI is sending right character’s and you can’t seen them on your system?

I’d say the models perform quite well in other languages, not just occasionally. The translations/outputs are almost as good, if not as good, as Google translate (using davinci-003).

I’m pulling the black diamonds directly from console, so this is what I’m receiving directly from OpenAI.

For example:
je dôle��ité nájsť takého poskytovateĺa

You can tell it to write anything in any language and eventually, you’ll get this type of response.

1 Like

What’s your temp and frequency_penalty? Which model?

Davinci-003
Temp: 0.8
FP/PP: 0.5

I have found a correlation in this happening and letters like this:

ž
ň

Letters that have a “v” shape over them seem to sometimes not be encoded properly by the AI. I’ve tested 50+ examples, and 75% of the time it’s a letter like that. The other 25% is just OpenAI making a complete less out of the text.