API Response encoding Bug | UTF-8/UTF-16

There are already articles reporting that the API response doesn’t display non-English characters correctly:

The new version “gpt-3.5-turbo-0125” should fix this bug, as you can read in this article:

But even though I use the new version “0125”, the problem is still the same. Does anyone have a solution for that?

Thanks for your responses!


I can’t fix dummkopf functions, but I can offer a solution.

Bad:

"arguments": { "name": "Döner", "instructions": [ "Schneide das Fleisch in dünne Streifen.", "Mariniere das Fleisch mit Gewürzen und Joghurt.", "Brate das Fleisch in einer Pfanne oder auf dem Grill.", "Schneide das Gemüse und bereite den Salat vor.", "Fülle das Fleisch, Gemüse und Salat in das Fladenbrot.", "Füge die Soà e hinzu und rolle das Fladenbrot

Füge die Soße hinzu und rolle das Fladenbrot

Fixed by code:


import re

def fix_encoding(s):
    # Map each garbled sequence (UTF-8 bytes misread as Latin-1/Windows-1252)
    # to the character it was meant to be
    correction_map = {
        'Ã¶': 'ö',        'Ã¼': 'ü',
        'Ã¤': 'ä',        'ÃŸ': 'ß',
        'Ã€': 'À',
        'Ã\x9f': 'ß', # where \x9f represents the invisible character
        'à ': 'ß',    # guess at the "Soà e" form seen in the bad output
    }

    # Create a regular expression from the map
    regex = re.compile("|".join(map(re.escape, correction_map)))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: correction_map[mo.group(0)], s)

incorrect_strings = [
    '"arguments": { "name": "Döner", "instructions": [ "Schneide das Fleisch in dünne Streifen.", "Mariniere das Fleisch mit Gewürzen und Joghurt.", "Brate das Fleisch in einer Pfanne oder auf dem Grill.", "Schneide das Gemüse und bereite den Salat vor.", "Fülle das Fleisch, Gemüse und Salat in das Fladenbrot.", "Füge die Soà e hinzu und rolle das Fladenbrot"'
]
corrected_strings = [fix_encoding(s) for s in incorrect_strings]
print(corrected_strings)

Output:

['"arguments": { "name": "Döner", "instructions": [ "Schneide das Fleisch in dünne Streifen.", "Mariniere das Fleisch mit Gewürzen und Joghurt.", "Brate das Fleisch in einer Pfanne oder auf dem Grill.", "Schneide das Gemüse und bereite den Salat vor.", "Fülle das Fleisch, Gemüse und Salat in das Fladenbrot.", "Füge die So?e hinzu und rolle das Fladenbrot"']

You’d have to get the actual bytes of the “ß” in “Füge die Soße hinzu und rolle das Fladenbrot” to ensure success, but there’s a guess in the code.
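For anyone who wants to check those bytes, here is a minimal Python sketch. It assumes the garbling comes from UTF-8 bytes being read as Latin-1 or Windows-1252, which the thread doesn’t confirm; the helper name `mojibake_forms` is made up for illustration.

```python
def mojibake_forms(ch):
    """Return the UTF-8 bytes of ch and how they look when misread."""
    raw = ch.encode('utf-8')
    as_latin1 = raw.decode('latin-1')                    # every byte becomes one character
    as_cp1252 = raw.decode('cp1252', errors='replace')   # Windows-1252 reading
    return raw, as_latin1, as_cp1252


raw, latin1, cp1252 = mojibake_forms('ß')
print(raw)             # b'\xc3\x9f'
print(repr(latin1))    # 'Ã\x9f'  (the second character is invisible)
print(repr(cp1252))    # 'ÃŸ'
```

If the API output contains one of these two forms, that is the key the correction map needs for “ß”.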

In the other long thread, there might have been a more generic solution still applicable or modifiable.
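One possible generic approach, as a sketch and not necessarily what the other thread proposed: instead of listing every character, find anything that looks like a UTF-8 sequence read as Latin-1 and decode it back. This assumes Latin-1 misreading only; Windows-1252 forms such as 'ÃŸ' would need extra handling. The names `fix_mojibake` and `_repair` are hypothetical.

```python
import re

# What a 2-, 3- or 4-byte UTF-8 sequence looks like after being read
# one byte per character as Latin-1: a lead character plus continuations.
_SEQ = re.compile(
    '[\u00c2-\u00df][\u0080-\u00bf]'
    '|[\u00e0-\u00ef][\u0080-\u00bf]{2}'
    '|[\u00f0-\u00f4][\u0080-\u00bf]{3}'
)


def _repair(match):
    chunk = match.group(0)
    try:
        # Turn the characters back into bytes, then decode them properly
        return chunk.encode('latin-1').decode('utf-8')
    except UnicodeDecodeError:
        return chunk  # not valid UTF-8 after all, leave it untouched


def fix_mojibake(text):
    return _SEQ.sub(_repair, text)


print(fix_mojibake('D\u00c3\u00b6ner mit So\u00c3\u009fe'))  # Döner mit Soße
print(fix_mojibake('Döner mit Soße'))                        # already fine, unchanged
```

Because only matching sequences are touched, text that is already correct passes through unchanged in the usual case.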


Oh man, thank you so much! I’ll try to figure out how to use this in Dart, since I am using Dart/Flutter, but that already helped a lot. Thank you!! :slight_smile:

Thank you!
Here is the list with some more characters, if anyone needs it:

corrections = {
    'Ã\xa0': 'à', 'Ã¨': 'è', 'Ã©': 'é', 'Ã¬': 'ì', 'Ã²': 'ò', 'Ã³': 'ó', 'Ã¹': 'ù',
    'Ã¤': 'ä', 'Ã¶': 'ö', 'Ã¼': 'ü', 'ÃŸ': 'ß',
    'Ã¡': 'á', 'Ã\xad': 'í', 'Ã±': 'ñ', 'Ãº': 'ú',
    'Ã¢': 'â', 'Ãª': 'ê', 'Ã«': 'ë', 'Ã®': 'î', 'Ã¯': 'ï', 'Ã´': 'ô', 'Ã»': 'û', 'Ã§': 'ç'
}

@eggers.mats Have you been using JSON mode? I’m experiencing the same with 0125; it looks like it has only been fixed for “normal” mode.

While the fix above works, it is a bit cumbersome to apply in streaming mode.
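For streaming, one way to handle it might be to hold back a garbled sequence that is cut off at the end of a chunk and repair it once the next chunk arrives. This is only a sketch: it assumes the chunks arrive as Python strings garbled the Latin-1 way, and `fix_stream` is a made-up helper, not part of any API client.

```python
import re

# A complete UTF-8 sequence misread as Latin-1 (2, 3 or 4 characters long).
_SEQ = re.compile(
    '[\u00c2-\u00df][\u0080-\u00bf]'
    '|[\u00e0-\u00ef][\u0080-\u00bf]{2}'
    '|[\u00f0-\u00f4][\u0080-\u00bf]{3}'
)
# A lead character (plus up to two continuations) at the very end of the
# text, which may be continued by the next chunk.
_TAIL = re.compile('[\u00c2-\u00f4][\u0080-\u00bf]{0,2}$')


def _repair(match):
    chunk = match.group(0)
    try:
        return chunk.encode('latin-1').decode('utf-8')
    except UnicodeDecodeError:
        return chunk  # not valid UTF-8, leave it untouched


def fix_stream(chunks):
    """Yield repaired text for an iterable of streamed text chunks."""
    pending = ''
    for chunk in chunks:
        text = pending + chunk
        tail = _TAIL.search(text)
        if tail:
            # Keep the possibly unfinished sequence for the next round
            pending = text[tail.start():]
            text = text[:tail.start()]
        else:
            pending = ''
        if text:
            yield _SEQ.sub(_repair, text)
    if pending:
        yield _SEQ.sub(_repair, pending)


parts = ['D\u00c3', '\u00b6ner und So\u00c3', '\u009fe']
print(''.join(fix_stream(parts)))  # Döner und Soße
```

The cost is that the last character or two of a chunk can be delayed until the next chunk arrives.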

Just checked with 0409, and it seems that JSON mode works correctly again, even with German umlauts.