Turkish character encoding with API

Hi everyone, I am trying to use the API with GPT-3.5 Turbo to evaluate translation quality using the GEMBA metric. The metric takes two input files with the same number of lines, one containing the source text and the other the target translation.
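
For illustration, here is a minimal sketch of how two such parallel files could be loaded in Python with an explicit encoding. The function name `read_parallel` and the file paths are hypothetical and are not taken from GEMBA's actual code; the point is only that the encoding is passed explicitly rather than left to the platform default.

```python
def read_lines(path, encoding="utf-8"):
    """Read a text file into a list of lines, decoding with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


def read_parallel(src_path, tgt_path, encoding="utf-8"):
    """Pair source and target lines; both files must have the same line count."""
    src = read_lines(src_path, encoding)
    tgt = read_lines(tgt_path, encoding)
    if len(src) != len(tgt):
        raise ValueError(f"line count mismatch: {len(src)} vs {len(tgt)}")
    return list(zip(src, tgt))
```

A call such as `read_parallel("source.txt", "target.txt")` would return a list of (source, target) tuples, assuming both files were saved as UTF-8.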

When I use UTF-8 for my input .txt files, the outputs indicate that the character encoding is not being processed correctly. I did some research here in the forum and found a post saying that I had to switch to Latin-1 for the model to process the input correctly.

This has worked for languages like French and German, since their characters are covered by Latin-1. Other languages like Turkish don’t work with that encoding, because it cannot represent all Turkish characters.

Turkish uses the following special characters, most of which are not part of Latin-1:

Ç (U+00C7 in Unicode) and ç (U+00E7): these two are covered by Latin-1.
Ğ (U+011E) and ğ (U+011F): not in Latin-1.
İ (U+0130): not in Latin-1.
ı (U+0131): not in Latin-1.
Ş (U+015E) and ş (U+015F): not in Latin-1.
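
To double-check the list above, here is a small Python sketch that tries to encode each character as Latin-1 and reports which ones fail. The helper name `latin1_coverage` is made up for this post; it relies only on the standard `str.encode` method.

```python
# The Turkish letters listed above.
TURKISH_LETTERS = "ÇçĞğİıŞş"


def latin1_coverage(chars):
    """Split characters into those Latin-1 can encode and those it cannot."""
    covered, missing = [], []
    for ch in chars:
        try:
            ch.encode("latin-1")
            covered.append(ch)
        except UnicodeEncodeError:
            missing.append(ch)
    return covered, missing


covered, missing = latin1_coverage(TURKISH_LETTERS)
print("In Latin-1:    ", " ".join(covered))
print("Not in Latin-1:", " ".join(f"{ch} (U+{ord(ch):04X})" for ch in missing))
```

Only Ç and ç end up in the first group; the other six characters raise `UnicodeEncodeError`, which is why text containing them cannot be stored as Latin-1 without loss. All eight characters encode without error as UTF-8.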

Does anybody know a workaround for this issue?

I look forward to hearing from you!