I’ve been prompting the API in Chinese for a while (about 5 seconds per response on average with GPT-3.5), and I’ve realised the latency is really high compared to English (2-3 seconds on average).
My issue is that I want the response to be in Chinese. How do you guys deal with this? I saw there is OpenAI on Azure, but it is only available to a select number of enterprise clients.
“Latency” is the wrong term. OpenAI also misuses it when what they mean is “token generation rate”. Strictly, latency is how long it takes for you to get the first token of a stream, including network time and the time to load the context.
What you are really talking about here is the perceived rate of text production: how many characters or lines of text are coming out of the AI per minute.
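To make the distinction concrete, here is a minimal sketch that separates the two numbers for a streamed response. It assumes you have recorded when the request was sent, when each chunk arrived, and how many tokens each chunk carried; the function name stream_stats and its parameters are made up for illustration and are not part of any OpenAI library.

```python
from __future__ import annotations

from typing import Sequence


def stream_stats(
    request_sent: float,
    chunk_times: Sequence[float],
    chunk_tokens: Sequence[int],
) -> tuple[float, float]:
    """Return (time to first token in seconds, tokens per second after that).

    All arguments are hypothetical measurements you collect yourself:
    request_sent and chunk_times are timestamps in seconds, and
    chunk_tokens[i] is the number of tokens in the i-th chunk.
    """
    if not chunk_times or len(chunk_times) != len(chunk_tokens):
        raise ValueError("need one token count per chunk and at least one chunk")

    # Latency in the strict sense: the wait before anything shows up.
    time_to_first_token = chunk_times[0] - request_sent

    # Generation rate: tokens that arrived after the first chunk,
    # divided by the time it took them to arrive.
    generation_time = chunk_times[-1] - chunk_times[0]
    tokens_after_first = sum(chunk_tokens[1:])
    rate = tokens_after_first / generation_time if generation_time > 0 else 0.0

    return time_to_first_token, rate


# Made-up timings: first token after 0.5 s, then one token every 0.5 s.
ttft, rate = stream_stats(0.0, [0.5, 1.0, 1.5, 2.0, 2.5], [1, 1, 1, 1, 1])
print(f"time to first token: {ttft:.2f} s, rate: {rate:.1f} tokens/s")
```

If the first number is about the same for English and Chinese prompts and only the total time differs, the difference is coming from how many tokens the reply needs, not from latency.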
Chinese has a very high token consumption per character. In English, a whole word of around 10 characters can be a single token, whereas a single Chinese character can require two tokens or more.
Advantage: Chinese has much more meaning per character, though (for example, you can turn 4000 characters of English into 1500 characters of Chinese to fit into the “custom instruction” box).
This means that even though the AI produces tokens at the same rate, streamed text appears to come out more slowly in Chinese.
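The arithmetic can be sketched like this. The figures are illustrative assumptions, not measurements: 40 tokens per second is a made-up generation rate, 0.25 tokens per character is the rough rule of thumb for English (about four characters per token), and 2 tokens per character is the Chinese figure mentioned above. The helper name chars_per_second is also just for illustration.

```python
def chars_per_second(tokens_per_second: float, tokens_per_char: float) -> float:
    """Visible characters per second for a given token generation rate."""
    if tokens_per_char <= 0:
        raise ValueError("tokens_per_char must be positive")
    return tokens_per_second / tokens_per_char


# Assumed, illustrative values (not benchmarks).
TOKEN_RATE = 40.0               # tokens per second, same for both languages
ENGLISH_TOKENS_PER_CHAR = 0.25  # roughly four characters per token
CHINESE_TOKENS_PER_CHAR = 2.0   # roughly two tokens per character

english = chars_per_second(TOKEN_RATE, ENGLISH_TOKENS_PER_CHAR)
chinese = chars_per_second(TOKEN_RATE, CHINESE_TOKENS_PER_CHAR)

print(f"English: {english:.0f} characters/s")
print(f"Chinese: {chinese:.0f} characters/s")
```

Under these assumptions the same token rate gives far fewer visible characters per second in Chinese. Since each Chinese character carries more meaning, the gap in information per second is smaller than the gap in characters per second.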