Hello, I found that for Russian text each letter is a token. At least, that is what I infer from my bill. Is that correct? So instead of 1 token ≈ 0.75 words, as in English, it is 1 Russian letter = 1 token?
Here’s a tokenizer where you can test…
Welcome to the community!
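For a rough intuition about why Russian text can cost more, here is a minimal sketch. It assumes the tokenizer is a byte-level BPE that starts from UTF-8 bytes, which is how GPT-style tokenizers are generally described. It does not count real tokens; it only shows that Cyrillic letters take two bytes each in UTF-8, so there is more raw material to merge than in English. The helper name `utf8_bytes_per_char` and the sample strings are made up for illustration.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    total_bytes = sum(len(c.encode("utf-8")) for c in chars)
    return total_bytes / len(chars)


# Illustrative sample strings (not taken from any real bill).
english = "hello world"
russian = "привет мир"

print("English bytes per letter:", utf8_bytes_per_char(english))
print("Russian bytes per letter:", utf8_bytes_per_char(russian))
```

The actual token count still depends on which byte sequences the tokenizer has learned to merge, so the tokenizer tool linked above remains the way to get real numbers.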
Yes, thank you! I tested it, and one Russian letter is indeed one token. What discrimination!