How can Armenian language outputs be improved?

I understand that the amount of data in Armenian is much less than is needed to train a “fluent AI bot”. In that case, what measures should be taken to train GPT-3 or GPT-4 to work properly in Armenian?

Perhaps you could look into setting up a test group of individuals willing to focus on training the model, or into providing a separately trained model that OpenAI can integrate into GPT-4 or the next version?

Hi @lus.alex.28

Welcome to the community.

The tokenizers for GPT-3, 3.5, and 4 were all trained primarily on English text. The models themselves, however, were trained on a very large dataset that includes other languages, which is why they can demonstrate some “understanding” of non-English languages.

Having said that, it is possible to fine-tune (not train from scratch) the base GPT-3 models to generate better Armenian. You can do this by compiling a dataset of Armenian text, and fine-tuning any of the base models (`ada`, `babbage`, `curie`, or `davinci`) with that data.
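As a rough sketch of the data-preparation step: the fine-tuning endpoint for the base models expects a JSONL file of prompt/completion pairs. The example pairs below are made-up placeholders — you would generate them from your compiled Armenian corpus instead.

```python
import json

# Hypothetical sample pairs -- replace with pairs derived from your
# Armenian corpus. Each line of the JSONL file is one training example.
examples = [
    {"prompt": "Բարև ->", "completion": " Ողջույն"},
    {"prompt": "Երևանը ->", "completion": " Հայաստանի մայրաքաղաքն է"},
]

with open("armenian_finetune.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        # ensure_ascii=False keeps the Armenian characters readable
        # instead of escaping them as \uXXXX sequences.
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

You would then upload the file and start a fine-tune job with the OpenAI CLI (e.g. `openai api fine_tunes.create -t armenian_finetune.jsonl -m curie`).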

This sounds simple but involves a fair number of steps. Also, because the tokenizer is English-centric, Armenian text is split into many more tokens per word than English text, so both fine-tuning and usage will cost more.
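To see why Armenian is tokenized less efficiently: the GPT tokenizers operate on UTF-8 bytes, and every Armenian letter takes 2 bytes, while ASCII letters take 1. Since byte-pair merges were learned mostly from English text, Armenian ends up closer to the raw byte count. A quick stdlib-only illustration of the byte inflation (the exact token counts would require the model's actual tokenizer):

```python
english = "Hello, how are you?"
armenian = "Բարև, ինչպե՞ս ես:"  # roughly the same greeting in Armenian

# ASCII text: 1 byte per character. Armenian letters: 2 bytes each,
# so the byte stream the tokenizer sees is much longer per word.
print(len(english), len(english.encode("utf-8")))
print(len(armenian), len(armenian.encode("utf-8")))
```

The second line prints more bytes than characters, which is the raw material the tokenizer has to cover with merges it mostly never learned — hence more tokens, and higher cost, per Armenian word.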

Feel free to ask questions about any issues you run into.