Fine-tuning GPT for Direct Translations of an Old Indigenous Language - Seeking Advice

Hello fellow language enthusiasts and AI experts,

I am currently working on a project to fine-tune a GPT model for direct translations of an old indigenous language, specifically Mapudungun, which is spoken by the Mapuche people in Chile and Argentina. I am seeking advice on the best ways to create a dataset for fine-tuning and whether the available resources are sufficient for training the model.

I came across a dictionary available at the following URL, which provides translations between Mapudungun and Spanish:

My questions are:

  1. Is this dictionary sufficient to create a dataset for fine-tuning the GPT model, or do I need additional resources?
  2. What is the best approach to creating a high-quality dataset for training the model, considering the limited resources available for this language?
  3. Has anyone here worked on a similar project, or seen examples of fine-tuning GPT for direct translations of lesser-known or old indigenous languages? If so, please share your experiences and insights.

I appreciate any suggestions or guidance on this topic, as I believe that preserving and promoting the use of indigenous languages is of great cultural importance. I look forward to hearing from you and learning from your expertise.

Thank you in advance!


Did you get anywhere with this? Iā€™d like to do similar with some languages that have several grammars and dictionaries available.