I am looking for people to contribute to our non-profit project that will initially serve as a reference for future translations of indigenous languages in Latin America.
The project is called Yanomai. I am working on it alone and am currently processing the dataset: filtering the collected data and preparing it for use.
I would love to share all the details of the project and its current status publicly, but not right now; maybe in late December.
In the meantime, may I contact you privately to send you the details and explain what I have in mind?
Thank you very much. All help is always welcome, especially for the hundreds of other indigenous languages of the Latin American continent. Three months ago I had the idea of contributing something to the scene in my country, Brazil, in parallel with what we are developing commercially, which is content created with GPT-3. I came up with the idea of trying to connect Indigenous communities with AI, and that was when I discovered that GPT-3 didn't understand Tupi-Guarani, one of the main indigenous languages spoken in Brazil and in several other countries on the continent.
We are using this dictionary as a model. I am posting this reply to show that this is not a top-secret Martian project, and that I was very careful to preserve as many records as I could when starting this stage.
My current difficulty is correcting the diacritics and special signs used in the book's typography, even though the book has already been scanned and run through OCR (the administrator who maintains the repository and library has scanned dozens of books and applied OCR to others as well).
I have used nltk, pandas, and numpy in some tests, but I have not yet quantified how many terms I would lose if I chose a template that discards words or letters with no apparent encoding.
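One way to start quantifying that loss, sketched here under assumptions: flag every dictionary entry that contains a character outside an expected alphabet, then count how many entries would be dropped. `ALLOWED`, `suspicious_chars`, and `loss_report` are hypothetical names, and the character inventory below is only a placeholder; the real one has to come from the dictionary's own orthography. The sketch uses only the standard library; the same check could be applied to a pandas column with `Series.apply`.

```python
import unicodedata

# ASSUMPTION: hypothetical inventory of characters treated as valid orthography.
# The real inventory must be taken from the dictionary's own conventions.
# NFC-normalizing the literal keeps accented letters in precomposed form.
ALLOWED = set(unicodedata.normalize(
    "NFC",
    "abcdefghijklmnopqrstuvwxyz" "áéíóúâêôãẽĩõũỹ" "'- ",
))


def suspicious_chars(entry: str) -> set[str]:
    """Return the characters of `entry` that fall outside ALLOWED.

    The entry is lower-cased and NFC-normalized first, so a letter written as
    base + combining tilde (e.g. "e" + U+0303) is compared as the single
    precomposed character.
    """
    text = unicodedata.normalize("NFC", entry.lower())
    return {ch for ch in text if ch not in ALLOWED}


def loss_report(entries: list[str]) -> tuple[int, int, float]:
    """Return (flagged, total, share): how many entries would be dropped if
    every entry containing an unexpected character were discarded."""
    flagged = [e for e in entries if suspicious_chars(e)]
    total = len(entries)
    share = len(flagged) / total if total else 0.0
    return len(flagged), total, share


if __name__ == "__main__":
    # Made-up sample: "a§ara" stands in for an OCR misreading.
    sample = ["ybytu", "a§ara", "ka'a"]
    print(loss_report(sample))
```

Printing `suspicious_chars` for the flagged entries also shows which symbols the OCR produced most often, which helps decide whether to map them back to the intended diacritics instead of discarding the entries.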
Hello! We are an overseas practice team from Tsinghua University, China. We will be conducting fieldwork on the digital preservation of Indigenous languages in Brazil in early August 2025, focusing on the digital documentation of Indigenous languages, community-driven language data collection, and AI-assisted speech recognition, all of which are highly relevant to your work. Could you kindly let us know the current status of this project, and whether you would be willing to communicate further with our team?
Dear Jose,
We were delighted when we came across your initiative to create a dataset for Indigenous languages in Latin America.
I represent Hecho Por Nosotros (HXN), an NGO based in Buenos Aires that is exploring a similar project.
I would also like to take this opportunity to invite you to join us for the online side event hosted by HXN at the UN High-Level Political Forum 2025, on July 14, 2025.
The event focuses on how AI can help artisanal communities. It would be great if you could join Nelly P. Garcia-Lopez, Assistant Professor at Universidad de los Andes, and several other experts in an interactive breakout session where we will address AI solutions for preserving indigenous wisdom, including traditional creative art forms.
Apologies for the short notice; I only just found your links through a Google search (this is a bit of a shout into the void!). Please share an email address where I can send you more details.