I would love to share all the details of the project and its current status publicly, but not right now; maybe in late December.
In the meantime, would it be all right if I sent you the details in a private message?
Thank you very much; any help is always welcome, especially for the hundreds of other indigenous languages of Latin America. Three months ago I had the idea of contributing something to the scene in my country, Brazil, in parallel with what we are developing commercially, which is content created with GPT-3. I came up with the idea of trying to connect indigenous peoples with AI, and that is when I discovered that GPT-3 did not understand Tupi-Guarani, one of the main indigenous languages spoken in Brazil and in several other countries on the continent.
We are using this dictionary model. I am replying to make clear that this is not some top-secret Martian project, and that I was very careful to begin this part by preserving as many records as I could.
My difficulty now is correcting the strokes and diacritical marks in the typography, even though the book has already been scanned and run through OCR (the administrator responsible for maintaining the repository and library has scanned dozens of books and applied OCR to others as well).
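One common first step for this kind of cleanup, sketched here in Python under my own assumptions rather than as what we actually run: normalize every OCR'd entry to Unicode NFC, so that a letter followed by a separate combining tilde becomes the single precomposed character, and then apply a table of known OCR misreadings. The entries in `OCR_FIXES` and the name `clean_entry` are hypothetical placeholders; the real table would have to be built by comparing the OCR output with the printed book.

```python
import unicodedata

# Hypothetical OCR confusion table: each key is a sequence the OCR may have
# produced and each value the character it should have been. The real
# entries have to be collected by comparing the scans with the printed book.
OCR_FIXES = {
    "~i": "\u0129",  # ĩ, hypothetical: tilde read as a separate symbol
    "~y": "\u1ef9",  # ỹ, hypothetical
}


def clean_entry(text: str, fixes: dict = OCR_FIXES) -> str:
    """Normalize one OCR'd dictionary entry and apply known fixes."""
    # NFC turns a base letter followed by a combining mark (e.g. "i" + U+0303)
    # into the single precomposed code point ("ĩ") wherever one exists.
    text = unicodedata.normalize("NFC", text)
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


print(clean_entry("ti\u0303"))  # decomposed input, printed as "tĩ"
print(clean_entry("t~ia"))      # hypothetical misreading, printed as "tĩa"
```

Normalizing first matters because two entries that look identical on screen can otherwise differ byte by byte and be counted as different words.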
I have used nltk, pandas and numpy in some tests, but I have not yet quantified how many terms I would lose if I chose a template that discards words or letters with no recognizable encoding.
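A rough way to get that number, as a minimal sketch assuming the dictionary entries are already loaded as a list of strings: flag every term that contains a character outside an allowed alphabet, and count which characters cause the losses. The `ALLOWED` set below is a hypothetical placeholder, not the real Tupi-Guarani character inventory, and the sketch uses only the Python standard library (the same counts could be computed on a pandas column).

```python
import unicodedata
from collections import Counter

# Hypothetical alphabet: basic Latin letters, apostrophe, space, hyphen,
# and some accented vowels (á â ã é ê í ó ô ú ĩ ũ ỹ). Replace it with the
# real character inventory of the dictionary before trusting the numbers.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' -") | set(
    "\u00e1\u00e2\u00e3\u00e9\u00ea\u00ed\u00f3\u00f4\u00fa\u0129\u0169\u1ef9"
)


def unknown_chars(term: str, allowed: set = ALLOWED) -> set:
    """Return the characters of `term` that are outside `allowed`."""
    # Lowercase and NFC-normalize so "A" + combining tilde matches "ã".
    normalized = unicodedata.normalize("NFC", term.lower())
    return {ch for ch in normalized if ch not in allowed}


def loss_report(terms: list, allowed: set = ALLOWED):
    """Count terms that would be discarded and which characters cause it."""
    lost = 0
    offenders = Counter()  # character -> number of terms it appears in
    for term in terms:
        bad = unknown_chars(term, allowed)
        if bad:
            lost += 1
            offenders.update(bad)
    share = lost / len(terms) if terms else 0.0
    return lost, share, offenders


terms = ["oka", "yby", "pira\u0303", "t#ba", "k@@"]  # toy sample, not real data
lost, share, offenders = loss_report(terms)
print(f"{lost} of {len(terms)} terms lost ({share:.0%})")
print(offenders.most_common())
```

The `offenders` counter shows which characters are worth fixing before deciding to drop anything: if a single misread symbol accounts for most of the losses, correcting it recovers most of those terms.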