From everything I’ve seen online so far, WordNet is English-only. NLTK provides a good lemmatization method, but I don’t think it’s multi-lingual. Did you have a multi-lingual use of NLTK in mind?
(A lot of hits have come up in Google Scholar and Google for “GPT-3 lemmatization” and “multi-lingual lemmatization” so I think I’ll find some good reading material on this.)
But maybe if there’s a website where good quality datasets can be found for a wide variety of languages - or if there’s a single ubiquitously usable web crawler to generate a language-specific dataset for any language - then a system like GPT-3, maybe BERT, could be trained on each language’s dataset, and you could have a multi-lingual lemmatizer pretty easily. What do you think?
You could try a fine-tuning set that covers multiple languages. You can probably hit the top 10 or 50 languages in the world, but again, if the model hasn’t seen most of the vocabulary in a language, then it will just be guessing.
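A minimal sketch of what such a multi-language fine-tuning set could look like, assuming the JSONL prompt/completion layout used by the legacy GPT-3 fine-tuning endpoint. The example words, the prompt wording, and the helper names `build_record` and `to_jsonl` are illustrative assumptions, not anything taken from this thread.

```python
import json

# Hypothetical gold examples: (language, inflected word, lemma).
EXAMPLES = [
    ("English", "running", "run"),
    ("Finnish", "taloissa", "talo"),
    ("German", "Häuser", "Haus"),
]


def build_record(language, word, lemma):
    """Turn one gold example into a prompt/completion pair."""
    prompt = f"Language: {language}\nWord: {word}\nLemma:"
    # Leading space and trailing newline: a common convention so the model
    # can learn where the completion starts and stops.
    completion = f" {lemma}\n"
    return {"prompt": prompt, "completion": completion}


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(
        json.dumps(build_record(*ex), ensure_ascii=False) for ex in examples
    )


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Tagging each prompt with the language name is one possible design choice; it lets a single fine-tuned model serve several languages, but how well that works for low-resource languages would still have to be measured.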
What I remember is that >90% of the total volume of training data was English. So yes, it may have seen other languages, but I would not count on it having a solid grasp of them. If that number is wrong then yeah, maybe you can train a lemmatizer for the languages it has seen enough of. But if it hasn’t seen a language at all, I suspect it won’t work.
Oh but also there’s no reason not to try it. I would say use DAVINCI INSTRUCT and just tell it to lemmatize your words.
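A rough sketch of the prompt side of that idea, plus a way to score the replies against a small gold list. It deliberately leaves out the API call itself; `build_prompt`, `parse_reply`, `accuracy`, and the comma-separated reply format are assumptions made up for illustration, not a documented interface.

```python
def build_prompt(words, language):
    """Build a plain-text instruction asking for one lemma per word."""
    word_list = ", ".join(words)
    return (
        f"Lemmatize the following {language} words. "
        "Reply with the lemmas only, comma-separated, in the same order.\n\n"
        f"Words: {word_list}\n"
        "Lemmas:"
    )


def parse_reply(reply, expected_count):
    """Split a comma-separated reply; return None if the count is off."""
    lemmas = [part.strip() for part in reply.split(",") if part.strip()]
    return lemmas if len(lemmas) == expected_count else None


def accuracy(predicted, gold):
    """Fraction of positions where the predicted lemma matches the gold one."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)
```

Running this over a few dozen hand-checked words per language would give a first impression of whether the model is actually lemmatizing or just guessing in a given language.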
You could use a Vision API to extract the text from source documents (i.e., PDFs). I used it with Oleo (Hawai’ian). The extracted text may then be used to train a language model (Luca Pacioli, Divina Proportione). I’m not sure how Da Vinci got most of the credit here. See Euclid too. I like using Φ to validate my models for its symmetry. I suppose π would work too but this may add complexity. I think this is equal to Φ^2.
Just based on trying out ChatGPT, I suspect that lemmatization could work even ten times better. Not that I’m an expert in this field, but I’m more than surprised by how well a rather small language like Finnish already works in the ChatGPT demo.