I’ve been working on a side project for a while to build a model that takes Hawaiian-language text written in the 1900s and converts it into a form modern readers can read (with all of the appropriate punctuation). It’s not trivial: adding the punctuation can change the meaning of words, and many words have multiple possible written forms. It’s essentially a machine translation task, but between two very similar languages. I’ve had some success training my own model, but I’ve hit a wall, and I was thinking I could use ChatGPT to perform the task for my users.
I’ve tried fine-tuning with a largish data set (10k+ sentence examples), but the results weren’t great, and I’ve tried prompting GPT-4, which gets close-ish but still makes quite a few mistakes. So my question is: should I keep trying fine-tuning, try more/different prompts, or explore other tactics like embeddings?
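For concreteness, here’s a minimal sketch of what one training example might look like in OpenAI’s chat-style fine-tuning JSONL format, which is how I’ve been framing the task. The system prompt wording, the sentence pair, and the `train.jsonl` file name are all just illustrative placeholders, not my actual data:

```python
import json

# Hypothetical sketch: each line of the JSONL file is one old-text -> modern-text pair,
# wrapped in the chat message format that OpenAI fine-tuning expects.
SYSTEM = (
    "You rewrite Hawaiian text from older orthography into modern orthography, "
    "adding the appropriate punctuation. Reply with the corrected text only."
)

# Tiny illustrative pair (not from my real data set).
pairs = [
    ("olelo Hawaii", "\u02BB\u014Dlelo Hawai\u02BBi"),  # olelo Hawaii -> ʻōlelo Hawaiʻi
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for old, modern in pairs:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": old},
                {"role": "assistant", "content": modern},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

With ~10k pairs in this shape, each sentence becomes one JSONL line, and the same system prompt used at training time gets reused at inference time.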
Would love any advice, thanks.