Anyone doing successful translations with GPT-3.5?

First, please know this ventures well outside my expertise and personal experience.

But this is how I expect it would work:

You would create a large set of these lazily-typed minority language examples and their corresponding target translations.

Fine-tuning on this dataset should give the model enough understanding to be able to infer when these tone phonetics are missing and respond appropriately.

You could do this either as a straight translation task, e.g.

\begin{align} &\text{Missing tone phonetics}\\ \rightarrow &\text{Translated text} \end{align}
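For the straight translation task, each pair would become one record in OpenAI's chat-format fine-tuning JSONL. A minimal sketch (the Vietnamese-style example pairs and the system prompt wording are placeholder assumptions; substitute your own language's data):

```python
import json

# Hypothetical "lazy input" -> "English translation" pairs.
# Replace with real examples from your minority language.
pairs = [
    ("toi muon an com", "I want to eat rice."),
    ("ban ten la gi", "What is your name?"),
]

SYSTEM = "Translate the user's text into English."  # assumed wording

def to_finetune_record(source: str, target: str) -> dict:
    """One training example in OpenAI's chat fine-tuning JSONL shape."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": source},
            {"role": "assistant", "content": target},
        ]
    }

# One JSON object per line, as the fine-tuning endpoint expects.
with open("direct_translation.jsonl", "w", encoding="utf-8") as f:
    for src, tgt in pairs:
        f.write(json.dumps(to_finetune_record(src, tgt), ensure_ascii=False) + "\n")
```

You'd then upload the resulting file when creating the fine-tuning job.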

Or as a “chain-of-thought” process, e.g.

\begin{align} &\text{Text missing tone phonetics}\\ \rightarrow &\text{Chain of Thought:}\\ &\quad\text{It looks like you've entered some text in}\\ &\quad\text{[Minority Language] but didn't include}\\ &\quad\text{tone phonetics. }\\ &\quad\text{I think you meant,}\\ \rightarrow &\text{Text with added tone phonetics}\\ \rightarrow &\text{Translated text}\\ \end{align}
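For that variant, the assistant target in each training record would carry the whole chain: the observation, the restored text, then the translation. A sketch of how I'd assemble that target string (the scaffold wording is just my assumption; tune it to whatever you want the model to say):

```python
def cot_target(language: str, restored: str, translation: str) -> str:
    """Build the assistant-side target for a chain-of-thought example:
    note the missing tone phonetics, show the restored text, then translate."""
    return (
        f"It looks like you've entered some text in {language} "
        "but didn't include tone phonetics. I think you meant:\n"
        f"{restored}\n\n"
        f"Translation: {translation}"
    )
```

This string would replace the plain `target` in each fine-tuning record, so the model learns to produce the intermediate restoration step before translating.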

If the translation works well (in general) for your minority language of interest, then the real issue you need to tackle is the intermediate “translation” from “lazy” to “proper.”

Fine-tuning on the direct translation might help the model “figure out” that it needs to do this step implicitly.

But, I think fine-tuning with a chain-of-thought process will ultimately yield stronger results.

One thing you should try first, though, is seeing whether the model can rectify the missing tone phonetics without fine-tuning.

Basically, give the model an example input, tell it the language and that the tone phonetics are missing, and ask it to correct the input to include the tone phonetics.
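That zero-shot test is just a matter of building the right prompt. A sketch of the messages you'd send to the chat completions endpoint (the instruction wording is my own assumption, not a known-good recipe; you'd want to experiment):

```python
def rectify_messages(language: str, lazy_text: str) -> list:
    """Messages for a zero-shot 'restore the missing tone phonetics' request."""
    system = (
        f"The user will send {language} text typed without tone phonetics. "
        "Rewrite it with the correct tone phonetics restored. "
        "Reply with the corrected text only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": lazy_text},
    ]

# These messages would then be passed to the OpenAI chat completions API
# (call omitted here so the sketch runs offline), e.g. with model="gpt-3.5-turbo".
```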

If that works, you may be able to just use the models as they are (possibly with some extra steps).

Or, you might be able to get away with fine-tuning a “rectifier” model.

To do that, I’d try first with gpt-3.5-turbo: get as much proper text as you can in your minority language, then remove the tone phonetics. Then use these “correct”/“incorrect” pairs to train a model to correct incorrect input text.
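The nice part is that removing tone phonetics is mechanical if your language marks tone with combining diacritics. A sketch using Vietnamese as a stand-in (which marks I treat as “tone” is an assumption; other languages, or tone written as letters/digits, would need a different mark set):

```python
import unicodedata

# Combining code points used for Vietnamese tone: acute, grave,
# hook above, tilde, dot below. Vowel-quality marks (breve, circumflex,
# horn) are deliberately NOT in this set, so ă/â/ơ survive.
TONE_MARKS = {"\u0301", "\u0300", "\u0309", "\u0303", "\u0323"}

def strip_tones(text: str) -> str:
    """Decompose to NFD, drop only the tone diacritics, recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

Running your “proper” corpus through this gives you the “incorrect” half of every training pair for free.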

To have a robust model, you would probably want examples with all, some, and none of the tone phonetics removed. That way, you can send all of your text into the model and it should make only the corrections it needs, leaving well-written text alone.
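Generating that mix is a small extension of the stripping idea: drop each tone mark independently with some probability, so one corpus yields fully correct, partially lazy, and fully lazy variants. A sketch under the same Vietnamese-combining-marks assumption as above:

```python
import random
import unicodedata

# Same assumed Vietnamese tone-mark set as in the stripping sketch.
TONE_MARKS = {"\u0301", "\u0300", "\u0309", "\u0303", "\u0323"}

def drop_some_tones(text: str, p: float, rng: random.Random) -> str:
    """Drop each tone mark independently with probability p.
    p=0.0 leaves the text correct; p=1.0 strips every tone mark."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(
        ch for ch in decomposed
        if not (ch in TONE_MARKS and rng.random() < p)
    )
    return unicodedata.normalize("NFC", kept)

# For the training set you might sample p per sentence, e.g.
# rng.choice([0.0, 0.5, 1.0]), so the model sees all three regimes.
```

A seeded `random.Random` keeps the corruption reproducible, which helps when you regenerate the training file.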

If you have a couple of example pairs of your language with and without tone phonetics, I would be happy to play with it a bit and share any insights I gain in the process.
