There’s a large dataset of human annotated questions and answers for a database query language - it’s the only one that exists. But, all of it is in Chinese. The actual query language is universal, but the questions and entities are Chinese.
If I fine-tuned on this, and asked questions in English, would it work out? I remember hearing some research about it’s transferability but I’m not certain how this has been seen to work
I’m not sure if translation is reliable enough - all of the semantic relationships between words would need to transfer perfectly for it to still be reliable.
We are barely even shown examples of how to make a sarcastic bot, so deeper levels of fine-tune, one really needs to think logically about how language model AI acts.
Hypothetical: What if, in my examples, I tuned an AI on only responding in Chinese to my English questions. Or only in English to my Chinese questions? If I said “no Chinese”, could it still answer the questions trained in Chinese?
I have a feeling that Chinese → Chinese knowledge examples in fine-tune may be much harder to activate with English inputs.