Transferability of multiple languages

There’s a large dataset of human-annotated questions and answers for a database query language, and it’s the only one of its kind. But all of it is in Chinese. The query language itself is universal, but the questions and entities are in Chinese.

If I fine-tuned on this and then asked questions in English, would it work? I remember hearing about research on this kind of cross-lingual transfer, but I’m not sure how well it has been shown to work.

Hi @leaf

You could try translating the dataset into English while keeping the entities in Chinese.
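One way to do that is to mask the entities before translation and restore them afterwards, so the translator never touches them. A minimal sketch, assuming entity names can be read from the Cypher query's `:ENTITY{name:'…'}` literals (the format in the sample pair posted in this thread) and that the translator passes placeholders like `[E0]` through unchanged. `translate` is a hypothetical callable standing in for whatever MT system you use; the `FAKE` lookup table exists only so the example runs.

```python
import re
from typing import Callable

# Entity literals look like (:ENTITY{name:'云艺文华'}) in the sample Cypher.
# Assumption: every entity in the dataset is written in this exact form.
ENTITY_PATTERN = re.compile(r":ENTITY\{name:'([^']+)'\}")


def translate_keeping_entities(question: str, cypher: str,
                               translate: Callable[[str], str]) -> str:
    """Translate `question`, leaving every entity named in `cypher` untouched."""
    # Longest names first, so an entity that contains a shorter one is masked whole.
    entities = sorted(set(ENTITY_PATTERN.findall(cypher)), key=len, reverse=True)
    masked = question
    for i, name in enumerate(entities):
        masked = masked.replace(name, f"[E{i}]")
    translated = translate(masked)  # hypothetical: your MT system of choice
    # Put the original Chinese entity names back in place of the placeholders.
    for i, name in enumerate(entities):
        translated = translated.replace(f"[E{i}]", name)
    return translated


# Stand-in for a real translator, only so the example runs.
FAKE = {"[E0]的全称你知道是什么?": "Do you know the full name of [E0]?"}

question = "云艺文华的全称你知道是什么?"
cypher = "match (:ENTITY{name:'云艺文华'})<-[:Relationship{name:'简称'}]-(h) return"
print(translate_keeping_entities(question, cypher, lambda s: FAKE.get(s, s)))
# Do you know the full name of 云艺文华?
```

The weak point is the placeholder: some translators rewrite or drop unusual tokens, so it is worth checking the output before trusting it.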

Can you share a prompt completion pair?

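{
	"query": "云艺文华的全称你知道是什么?",
	"cypher": "match (:ENTITY{name:'云艺文华'})<-[:Relationship{name:'简称'}]-(h) return",
	"answer": [{"": "云南艺术学院文华学院"}]
}

(In English, the query asks "Do you know what the full name of 云艺文华 is?", and 简称 means "abbreviation".)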
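Turning a record like this into a training pair could look roughly like the sketch below. It assumes the goal is question → Cypher (the answer is whatever the database returns when the query runs, so it is not trained on) and uses the JSONL `prompt`/`completion` layout of OpenAI's legacy fine-tuning; newer chat-model fine-tuning uses a `messages` format instead. The `SEPARATOR` and `STOP` values are assumptions; any fixed markers work as long as they are used consistently in training and at inference.

```python
import json

# The sample record from this thread (the Cypher string is kept exactly as posted).
record = {
    "query": "云艺文华的全称你知道是什么?",
    "cypher": "match (:ENTITY{name:'云艺文华'})<-[:Relationship{name:'简称'}]-(h) return",
    "answer": [{"": "云南艺术学院文华学院"}],
}

SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of the prompt
STOP = " END"              # assumed stop sequence ending every completion


def to_pair(rec: dict) -> dict:
    # Train question -> Cypher; the answer comes from running the query, so it is left out.
    return {"prompt": rec["query"] + SEPARATOR,
            "completion": " " + rec["cypher"] + STOP}


# ensure_ascii=False keeps the Chinese text readable in the JSONL file.
line = json.dumps(to_pair(record), ensure_ascii=False)
print(line)
```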

I’m not sure translation is reliable enough: every semantic relationship between the words would have to carry over perfectly for the translated questions to still match their queries.
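Translation quality can't be fully verified automatically, but one mechanical failure can: an entity that the Cypher query depends on disappearing from the translated question (for example, because the translator romanized it). A minimal sketch of that check, assuming the same `:ENTITY{name:'…'}` format as the sample pair; it says nothing about whether the rest of the meaning survived, so it only filters out the clearly broken pairs. "Yunyi Wenhua" below is a made-up romanization used as an example of such a failure.

```python
import re

# Same assumed entity format as the sample Cypher: (:ENTITY{name:'...'}).
ENTITY_PATTERN = re.compile(r":ENTITY\{name:'([^']+)'\}")


def missing_entities(translated_question: str, cypher: str) -> list[str]:
    """Entity names the Cypher needs but the translated question no longer contains."""
    return [name for name in ENTITY_PATTERN.findall(cypher)
            if name not in translated_question]


cypher = "match (:ENTITY{name:'云艺文华'})<-[:Relationship{name:'简称'}]-(h) return"
print(missing_entities("Do you know the full name of 云艺文华?", cypher))      # []
print(missing_entities("Do you know the full name of Yunyi Wenhua?", cypher))  # ['云艺文华']
```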

I always like a question to ponder.

We are barely even shown examples of how to make a sarcastic bot, so for deeper levels of fine-tuning, one really has to reason carefully about how a language model actually behaves.

Hypothetical: what if my fine-tuning examples trained the AI to respond only in Chinese to English questions, or only in English to Chinese questions? If I then said “no Chinese”, could it still answer the questions it was trained on in Chinese?

I suspect that knowledge learned from Chinese → Chinese fine-tuning examples may be much harder to activate with English inputs.
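That suspicion is testable: hold out some Chinese examples, translate their questions into English while keeping the gold Cypher, and compare how often the fine-tuned model reproduces the gold query in each language. A minimal sketch, assuming exact match on whitespace-normalized Cypher is an acceptable (strict) metric. `generate` is a hypothetical wrapper around whatever model you fine-tuned, faked here with a lookup table that only "knows" the Chinese phrasing.

```python
from collections.abc import Callable, Iterable


def normalize(cypher: str) -> str:
    # Collapse runs of whitespace so formatting differences don't count as errors.
    return " ".join(cypher.split())


def exact_match_rate(examples: Iterable[tuple[str, str]],
                     generate: Callable[[str], str]) -> float:
    """Fraction of (question, gold_cypher) pairs where the generated Cypher matches exactly."""
    pairs = list(examples)
    if not pairs:
        return 0.0
    hits = sum(normalize(generate(q)) == normalize(gold) for q, gold in pairs)
    return hits / len(pairs)


gold = "match (:ENTITY{name:'云艺文华'})<-[:Relationship{name:'简称'}]-(h) return"
zh_eval = [("云艺文华的全称你知道是什么?", gold)]
en_eval = [("Do you know the full name of 云艺文华?", gold)]  # same question, translated

# Hypothetical stand-in for the fine-tuned model.
fake_model = {"云艺文华的全称你知道是什么?": gold}


def generate(question: str) -> str:
    return fake_model.get(question, "")


print("zh:", exact_match_rate(zh_eval, generate))  # zh: 1.0
print("en:", exact_match_rate(en_eval, generate))  # en: 0.0
```

Exact match penalizes queries that are written differently but equivalent; running both the generated and gold queries against the database and comparing the results would be a more forgiving measure.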

