About this example taken from the GPT-3 research paper:
How does the model predict the French word?
The examples are instructions that aren't particularly helpful for finding the result,
so my assumption is → the model is using “knowledge” from the training data?
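For context, the paper's few-shot setup gives the model nothing but a plain-text prompt: a task description, a handful of demonstration pairs, and an unfinished line for it to complete. A minimal sketch of building such a prompt (the English/French pairs here are illustrative, in the style of the paper's figure, not a quote of it):

```python
# Build a few-shot translation prompt in the style of the GPT-3 paper.
# The demonstrations only specify the *task* (English => French);
# knowledge of French itself has to come from pretraining data.
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
]
query = "cheese"

prompt = "Translate English to French:\n"
prompt += "".join(f"{en} => {fr}\n" for en, fr in examples)
prompt += f"{query} =>"  # the model is asked to continue from here

print(prompt)
```

The point is that nothing in this text teaches the model French; the demonstrations just steer it toward the translation task, which is why the question of where the French comes from is a good one.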
Deep networks such as GPT-3 tend to develop loci where certain concepts are embedded. I can't find the article I saw it in, but essentially there are higher-order abstractions embedded throughout the network, which is what lets the model generalize across languages and other tasks. In fact, models trained on multiple languages tend to do better.

Having read some Steven Pinker, I suspect it has to do with the way different languages handle concepts and how those concepts are communicated. Some languages are more analytic, such as English, where we add more words to convey sense. Other languages are more synthetic, using conjugations to be precise. These different approaches force neural networks to learn different ways of abstracting the same concepts.