The SIMA paper mentioned that training on multiple games yielded better results than specializing on JUST the evaluated game. Even when a game wasn’t in the model’s training set and was approached in a “zero-shot” manner, the “renaissance model” would sometimes outperform the narrowly trained model (Goat Simulator, Satisfactory).
To what extent is this just because it got better at vision, versus something higher-level? Surely a tree in Hydroneer, Valheim, and Satisfactory all look different, so SIMA would be much better at “go to that tree” because it has a robust idea of what a tree can look like.
So is the benefit specifically limited to vision? Or can the benefits exist at the conceptual level: would a GUI-based puzzle game about alchemy benefit a model that plays a text adventure about baking? Is it possible to work backwards, starting from a target domain and letting an LLM design a variety of games for the model to train on? Game Developers Conference 2024 starts tomorrow; it’ll be interesting to see how many developers have pivoted into providing bespoke simulation environments for humanoid robots.
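To make the “work backwards” idea concrete, here’s a toy sketch. Everything in it is hypothetical: the target concept, themes, and interfaces are made-up placeholders, and a simple cross product of hand-written options stands in for an LLM that would actually author the game designs. The point is only the shape of the pipeline: hold the underlying concept fixed while varying every surface feature, so an agent trained across the set has to separate the skill from its presentation.

```python
import itertools

# Hypothetical target skill the final agent should generalize to.
TARGET_CONCEPT = "gather a resource and bring it to a base"

# Surface features to vary; in a real pipeline an LLM prompt like
# "design a game about <concept> set in <theme> using <interface>"
# would produce full specs instead of this fixed list.
THEMES = ["medieval forest", "asteroid mining colony", "alchemy workshop"]
INTERFACES = ["3D first-person", "2D top-down", "text adventure"]

def design_training_games(concept, themes, interfaces):
    """Return one game spec per (theme, interface) pair.

    Every spec shares the same core objective, so the concept is the
    only invariant across the training set.
    """
    specs = []
    for theme, interface in itertools.product(themes, interfaces):
        specs.append({
            "objective": concept,
            "theme": theme,
            "interface": interface,
            "title": f"{theme} / {interface}",
        })
    return specs

games = design_training_games(TARGET_CONCEPT, THEMES, INTERFACES)
print(len(games))  # 3 themes x 3 interfaces = 9 training environments
```

The alchemy-puzzle/baking-text-adventure question above maps onto this directly: those would be two cells of the grid that share an objective (follow a recipe by combining ingredients) while differing in theme and interface.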