Ilya said in a recent episode of the “No Priors” podcast (can’t post the link) that the “data limit can be overcome” (skip to 26:38). Does this indicate they are using “self-play”, similar to AlphaZero?
Also, if I had to guess, the “breakthrough” likely boils down to the ability to do long-term planning, which would address the limitations of autoregressive generation. That would also make sense in the context of solving math problems.