What is Q*? And when will we hear more?

Ilya said on a recent episode of the “No Priors” podcast (can’t post the link) that the “data limit can be overcome” (skip to 26:38). Does this indicate they are using “self-play”, similar to AlphaZero?

Also, if I had to guess, the “breakthrough” likely boils down to the ability to do long-term planning, which would address the limitations of auto-regressive generation. That would also make sense in the context of solving math problems.
