0613 is ranked 25th on the LMSYS leaderboard, whereas 1106 is ranked 46th. Even 0125 and 0314 are ranked higher.
There’s a sample size of more than 50,000 votes. At that size, it’s hardly subjective. LMSYS uses an Elo rating system, where users pick the better of two LLM responses to their own queries. The results can be interpreted as 0613 winning against other models more often than 1106 does, implying that 0613 is superior.
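For anyone unfamiliar with Elo, here is a rough sketch of how pairwise votes turn into ratings. This is the textbook Elo update, not LMSYS's actual code; the starting rating of 1000, the K-factor of 32, and the names `model_x` and `model_y` are all assumptions I picked for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    # Whatever A gains, B loses, so the total rating is conserved.
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


def rate(votes, start=1000.0, k=32.0):
    """Replay (winner, loser) votes in order and return a ratings dict."""
    ratings = {}
    for winner, loser in votes:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        ratings[winner], ratings[loser] = update_elo(r_w, r_l, True, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical votes: model_x wins three matchups, model_y wins one.
    votes = [("model_x", "model_y")] * 3 + [("model_y", "model_x")]
    print(rate(votes))
```

The model that wins more of its matchups ends up with the higher rating, and beating a higher-rated opponent moves the rating more than beating a lower-rated one. With tens of thousands of votes, one person's preferences have very little effect on the result.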
Also, 1106 ranks lower than models that I can literally run on my laptop, whereas this is not the case for 0613.
