I assume this is a polite way of saying “comparatively bad”, but I’d like to hear more about your specific experiences: where it falls short, and how you believe it is best used. I’ve spent extensive time with it as well, and I’ve been wondering recently whether those shortcomings in comprehension are useful as guidelines for better designing our prompts, tools, etc.
For example, I recently had a problem that was easily solved by using 4o rather than 4o-mini, but after some experimentation I found a few things that were confusing 4o-mini and got it working with the smaller model. The question is whether doing so also improves comprehension in the larger models, and as far as I know there’s no way to test this.