It was good in the “AI API model - just for developers” form, at release.
The quality has definitely changed since then; see “GPT 4.1 Degradation over the past 30 days”, post #21 by TpTheGreat.
Someone should re-run every benchmark that OpenAI published at release against the current models (obviously not the ones that cost $1000 per call just to say “look at us”). Maybe I will the next time my credits are close to expiring. Unfortunately, even that doesn’t tell the whole story: it says nothing about real-world apps and refusals, and it can’t rule out the possibility that held-out private benchmark sets were incorporated into new post-training so that dumber models appear to maintain performance.
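For anyone who does re-run a benchmark, here is a rough sketch of how the comparison against a published score could be done so that ordinary sampling noise isn't mistaken for degradation. Everything in it is an assumption for illustration: the function name, the 0.90 "published" figure, and the per-item pass/fail lists are made up, and it treats the published number as exact while ignoring differences in prompts, temperature, or grading.

```python
import math

def score_drop_is_significant(published_accuracy, rerun_results, z=1.96):
    """Compare a re-run against a published accuracy (hypothetical helper).

    rerun_results is a list of booleans, one per benchmark item.
    Returns (rerun_accuracy, significant), where significant is True when
    the drop exceeds z standard errors of a proportion.
    """
    n = len(rerun_results)
    if n == 0:
        raise ValueError("need at least one re-run result")
    rerun_accuracy = sum(rerun_results) / n
    # Standard error of a proportion, assuming the published accuracy is the true rate.
    stderr = math.sqrt(published_accuracy * (1 - published_accuracy) / n)
    significant = (published_accuracy - rerun_accuracy) > z * stderr
    return rerun_accuracy, significant

# Made-up numbers: 200 items, published accuracy 0.90.
big_drop = [True] * 160 + [False] * 40     # 80% correct on the re-run
small_drop = [True] * 178 + [False] * 22   # 89% correct on the re-run
print(score_drop_is_significant(0.90, big_drop))
print(score_drop_is_significant(0.90, small_drop))
```

With only a couple of hundred items, a drop of a point or two is within noise, so small benchmark subsets can't settle the question either way.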