GPT 5.3 feels less reliable than 5.2 for MCP tools

I wanted to share an observation and see if others in the Apps SDK / MCP space are experiencing something similar:

To me, GPT 5.3 currently feels noticeably less reliable than GPT 5.2, especially when it comes to handling MCP tools and their descriptions.

A few concrete things I’ve been seeing:

  • The output feels less user-friendly. GPT 5.3 tends to include more “thinking out loud” about what it’s doing, and responses are often more technical than necessary instead of being clearly structured and engaging for users.

  • Tool calls are more frequently made with incorrect parameters, even when the schema is explicitly and clearly defined.

  • Widgets are sometimes not triggered properly. Instead of using the intended widget (e.g. via a button-triggered input), the model falls back to plain text responses.

  • Tool calls occasionally happen too late, with the model first writing internal-style messages like “I’m going to call tool X now” before actually executing the call.
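To make the second point concrete: the schemas I mean are the JSON Schema `inputSchema` blocks in MCP tool definitions. Here is a minimal, hypothetical sketch (tool name and fields are invented for illustration) of the kind of explicit schema that should leave no room for wrong parameters, plus a simple validator covering a small subset of JSON Schema:

```python
# Hypothetical MCP-style tool definition (name and fields invented for illustration).
TOOL = {
    "name": "search_products",
    "description": "Search the product catalog.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

def validate_args(args: dict, schema: dict) -> list[str]:
    """Check arguments against an object schema (small subset of JSON Schema)."""
    errors = []
    props = schema.get("properties", {})
    type_map = {"string": str, "integer": int, "boolean": bool,
                "array": list, "object": dict}
    # Every required parameter must be present.
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")
    # Every supplied parameter must be declared and have the declared type.
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected parameter: {name}")
            continue
        expected = type_map.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {props[name]['type']}")
    return errors

# A well-formed call passes; a call with a misnamed parameter does not.
print(validate_args({"query": "laptops", "limit": 5}, TOOL["inputSchema"]))  # []
print(validate_args({"q": "laptops"}, TOOL["inputSchema"]))
# ['missing required parameter: query', 'unexpected parameter: q']
```

With a schema this explicit, a misnamed or mistyped argument is the model's error, not ambiguity in the tool definition, which is what makes the regression stand out.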

This is obviously not a formal evaluation, just based on hands-on usage and observation.

Still, I’d be really interested to hear if others are seeing similar behaviors.