I wanted to share an observation and see if others in the Apps SDK / MCP space are experiencing something similar:
To me, GPT 5.3 currently feels noticeably less reliable than GPT 5.2, especially when it comes to handling MCP tools and their descriptions.
A few concrete things I’ve been seeing:
- The output feels less user-friendly. GPT 5.3 tends to include more “thinking out loud” about what it’s doing, and responses are often more technical than necessary instead of being structured clearly and engagingly for users.
- Tool calls are more frequently made with incorrect parameters, even when the schema is explicitly and clearly defined.
- Widgets are sometimes not triggered properly. Instead of using the intended widget (e.g. via a button-triggered input), the model falls back to plain text responses.
- Tool calls occasionally happen too late, with the model first writing internal-style messages like “I’m going to call tool X now” before actually executing the call.
This is obviously not a formal evaluation, just based on hands-on usage and observation.
Still, I’d be really interested to hear if others are seeing similar behaviors.
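
In case it helps others compare notes on the incorrect-parameter point, here is a rough sketch of how logged tool-call arguments could be checked against a tool's input schema, so that "more frequently" can be turned into a count. Everything here is an assumption for illustration: the `search_products` tool, its schema, and the `check_tool_args` helper are hypothetical, and the check covers only required fields, unknown fields, and basic types rather than full JSON Schema validation.

```python
from typing import Any

# Maps JSON Schema type names to Python types (a small subset, for illustration).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

# Hypothetical input schema for a made-up "search_products" tool.
SEARCH_PRODUCTS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}


def check_tool_args(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in a tool call's arguments."""
    problems: list[str] = []
    properties = schema.get("properties", {})

    # Required parameters that the model left out.
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")

    for name, value in args.items():
        # Simplification: any parameter not declared in the schema is flagged.
        if name not in properties:
            problems.append(f"unknown parameter: {name}")
            continue
        expected = properties[name].get("type")
        py_type = JSON_TYPES.get(expected)
        if py_type is None:
            continue  # type not declared or not covered by this sketch
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, py_type):
            problems.append(f"wrong type for {name}: expected {expected}")

    return problems


# Example: the model passed the limit as a string and forgot the query.
print(check_tool_args(SEARCH_PRODUCTS_SCHEMA, {"limit": "5"}))
```

Running the same set of prompts against both model versions and counting non-empty results per tool call would give a simple, if crude, comparison.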