Suggestion: Capped Token Coverage and Pre-Send Multimodal Inputs for the App SDK

I’ve been developing an app inside ChatGPT using the App SDK and MCP, and after building a real feature I’ve run into some structural limitations that significantly affect developer adoption.

In its current form, the App SDK is powerful as a tool framework, but it stops short of letting apps meaningfully use the model itself, especially in multimodal scenarios. If the ChatGPT platform could partially cover model token usage for apps, under explicit caps, the overall developer value proposition would be much stronger.
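To make the coverage idea concrete, here is one hypothetical shape such a declaration could take in an app manifest. This is purely a sketch: `tokenCoverage` and every field inside it are invented for illustration and do not exist in the App SDK today.

```typescript
// Hypothetical manifest fragment for capped, platform-covered token usage.
// None of these fields are real App SDK configuration; they only illustrate
// the kind of explicit limits the suggestion has in mind.
const appManifest = {
  name: "my-example-app",
  tokenCoverage: {
    coveredTokensPerUserPerDay: 20_000, // platform absorbs model cost up to this cap
    overflow: "developer-billed",       // beyond the cap, the developer pays
    modalities: ["text", "image"],      // which input types count against the cap
  },
} as const;

// A platform-side check might then be as simple as:
function isCovered(tokensUsedToday: number, requested: number): boolean {
  return (
    tokensUsedToday + requested <=
    appManifest.tokenCoverage.coveredTokensPerUserPerDay
  );
}
```

The point of the sketch is that coverage stays bounded and auditable: the developer opts in to a hard cap, and anything beyond it falls back to normal billing.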

More importantly, an extension point for pre-send multimodal inputs would be highly impactful. Concretely, before the user clicks “Send,” a tool could be allowed, with explicit user confirmation, to attach images, audio, or files to the outgoing user message, rather than only returning tool outputs after generation.
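As a sketch of what that extension point might look like: the names below (`PreSendContext`, `requestUserConsent`, `attach`, `onPreSend`) are entirely hypothetical and not part of any real App SDK API; they only illustrate the consent-gated flow being proposed.

```typescript
// Hypothetical pre-send hook. Nothing here exists in the App SDK today;
// the types model the proposal: a tool may attach multimodal content to the
// draft user message, but only after an explicit, platform-mediated consent.

type Attachment =
  | { kind: "image"; mimeType: string; data: Uint8Array }
  | { kind: "audio"; mimeType: string; data: Uint8Array }
  | { kind: "file"; name: string; mimeType: string; data: Uint8Array };

interface PreSendContext {
  draftText: string; // the user's message before "Send" is clicked
  // Platform-owned consent dialog; resolves false if the user declines.
  requestUserConsent(reason: string): Promise<boolean>;
  attach(attachment: Attachment): void; // only called after consent
}

// Example tool-side hook: offer to attach a generated image to the message.
async function onPreSend(ctx: PreSendContext): Promise<void> {
  const ok = await ctx.requestUserConsent(
    "Attach the image this app just generated to your message?"
  );
  if (!ok) return; // nothing is attached without explicit approval
  ctx.attach({
    kind: "image",
    mimeType: "image/png",
    data: new Uint8Array([0x89, 0x50, 0x4e, 0x47]), // placeholder bytes
  });
}
```

Keeping the consent dialog on the platform side, rather than in tool-controlled UI, is what preserves the trust boundary: the tool can propose an attachment, but only the user can approve it.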

With clear user consent, strict limits, and platform-level safeguards, this could unlock significantly more compelling in-app experiences without undermining cost control or trust boundaries.

I believe this kind of constrained, opt-in pre-send capability would make the App SDK far more attractive to serious developers building inside ChatGPT.