I have been speedrunning for 10 days with a bunch of small ChatGPT Apps to experiment + see what's possible. So freaking fun and exciting
They don't really fit the picture of a full-on "app" in ChatGPT (more like POCs), but I just wanted to share what I've done so you can see what's possible + get my lessons learnt from it all and what OpenAI could potentially improve in the Apps SDK.
Lessons Learnt
Since things are moving so quickly, maybe some of the following has been fixed already, feel free to lmk!
CSP does not support anything other than https. For example, if I wanted to permit wss://example.com/ws, the Apps SDK would rewrite it to https://wss://example.com/ws
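For context, this is roughly how a widget resource declares its CSP today, as I understand it from the Apps SDK docs (field names may have changed; treat this as an illustrative sketch, not a definitive reference):

```typescript
// Sketch of the CSP metadata attached to a widget resource's _meta.
// The connect/resource domain lists are allowlists of origins the
// widget iframe may reach.
const widgetMeta = {
  "openai/widgetCSP": {
    // Only https origins survive: a wss:// origin here would get
    // mangled into "https://wss://…" by the SDK.
    connect_domains: ["https://example.com"],
    resource_domains: ["https://cdn.example.com"],
  },
};
```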
Write tool calls are not supported on mobile apps
On mobile, the PIP window can disappear when a subsequent tool call does not render a widget
Since all widgets + resources are loaded at connection time, you have to make sure you wait for the render data to finish loading up. It would be great in the future for the Apps SDK to support rendering widgets from the response of the tool call itself, saving us the round trip through the render data (more aligned with MCP-UI)
Apps
Day 9 - JavaScript Code-Running Sandbox in ChatGPT
React Server Components for widget resources using RedwoodSDK. This was a huge DX win for me: no separate build step needed, and it can render directly from the tool response (the MCP-UI approach, once that is supported) instead of waiting for render data
Optional - MCP-UI - Used MCP-UI to support other clients as well, but not required for ChatGPT Apps
Overall, no need to overthink it: you can use any typical frontend setup + tools + frameworks with the Apps SDK
I haven’t felt so excited in a while. Apps in ChatGPT is really the future tbh. And Apps SDK was really well thought out.
Thank you for your sharing, it’s cool that you’ve built so many apps so fast!
interesting, do you mind elaborating on what you meant by "at connection time" and "wait[ing] for the render data to finish loading up"? If I'm understanding right, which points in the component render phase after an MCP tool call do these correspond to? And why would saving the round trip for render data be a significant performance improvement?
On another unrelated note, how did you record those crisp demo videos in your X posts? I like the part where you zoomed into specific areas of the screen. Those showcased your ChatGPT apps really clearly, and I'd like to learn how to do that.
So currently, how it works in the Apps SDK is the following:
When a user connects to an App, before any tool is called, ChatGPT Apps SDK loads up all the static resources / widgets and caches them
When a user makes a tool call that corresponds to a specific resource / widget, it will load up the cached resource
When the resource is loaded, it has to wait for the render data to arrive, which carries the tool output and any other metadata sent along with it
From my testing, this render data carrying the tool output takes a while to propagate from the Apps SDK to the resource. But performance aside (the delay isn't huge), just being able to generate the resource at tool-output time would be a huge DX improvement
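The "wait for render data" step above can be sketched like this. The helper below is hypothetical (not part of the SDK); inside a widget you would point it at wherever the SDK exposes the tool output, e.g. something like `window.openai?.toolOutput`:

```typescript
// Hypothetical helper: poll a getter until it yields a value or time out.
// In a ChatGPT widget, `get` would read the SDK-provided render data,
// which may arrive some time after the widget's HTML has loaded.
async function waitForValue<T>(
  get: () => T | undefined,
  { intervalMs = 50, timeoutMs = 5000 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const value = get();
    if (value !== undefined) return value; // render data has propagated
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for render data");
}
```

If the SDK ever lets the tool response carry the widget directly, this whole waiting loop goes away, which is the DX win described above.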
Very cool. I’ll look into Redwood myself. interesting.
I’m curious, does Redwood output the html + inline all the needed JS + css, so we don’t have to host different versions of those resources?
Since the document is heavily cached, you need to version any resources (which is good), but then you also need to keep a backlog of them for “older” versions to continue working (especially during a rollout).
Definitely some things I need to fix to get better DX for versioning + cache-busting, but right now HTML + JS + CSS all work on my end with Redwood
Didn't think too deeply about backlogging older versions, which totally makes sense though. Hmm, need to see how that could work
Yes, once we roll out our app, one thing we need to solve is how far back we support the "old" UIs.
Since widgets are small, isolated entry points, one possible solution is inlining everything in the original HTML that is cached by ChatGPT. That way, this becomes a non-issue.
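A minimal sketch of that inlining idea (helper name and version scheme are made up for illustration): produce one self-contained HTML document per widget version, so a cached copy keeps working without any externally hosted JS or CSS.

```typescript
// Hypothetical: bundle a widget's JS + CSS into one self-contained HTML
// document. Each version is a complete artifact, so old cached copies
// never break when new assets are deployed.
function inlineWidgetHtml(opts: { js: string; css: string; version: string }): string {
  return [
    "<!doctype html>",
    `<html data-widget-version="${opts.version}">`,
    `<head><style>${opts.css}</style></head>`,
    `<body><div id="root"></div><script type="module">${opts.js}</script></body>`,
    "</html>",
  ].join("\n");
}
```

Pairing this with a versioned resource URI (e.g. something like `ui://widget/counter@1.2.0`) would let older conversations keep resolving exactly the HTML they were rendered with.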