Is it valid in a ChatGPT App + MCP setup to build my own streaming React frontend with a Python backend? Looking for architectural guidance

Hi everyone — I’m working on a ChatGPT App for real-estate consulting and would love some feedback from the community on whether our architecture approach makes sense.


Background

We’re building a real-estate advisory ChatGPT App that provides users with investment insights, purchase guidance, and other property-related recommendations.

Our workflow looks roughly like this:

  • A user starts a conversational session inside the ChatGPT App.

  • The model invokes our MCP tool to call backend services when needed.

  • During the dialogue, the App may ask clarifying questions to understand the user’s goals, constraints, and preferences.

  • Once enough context is collected, we want to present richer, more interactive results—such as structured summaries, visualizations, comparisons, etc.

Some of this output takes time to generate, and we would prefer users not stare at a static loading indicator. Instead, we’d like to create a smoother, “ChatGPT-like” experience where text flows in progressively.

What we want to achieve technically

Within the ChatGPT App’s Widget (React), we want:

  • a custom React component that renders text as a streaming, typing-style animation

  • our Python backend to send incremental chunks of data (SSE or a streamed HTTP response)

  • the frontend to display these chunks as soon as they arrive

Conceptually, we’re thinking of two independent channels:

ChatGPT ↔ MCP Tool     → (standard one-shot tool call)
React UI ↔ Our Backend → (actual streaming channel)

The MCP Tool continues to follow the required one-shot JSON protocol.
Separately, our widget establishes a streaming connection (SSE/WebSocket/Fetch Stream) to our backend, which delivers content progressively.
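For illustration, a minimal sketch of the widget side of that second channel, assuming a hypothetical /stream endpoint on our backend that emits raw text chunks over a streamed HTTP response (the backend URL is a placeholder):

```tsx
import { useEffect, useState } from "react";

// Hypothetical backend host; would also need to be whitelisted in the widget's CSP.
const BACKEND_URL = "https://api.example.com";

export function StreamingText({ sessionId }: { sessionId: string }) {
  const [text, setText] = useState("");

  useEffect(() => {
    const controller = new AbortController();

    (async () => {
      const res = await fetch(`${BACKEND_URL}/stream?session=${sessionId}`, {
        signal: controller.signal,
      });
      if (!res.body) return;

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      // Append each chunk as soon as it arrives, producing the
      // progressive "typing" effect instead of one big final render.
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        const chunk = decoder.decode(value, { stream: true });
        setText((prev) => prev + chunk);
      }
    })().catch(() => {
      // Aborted on unmount, or a network error; ignored in this sketch.
    });

    return () => controller.abort();
  }, [sessionId]);

  return <pre>{text}</pre>;
}
```

SSE via EventSource would work similarly; a fetch stream is used here so the same pattern covers POST bodies and custom headers.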

This would give us:

  • lower perceived latency

  • smoother user experience

  • the ability to reveal complex or graphical content gradually

The key questions for the community

1. From a protocol/architecture standpoint:

Is it acceptable in the ChatGPT App + MCP ecosystem to split responsibilities this way?

  • ChatGPT ↔ MCP Tool for conversation + reasoning

  • React Widget ↔ custom backend for real-time streaming visualization

Is the ChatGPT App system intended to support this pattern?

2. Has anyone in production done something similar?

We’re curious whether other teams have adopted a parallel streaming channel for richer UI behaviors.

3. Potential pitfalls or security concerns?

For example:

  • Running a custom SSE/WebSocket client inside the ChatGPT App Widget

  • Managing session state between ChatGPT’s tool calls and our backend

  • Ensuring isolation and secure data flow

4. Is this aligned with the intended ChatGPT App developer model?

Or is there a more canonical recommended way to implement progressive rendering or real-time UI updates in a ChatGPT App Widget?

Looking for guidance & confirmation

Our end goal is:

  • ChatGPT handles intent understanding + tool orchestration

  • Our widget handles richer, interactive, progressive visualizations

  • Users get immediate feedback rather than waiting for full processing to complete

If anyone from the community (or from OpenAI) has best practices, caveats, or recommended architectural patterns, we’d greatly appreciate your insights.

Thanks in advance!

Sounds like it would mostly work with a few caveats:

  • Widgets don’t support WebSockets, but I believe they do support SSE. I believe this is how the Cloudflare agent demo of the poker game shares state between players in “real time”.
  • Yes, you can have your widget communicate with your backend directly as long as you set up the CSP configuration as stated in the docs - the endpoint needs to be explicitly listed or it won’t work (see the sketch after this list).
    • It may be obvious, but the data you share with the widget that way won’t be visible to the model. The model only knows about content and structuredContent from the MCP response.
  • Some of the partner apps that have already launched seem to effectively have a setup like this - i.e. Zillow has the map (Mapbox, I think), which, while it’s a third-party component and not Zillow proper, is conceptually what you’re asking about: something rendered in the widget that communicates with a backend service independent of the MCP channel.
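For reference, my understanding from the Apps SDK docs is that the CSP declaration lives in the widget resource’s _meta; the field names and domains below are a sketch, not copied from a working app:

```ts
// Sketch of the widget resource a server returns for resources/read, with
// the "openai/widgetCSP" _meta block that whitelists outbound domains.
// Domains are placeholders for your own backend and CDN.
const widgetResource = {
  uri: "ui://widget/realestate.html",
  mimeType: "text/html+skybridge",
  text: '<div id="root"></div><script type="module" src="https://cdn.example.com/widget.js"></script>',
  _meta: {
    "openai/widgetCSP": {
      connect_domains: ["https://api.example.com"],  // fetch/SSE/WebSocket targets
      resource_domains: ["https://cdn.example.com"], // scripts, styles, images
    },
  },
};
```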

Thanks a lot for the detailed response — that really helps clarify things. It sounds like this approach is technically feasible, especially with SSE and the appropriate CSP configuration, so I’ll continue experimenting with it.

One thing I’d like to ask: do you expect this pattern to remain supported going forward? Since the widget-side data flow isn’t visible to the model itself, I’m wondering whether OpenAI might eventually restrict or formalize this kind of interaction.

Thanks again for your insights!

I certainly can’t speak for OAI, but given all of the hooks that they’ve provided (e.g. the CSP hooks) and that their featured partners are doing similar things, it seems about as approved and supported as you’re gonna get.

From the design of the SDK, I think they understand that not everything needs to be a model/MCP interaction… or at least that there needs to be flexibility to support other communication channels with your backend, etc.

I think you are correct in worrying about support going forward. As you said, the model does not see the extra data.

We have not tested it yet, but the MCP protocol has a concept of notifications/messages.

This is what we plan to check out when we get to the task of sending back progress updates.

If you give it a try first, do let us know how it went and whether it’s supported at all in ChatGPT.
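For anyone who wants to try it, this is the progress-notification shape the MCP spec defines; whether ChatGPT actually forwards these to the widget is the open question:

```ts
// Shape of an MCP progress notification, per the MCP spec. The client
// must first opt in by sending _meta.progressToken with its request;
// the server then emits these while the tool call is still running.
const progressNotification = {
  jsonrpc: "2.0",
  method: "notifications/progress",
  params: {
    progressToken: "tool-call-42", // echoed from the request's _meta.progressToken
    progress: 40,                  // work completed so far
    total: 100,                    // optional expected total
    message: "Generating comparison charts...", // optional status text
  },
};
```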

It’s recommended to use the window.openai.callTool() method instead of direct API calls to interact with your backend. In this case, additional CSP settings are NOT required.

Just declare new MCP tools and use them on demand within your widget via window.openai.callTool(), as sketched below.
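A minimal widget-side sketch, assuming a hypothetical get_property_analysis tool declared on the MCP server (the window.openai typing is also an assumed shape):

```ts
// Minimal type declaration for the host-injected API (assumed shape).
declare global {
  interface Window {
    openai: {
      callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
    };
  }
}

// Invoke a server-declared MCP tool directly from the widget. The call is
// proxied through the ChatGPT host, so no extra CSP entries are needed,
// and it follows the same one-shot tool protocol as model-initiated calls.
export async function loadAnalysis(propertyId: string) {
  return window.openai.callTool("get_property_analysis", {
    property_id: propertyId,
  });
}
```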

Thanks so much for the clear explanation and the concrete solution!
Using window.openai.callTool() to interact with the backend definitely helps avoid extra CSP configuration, and—more importantly—keeps the communication flow fully within the officially supported path for ChatGPT Apps.

We were indeed worried about future compatibility and whether the model would have visibility into any additional data. Your suggestion essentially gives us a safe, scalable way forward.

This approach looks very workable for our setup. We’ll follow your recommendation by wrapping our backend logic as MCP tools and invoking them from the frontend through callTool. Really appreciate your patience and for sharing your experience! Thank you, @achieving100ms!


I encountered a similar situation to @Zhou_Hui’s. The types.CallToolResult and structuredContent don’t appear to allow chunked data. Also, window.openai.callTool() expects a traditional request-response model instead of SSE.

It would be great if you could provide simple code snippets to explain. Thank you.
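For context, the closest request-response approximation we can think of is chunked polling: one tool call starts the job, and the widget repeatedly calls a hypothetical get_progress tool until it reports completion. A sketch (tool name and result shape are our assumptions):

```ts
// Hypothetical polling workaround: window.openai.callTool() is one-shot,
// so the widget re-invokes a "get_progress" tool until the backend job
// reports completion.
type ProgressResult = {
  structuredContent?: { chunk?: string; done?: boolean };
};

export async function pollJob(jobId: string, onChunk: (s: string) => void) {
  const callTool = (window as any).openai.callTool as (
    name: string,
    args: Record<string, unknown>
  ) => Promise<ProgressResult>;

  for (;;) {
    const res = await callTool("get_progress", { job_id: jobId });
    const sc = res.structuredContent;
    if (sc?.chunk) onChunk(sc.chunk); // append newly produced text
    if (sc?.done) break;              // backend finished the job
    await new Promise((r) => setTimeout(r, 500)); // fixed 500 ms poll interval
  }
}
```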

  1. You could definitely structure it this way, but per their guidelines you would need to figure out auth, if needed, for React Widget <-> custom backend, since the OAuth token is stored on ChatGPT’s backend

  2. WebSockets do work in the widget; I’ve made multiple examples with them, e.g.: https://x.com/gching/status/1988250074850488715?s=20 - you aren’t limited (see the sketch after this list)

  3. From the Widget to your own backend, definitely; you need to figure out a way to authenticate the user, which might not be doable given the current spec. Managing session state is technically doable if the user is authenticated - you can have a store specific to each user

  4. No canonical recommended way, but I’ve played around a ton. I have lots of examples in Lessons learnt from speedrunning ChatGPT Apps, which includes examples using WebSockets and tons of other things.
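To illustrate point 2, a bare-bones widget WebSocket client; the wss://api.example.com URL is a placeholder and, per the earlier CSP discussion, the host still needs to be allowed for the widget:

```tsx
import { useEffect, useState } from "react";

// Bare-bones live-updates component: the widget opens a WebSocket to the
// backend and appends each message as it arrives.
export function LiveUpdates({ sessionId }: { sessionId: string }) {
  const [events, setEvents] = useState<string[]>([]);

  useEffect(() => {
    const ws = new WebSocket(`wss://api.example.com/ws?session=${sessionId}`);
    ws.onmessage = (e) => setEvents((prev) => [...prev, String(e.data)]);
    return () => ws.close(); // tear down the socket when the widget unmounts
  }, [sessionId]);

  return (
    <ul>
      {events.map((ev, i) => (
        <li key={i}>{ev}</li>
      ))}
    </ul>
  );
}
```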

Why is it not recommended to call the backend API directly from the widget? We can get the token from the tool in _meta, right? What’s the issue with calling the backend API directly from the widget? Can anyone explain this?
