Hi everyone — I’m working on a ChatGPT App for real-estate consulting and would love some feedback from the community on whether our architecture approach makes sense.
Background
We’re building a real-estate advisory ChatGPT App that provides users with investment insights, purchase guidance, and other property-related recommendations.
Our workflow looks roughly like this:
- A user starts a conversational session inside the ChatGPT App.
- When needed, the model invokes our MCP tool to call backend services.
- During the dialogue, the App may ask clarifying questions to understand the user's goals, constraints, and preferences.
- Once enough context has been collected, we want to present richer, more interactive results such as structured summaries, visualizations, and comparisons.
Some of this output takes time to generate, and we would prefer that users not stare at a static loading indicator. Instead, we'd like a smoother, "ChatGPT-like" experience where text flows in progressively.
What we want to achieve technically
Within the ChatGPT App’s Widget (React), we want:
- a custom React component that renders text as a streaming, typing-style animation (a rough sketch follows this list)
- our Python backend to send incremental chunks of data (via SSE or a streamed HTTP response)
- the frontend to display those chunks as soon as they arrive
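For illustration, here is a minimal widget-side sketch, assuming a hypothetical `streamUrl` endpoint on our own backend that returns a chunked text response; the component, prop, and endpoint names are placeholders, not a finished implementation:

```tsx
import { useEffect, useState } from "react";

// Progressive text renderer: reads a streamed HTTP response chunk by chunk
// and appends the decoded text to state, so the UI updates as data arrives.
// `streamUrl` is a hypothetical endpoint on our own backend (not the MCP tool).
export function StreamingText({ streamUrl }: { streamUrl: string }) {
  const [text, setText] = useState("");
  const [done, setDone] = useState(false);

  useEffect(() => {
    const controller = new AbortController();

    (async () => {
      const res = await fetch(streamUrl, { signal: controller.signal });
      if (!res.body) return;

      const reader = res.body.getReader();
      const decoder = new TextDecoder();

      // Append each chunk as soon as it arrives instead of waiting for the full payload.
      while (true) {
        const { value, done: finished } = await reader.read();
        if (finished) break;
        const chunk = decoder.decode(value, { stream: true });
        setText((prev) => prev + chunk);
      }
      setDone(true);
    })().catch((err) => {
      if ((err as Error).name !== "AbortError") console.error(err);
    });

    // Cancel the stream if the widget unmounts mid-response.
    return () => controller.abort();
  }, [streamUrl]);

  return (
    <div>
      {text}
      {!done && <span className="cursor">▍</span>}
    </div>
  );
}
```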
Conceptually, we’re thinking of two independent channels:
- ChatGPT ↔ MCP Tool: a standard, one-shot tool call
- React UI ↔ our backend: the actual streaming channel
The MCP Tool continues to follow the required one-shot JSON protocol.
Separately, our widget establishes a streaming connection (SSE/WebSocket/Fetch Stream) to our backend, which delivers content progressively.
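To make the split concrete, here is one way the handshake between the two channels could look: the one-shot tool result carries a short-lived session identifier, and the widget uses it to open its own SSE connection. The `sessionId`/`streamBaseUrl` fields and the `/stream` endpoint are assumptions for illustration, not part of any MCP or Apps SDK contract:

```ts
// Hypothetical handshake between the two channels (field names are assumptions):
// 1. The MCP tool returns a normal one-shot JSON result that includes a
//    short-lived `sessionId` pointing at our backend.
// 2. The widget reads that result and opens its own SSE connection for the
//    progressive content, entirely outside the MCP protocol.

type ToolResult = {
  sessionId: string;     // issued by our backend when the tool call runs
  streamBaseUrl: string; // e.g. "https://api.example.com/advice"
};

export function openAdviceStream(
  result: ToolResult,
  onChunk: (text: string) => void,
  onDone: () => void
): () => void {
  // Standard browser EventSource; the backend emits `data:` lines and a final
  // "done" event. A WebSocket or fetch stream would follow the same shape.
  const source = new EventSource(
    `${result.streamBaseUrl}/stream?session=${encodeURIComponent(result.sessionId)}`
  );

  source.onmessage = (event) => onChunk(event.data);
  source.addEventListener("done", () => {
    source.close();
    onDone();
  });
  source.onerror = () => source.close();

  // Return a cleanup function so the widget can close the stream on unmount.
  return () => source.close();
}
```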
This would give us:
- lower perceived latency
- a smoother user experience
- the ability to reveal complex or graphical content gradually
The key questions for the community
1. From a protocol/architecture standpoint:
Is it acceptable in the ChatGPT App + MCP ecosystem to split responsibilities this way?
- ChatGPT ↔ MCP Tool for conversation and reasoning
- React Widget ↔ custom backend for real-time streaming visualization
Is the ChatGPT App system intended to support this pattern?
2. Has anyone done something similar in production?
We’re curious whether other teams have adopted a parallel streaming channel for richer UI behaviors.
3. Potential pitfalls or security concerns?
For example:
- running a custom SSE/WebSocket client inside the ChatGPT App Widget
- managing session state between ChatGPT's tool calls and our backend
- ensuring isolation and secure data flow
4. Is this aligned with the intended ChatGPT App developer model?
Or is there a more canonical way to implement progressive rendering or real-time UI updates in a ChatGPT App Widget?
Looking for guidance & confirmation
Our end goal is:
- ChatGPT handles intent understanding and tool orchestration
- our widget handles richer, interactive, progressive visualizations
- users get immediate feedback rather than waiting for full processing to complete
If anyone from the community (or from OpenAI) has best practices, caveats, or recommended architectural patterns, we’d greatly appreciate your insights.
Thanks in advance!