Superapp Architecture: ChatGPT Should Be the Non-Blocking Master Process, Not a Peer Tab

Hi,

I’m in the Codex developer program. I have a specific architectural suggestion for the superapp that I think is worth 3 minutes of someone’s time.

Your current direction: merge ChatGPT + Codex + Atlas into one app with multiple modes.

What I think it should be: ChatGPT is the only interface. Codex and Atlas are non-blocking background subprocesses that ChatGPT can spawn.

The difference:

- “Merge” means three tools in one window. Users still switch between them.

- “Orchestrate” means one conversation that dispatches work in the background. Users never leave the chat.

Before I explain the architecture, let me show you why this is a competitive opportunity, not just a UX improvement.

---

Anthropic’s Claude Desktop currently has three modes: Chat, Cowork, and Code. They sit as peer tabs in one app. This is exactly the “merge” approach — and it has two critical problems:

1. They are mutually exclusive. When a user runs a Cowork task, they cannot simultaneously have a Chat conversation. The modes block each other. A 30-minute file organization task means 30 minutes of zero conversation.

2. They impose tool selection on the user. A user has to decide: is this a Chat question, a Cowork task, or a Code job? That decision requires understanding what each mode does, which is a cognitive tax that shouldn’t exist.

Anthropic’s Dispatch feature (sending Cowork tasks from phone to desktop) gets close to the right idea — but it’s bolted on as a feature, not built in as the architecture. The phone is the afterthought, not the primary interface.

If you simply merge ChatGPT + Codex + Atlas into peer tabs the way Anthropic did with Chat + Cowork + Code, you will have replicated their product. And their product has a structural ceiling that users are already hitting.

The opportunity is to skip past that ceiling entirely.

---

The critical word is non-blocking.

Right now, every execution interface — Codex, Atlas, any agent task, and Anthropic’s equivalents — is blocking. When a task runs, the user waits. The conversation stops. The user’s cognitive flow dies.

This isn’t an efficiency problem. It’s a cognition problem. Human working memory is fragile. Interrupt it for 15 minutes while Codex runs a task, and the user’s train of thought is gone. They have to rebuild context from scratch. Many threads of reasoning never recover — the associations and intuitions that were active in working memory are not reproducible.

The most valuable work happens when reasoning and execution are concurrent, not sequential. A user discusses architecture with ChatGPT, dispatches a scanning task to Codex mid-conversation, continues discussing strategy while Codex works in the background, and when results come back, the conversation context is still warm — they analyze the output immediately with full mental context intact.

No AI product offers this today. Not yours, not Anthropic’s, not Google’s.

---

The architecture:

1. ChatGPT is the primary interface — cloud-based, accessible from any device, always available. This is the user’s persistent thinking space. It is never blocked by anything.

2. ChatGPT can spawn Codex or Atlas tasks as background subprocesses on the user’s local machine. These appear as task cards in a sidebar — real-time status, expandable into full Codex/Atlas interfaces where the user can directly interact with the executing agent if needed.

3. Every subprocess is non-blocking by default. The main conversation continues while tasks run. Results surface back into the chat stream when ready. The user can click into a subprocess to interact with it directly — that subprocess can have its own back-and-forth with its own agent — but the main ChatGPT thread is never waiting.

4. Each subprocess can run a different model tier. ChatGPT uses the strongest reasoning model for the thinking layer. Background Codex workers can use lighter, faster models for execution. The user can override per-task.

5. The UI is essentially a browser: one persistent main tab (ChatGPT) that can open child tabs (Codex/Atlas instances). The existing Codex and Atlas interfaces don’t need to change — they just become child panels instead of top-level modes.

This requires zero new technology. ChatGPT exists. Codex exists. Atlas exists. MCP exists. The model selector exists. You just change the hierarchy: from three peer tabs to one master process with non-blocking background workers.
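
To make the hierarchy concrete, here is a minimal Python sketch of a non-blocking master process using asyncio. It is an illustration under stated assumptions, not how ChatGPT or Codex is actually implemented: `run_worker` and `conversation` are hypothetical names, and the sleeps stand in for real Codex or Atlas work.

```python
import asyncio


async def run_worker(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a Codex/Atlas background task."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def conversation() -> list[str]:
    transcript: list[str] = []
    # Dispatch the worker without awaiting it, so the chat loop stays free.
    task = asyncio.create_task(run_worker("codex-scan", 0.05))
    for turn in ("discuss architecture", "discuss strategy"):
        transcript.append(f"user: {turn}")
        await asyncio.sleep(0.01)  # chat turns proceed while the worker runs
    # The result surfaces in the chat stream once the worker finishes.
    transcript.append(f"result: {await task}")
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(conversation()):
        print(line)
```

The point of the sketch is the ordering: both chat turns are recorded before the worker result arrives, and the main loop never waits on the worker until it wants the result.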

---

The competitive framing:

Anthropic built Chat + Cowork + Code as peer modes. That’s their ceiling. If you merge ChatGPT + Codex + Atlas as peer modes, you match them. If you make ChatGPT the master process with Codex and Atlas as non-blocking workers, you leapfrog them.

The result: one conversation. Everything else is background. Nothing blocks.

This eliminates the concept of tool selection entirely. Users never decide whether to open ChatGPT, Codex, or Atlas. They just talk. The system dispatches. This reduces the barrier from “learn which tool to use” to “know how to speak” — which everyone already does.

The first company to ship this doesn’t just have a better product. It redefines the interaction paradigm, the same way the iPhone redefined input from “learn to type” to “learn to touch.”

---

I use ChatGPT and Codex daily across firmware security research, creative writing, and technical analysis. I also use Claude extensively. The mode-switching friction and the blocking execution model are the two biggest limitations across both platforms. Happy to elaborate on any of this if useful.

Interesting. I’m wondering why Anthropic did not consider that. After all, they have some very smart people. Maybe a non-blocking workflow:

  • Has potential unforeseen issues that were beyond scope.
  • Has compute constraints.
  • Has OS multi-tasking constraints.
  • Has a complex cost and billing structure that was also beyond scope.
  • Does not take into account the fact that many people cannot multi-task.

Keep in mind that new complex architectures change and are enhanced over time.

Good points. Let me address the last one first because it’s the most important: this architecture doesn’t require users to multi-task. The opposite — it eliminates multi-tasking. Right now users are forced to multi-task: switch to Codex, monitor execution, switch back to Chat, rebuild context. The proposal is that users stay in one single-threaded conversation. Background workers run silently. Results come back when ready. The user never juggles anything.

On the infrastructure points (compute, OS constraints, billing): these are real engineering constraints, but they’re implementation details, not architectural blockers. The hierarchy change (chat as master, tools as workers) is a product design decision. How to schedule, meter, and bill the background workers is a separate problem that gets solved after the architecture is right.

And on “why didn’t Anthropic do this”: I suspect the answer is organizational structure. Separate Chat, Code, and Cowork teams each have their own roadmap. Making one team’s product subordinate to another’s is a political decision before it’s a technical one.

Interesting direction.

What especially resonates with me is the shift from “tool switching” toward persistent operational context.

I’ve been exploring something adjacent inside EP-OS, but from a slightly different angle:

once multiple long-running subprocesses share mutable semantic space, orchestration alone may not be enough. The system also needs contextual isolation and topology-aware coordination to prevent reasoning drift between agents over time.

In other words, the next challenge may not only be:

“how do we run agents concurrently?”

but also:

“how do we preserve semantic integrity while they operate concurrently inside persistent environments?”

That’s where I suspect multi-agent systems may eventually start borrowing more concepts from operating systems:

- process isolation,

- memory zoning,

- deterministic routing,

- and controlled inter-process semantic exchange.

You’re pointing at the right next layer. My post deliberately stays at the product architecture level — who is the master process, what’s blocking vs non-blocking — because that hierarchy decision has to come first. But you’re right that once you have multiple concurrent agents sharing a persistent environment, you immediately hit the OS-level problems: isolation, memory boundaries, routing, IPC.

The interesting thing is that these problems have well-understood solutions in traditional OS design. The question is whether AI product companies will recognize they’re building an operating system before they accidentally build a broken one.

What’s EP-OS?

Exactly. That’s what started pushing me toward the OS metaphor in the first place.

At small scale, multi-agent systems still look like orchestration problems. But once persistent memory, concurrent contexts and long-running interaction loops appear, the failure modes begin to resemble operating-system-level problems much more than prompt-engineering ones.

I think a lot of current AI architectures still underestimate how quickly semantic instability accumulates without isolation boundaries, routing discipline and controlled state interaction.

So now I’m trying to explore what “semantic process management” could look like before these systems become too large to reason about coherently.

I’m not so sure about this. Can you give an example? Or are well-understood solutions just assumptions without proofs of concept?

Fair challenge. Let me give concrete examples, then reframe the actual problem.

Process isolation → Agent sandboxing. Codex already runs each task in its own cloud sandbox. This is containerization, built on the same namespace primitives OS engineers have used for well over a decade.

Concurrent write protection → Git worktrees. Codex supports up to 6 subagents running in parallel on the same repo, each in an isolated worktree, so no agent overwrites another’s working files (conflicts can still surface later, when branches are merged). This resembles copy-on-write combined with version control, a well-understood OS pattern.

Process scheduler → Orchestrator pattern. Codex has explorer (read-only), worker (read-write), and default roles. A central coordinator dispatches to specialized agents. This resembles role-based dispatch in an OS scheduler.

Filesystem permissions → Tool access controls. Agent A can execute code, Agent B can only search. Capability-based security, straight from OS design.
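
As a rough illustration of the capability-based mapping above, here is a minimal Python sketch. The role names `explorer` and `worker` come from the description of Codex roles in this post; the `Capability` flags, the `ROLES` table, and `authorize` are hypothetical and do not reflect any real Codex API.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical tool capabilities an agent may hold."""
    SEARCH = auto()
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


# Assumed role-to-capability mapping, for illustration only.
ROLES = {
    "explorer": Capability.SEARCH | Capability.READ,
    "worker": (Capability.SEARCH | Capability.READ
               | Capability.WRITE | Capability.EXECUTE),
}


def authorize(role: str, needed: Capability) -> bool:
    """Allow a tool call only if the role holds the needed capability."""
    return needed in ROLES[role]


if __name__ == "__main__":
    print(authorize("explorer", Capability.WRITE))  # False
    print(authorize("worker", Capability.EXECUTE))  # True
```

The check is made against what the agent holds, not against who the agent is, which is the essence of the capability model.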

So @evopyramidai’s concern about “mutable semantic space” is actually already addressed at the file system level — worktree isolation prevents agents from corrupting each other’s work.

But here’s what this reveals: the multi-agent problem is solved. The blocking problem is not.

The correct architecture is three layers:

  1. Non-blocking conversation layer (ChatGPT) — the user’s persistent thinking space, never interrupted

  2. Blocking task orchestrator (Codex) — manages multiple agents, reviews diffs. Blocking internally is fine — because it runs as a background subprocess, not as the user’s primary interface

  3. Multi-sub-agent execution layer (Codex subagents) — already exists, already works, worktree isolation, parallel execution, role separation

Layers 2 and 3 exist today. The missing piece is the connection between layer 1 and layer 2. Right now, to manage your Codex agents, you have to leave the conversation. That’s the blocking point my original post is about — not agent concurrency (which is solved), but cognitive continuity (which isn’t).

Right, this I understand. So, any kind of polling or notification solution while in the conversation is insufficient?

No — notification is necessary but not sufficient. Here’s the difference:

Polling/notification = “Your Codex task finished. Click here to see results.” You still have to leave the conversation, open Codex, review the output, then come back to Chat and manually re-describe what you saw. By that point your conversational context is stale.

Result reintegration = The results flow back into the conversation as first-class content. ChatGPT can see the output, analyze it, and continue the discussion — all without you leaving. You say “scan this repo for vulnerabilities,” keep discussing threat modeling for 10 minutes, then the scan results appear in the chat and ChatGPT says “three of these findings are in the SMM attack surface we were just discussing.”

The difference is: does the conversation know what the background task produced, or does the user have to be the messenger between two disconnected interfaces?

Notification solves “when is it done.” Reintegration solves “what do we do with it next.” The second one is where the cognitive value lives.
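
The distinction can be shown in a few lines of Python. This is a sketch under assumptions: `Conversation`, `notify`, `reintegrate`, and `can_reason_about` are hypothetical names, and a keyword match stands in for the model actually reading its context.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical chat context: a list of entries the model can see."""
    context: list[str] = field(default_factory=list)

    def notify(self, task_id: str) -> None:
        # Notification: the chat only learns that the task finished.
        self.context.append(f"[task {task_id} finished]")

    def reintegrate(self, task_id: str, output: str) -> None:
        # Reintegration: the task output itself becomes chat context.
        self.context.append(f"[task {task_id} output] {output}")

    def can_reason_about(self, keyword: str) -> bool:
        # Crude proxy for "the model can see this in its context".
        return any(keyword in entry for entry in self.context)


if __name__ == "__main__":
    chat = Conversation()
    chat.notify("scan-1")
    print(chat.can_reason_about("SMM"))  # False: only a completion notice
    chat.reintegrate("scan-1", "finding in SMM handler")
    print(chat.can_reason_about("SMM"))  # True: output is now in context
```

After `notify`, the conversation knows only that something finished. After `reintegrate`, the findings are part of the context the next reply is generated from.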

Scenario: A user has a background task running. A ChatGPT conversation then spawns one or more concurrent background tasks. The question is: How far should the architecture be allowed to go? How disciplined should a user be? Should there not be any controls?

The controls live in the master controller, not in the user’s hands.
The user doesn’t manually spawn background tasks. They talk. The master controller (GPT-5.5) decides what needs to be dispatched, when, and how many. If the user says “scan this repo, also refactor the auth module, and research the latest CVEs,” the controller evaluates: can I handle some of these myself? Which ones need Codex? Should they run in parallel or sequential? Is there a dependency?
So the answer to “how far should the architecture be allowed to go” is: as far as the master controller judges appropriate. It’s the same as a project manager — you don’t tell a PM “you may only have 3 people working at once.” You say “here’s the budget and the deadline, figure it out.”
The practical constraints are natural:
• Compute budget per tier. Plus users get X concurrent background slots, Pro users get more. This maps to subscription value without requiring user discipline.
• Master controller judgment. It won’t spawn 10 tasks if 2 will do. It actively conserves resources — “Pro credits are valuable, I won’t waste them” applies equally to Codex slots.
• User override. The user can always say “pause that task” or “cancel all background work.” Full control, but only when they choose to exercise it.
The key insight: users don’t need to be disciplined. The master controller is disciplined on their behalf. That’s the whole point of having an orchestration layer — it absorbs the complexity of resource management so the user can stay in the conversation.
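
A minimal Python sketch of those constraints, assuming hypothetical slot counts per tier and a hypothetical `Controller` class: tasks beyond the tier limit are queued rather than started, and the user override cancels everything.

```python
import asyncio

# Hypothetical slot counts; real tier limits are not specified in this thread.
SLOTS_BY_TIER = {"plus": 2, "pro": 4}


class Controller:
    def __init__(self, tier: str) -> None:
        self.limit = SLOTS_BY_TIER[tier]
        self.running: dict[str, asyncio.Task] = {}
        self.queued: list[str] = []

    def dispatch(self, name: str, seconds: float) -> bool:
        """Start the task if a slot is free, else queue it."""
        if len(self.running) >= self.limit:
            self.queued.append(name)
            return False
        # asyncio.sleep stands in for real background work.
        self.running[name] = asyncio.create_task(
            asyncio.sleep(seconds, result=name)
        )
        return True

    def cancel_all(self) -> int:
        """User override: cancel all background work, return how many ran."""
        count = len(self.running)
        for task in self.running.values():
            task.cancel()
        self.running.clear()
        self.queued.clear()
        return count


async def demo() -> tuple[list[bool], int]:
    controller = Controller("plus")
    started = [controller.dispatch(n, 10)
               for n in ("scan", "refactor", "cve-research")]
    cancelled = controller.cancel_all()
    await asyncio.sleep(0)  # give the loop a turn to process cancellations
    return started, cancelled


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The judgment part (deciding that two tasks will do instead of ten) is the model's job and is not modeled here; the sketch only covers the hard limits and the override.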

Nothing I hate more than project managers (just kidding, I used to be one :roll_eyes:). Really appreciate you breaking it all down. Fascinating. If and when a Superapp like this comes to fruition, I’ll be there!

Ha — fair enough. Thanks for pushing on every point, Vern. The thread is better for it.

Building on your master controller idea, I think there may need to be a separate resource/execution manager layer.

Worker contexts could think, plan, and write code in parallel, but access to shared local resources should be coordinated separately: terminal, filesystem, test runner, dev server, database, containers, etc.

This could work similarly to a semaphore model: worker contexts request execution slots from the resource manager, the manager grants access when the shared resource is available, tracks running executions, and releases the slot after completion.

That would keep the architect/master context focused on system design and integration, while a separate control layer handles resource contention and execution order.
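
Here is a minimal sketch of that semaphore model in Python, assuming a single shared terminal. `ResourceManager`, `worker`, and the resource name are hypothetical; the only real API used is `asyncio.Semaphore`.

```python
import asyncio
from contextlib import asynccontextmanager


class ResourceManager:
    """Hypothetical manager granting execution slots per shared resource."""

    def __init__(self, slots: dict[str, int]) -> None:
        self._semaphores = {
            name: asyncio.Semaphore(count) for name, count in slots.items()
        }
        self.log: list[str] = []

    @asynccontextmanager
    async def acquire(self, worker_name: str, resource: str):
        # Wait for a free slot, then hold it for the duration of the block.
        async with self._semaphores[resource]:
            self.log.append(f"{worker_name} acquired {resource}")
            try:
                yield
            finally:
                self.log.append(f"{worker_name} released {resource}")


async def worker(manager: ResourceManager, name: str) -> None:
    await asyncio.sleep(0)  # thinking and planning need no slot
    async with manager.acquire(name, "terminal"):
        await asyncio.sleep(0.01)  # execution holds the terminal slot


async def demo() -> list[str]:
    manager = ResourceManager({"terminal": 1})  # one shared terminal
    await asyncio.gather(worker(manager, "worker-a"),
                         worker(manager, "worker-b"))
    return manager.log


if __name__ == "__main__":
    for entry in asyncio.run(demo()):
        print(entry)
```

With one terminal slot, the log always alternates between acquire and release: the workers plan in parallel but execute one at a time.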

Good refinement. You’re right that as the number of concurrent workers grows, you need to separate “what to do” (master controller) from “who gets the terminal right now” (resource manager). Mixing those two responsibilities in one layer would overload the master context with scheduling details it shouldn’t care about.

But I’d argue this is a phase 2 concern. The semaphore layer becomes necessary when you have 4+ workers contending for shared resources. The first step — and the one that delivers 90% of the user experience improvement — is simply making the conversation non-blocking. Even with a single background worker and zero resource contention, the cognitive continuity gain is massive.

Get the hierarchy right first. The resource management layer slots in naturally once the master-worker relationship exists.

Good point — I think the terminology matters here, so I want to separate the roles more clearly.

By “master” I don’t mean one context that both designs the system and manages every terminal/test/server slot. That would overload the main context with operational state.

I would split the system into three responsibilities:

  1. Master Architect / main conversation
    Owns user intent, architecture, decomposition, high-level decisions, and reintegration of results back into the conversation.

  2. Task-specific worker contexts
    Handle isolated subtasks: writing code, reviewing modules, researching issues, testing a hypothesis, etc.

  3. Master Controller / Resource Manager
    Handles the operational layer: terminal ownership, dev server access, test execution, database/container access, scheduling, cancellation, and contention between workers.

A cleaner hierarchy would be:

User
↔ Master Architect / main ChatGPT conversation
    ├─ task-specific worker contexts
    └─ Master Controller / Resource Manager
         └─ terminal, filesystem, test runner, dev server, database, containers

So the Master Architect decides what work should exist and creates the necessary worker contexts. But it should not have to track low-level execution state such as “which worker currently owns the terminal” or “which test runner is busy.”

Workers would request execution/resource slots from the Master Controller. The Controller grants access, tracks running operations, handles conflicts, and releases resources when done. The final results then flow back to the Master Architect for integration into the user-facing conversation.
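
The separation of state can be sketched as follows. All names are hypothetical and the work itself is elided: the point is only that resource ownership lives in the Controller, while the Architect's context receives finished results and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class MasterController:
    """Operational layer: tracks which worker owns which resource."""
    owners: dict[str, str] = field(default_factory=dict)

    def grant(self, worker: str, resource: str) -> bool:
        if resource in self.owners:
            return False  # contention: the resource is already owned
        self.owners[resource] = worker
        return True

    def release(self, worker: str, resource: str) -> None:
        if self.owners.get(resource) == worker:
            del self.owners[resource]


@dataclass
class MasterArchitect:
    """Conversation layer: only ever sees finished results."""
    context: list[str] = field(default_factory=list)

    def integrate(self, worker: str, result: str) -> None:
        self.context.append(f"{worker}: {result}")


def run_worker(worker: str, resource: str, result: str,
               controller: MasterController,
               architect: MasterArchitect) -> bool:
    if not controller.grant(worker, resource):
        return False  # a real system would queue or retry; omitted here
    try:
        pass  # the actual work would run here while the slot is held
    finally:
        controller.release(worker, resource)
    architect.integrate(worker, result)
    return True
```

For example, if `worker-b` already holds the terminal, `run_worker("worker-a", "terminal", ...)` returns `False` and nothing reaches the Architect. Once the terminal is released, the same call succeeds and only the result string lands in `architect.context`.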

So I agree that the first and most important UX win is the non-blocking conversation layer. My refinement is that, once background work becomes persistent and concurrent, separating architectural cognition from operational execution becomes important for preserving the same cognitive continuity this proposal is trying to protect.

Clean decomposition. Your three-way split is exactly right — and it maps naturally to model tiers: Master Architect runs a capable standard model (strong enough for conversation, decomposition, and judgment), Resource Manager can run something lighter (scheduling decisions, not creative ones), and workers run task-optimized models.

The key design principle you’ve articulated well: the Master Architect should never have to think about “which worker owns the terminal.” The moment operational state leaks into the conversation context, you’ve re-introduced the cognitive pollution this whole architecture is trying to eliminate.

Good framework. Thanks for formalizing it.