Feature Proposal: Human-Approved Work Relay between ChatGPT and Codex

Hi everyone,

I would like to propose a native, human-approved relay layer between ChatGPT and Codex.

The problem I am trying to solve is not “more autonomy”. It is almost the opposite: a safer, clearer, and more auditable handoff between ChatGPT-side planning/review and Codex-side repo work.

Problem

When working on longer projects, I often use ChatGPT for planning, reasoning, review, and continuity, while Codex works inside the repository and produces files, reports, tests, or implementation changes.

Right now, the human becomes the copy/paste bridge between both sides.

That creates several problems:

  • context loss between ChatGPT and Codex

  • wrong-chat / wrong-paste risk

  • repeated summaries

  • unclear next-step authority

  • difficulty tracking what was reviewed versus what was approved

  • more friction for local/manual Codex workflows

  • fragile workarounds if someone tries to use browser automation

A browser or Chrome-based workaround is not the ideal product shape here. It introduces UI-state risk, wrong-tab risk, permission ambiguity, and audit difficulty.

What seems missing is a native, structured, human-approved relay surface.

Proposed feature

A “Human-Approved Work Relay” between ChatGPT and Codex.

The core rule would be:

Information may flow.
Authority must not flow.

A possible workflow:

  1. The human authorizes a bounded Codex workblock.

  2. Codex works only inside the allowed scope.

  3. Codex produces a structured final report.

  4. A relay layer validates the report for unsafe or ambiguous wording.

  5. A prepared relay message is generated as information-only.

  6. An audit record is created.

  7. The human reviews the relay package.

  8. ChatGPT receives the information for review only.

  9. The human decides any next action.

The relay should never imply that Codex, ChatGPT, or another AI has approved the next step.
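To make steps 3 to 6 more concrete, here is a minimal sketch of what a relay layer could do with a finished report. It is not the prototype code and not an existing API: the names `AUTHORITY_PATTERNS`, `validate_report`, `build_relay_package`, and `RelayPackage` are hypothetical, and the wording rules are only assumed examples of "unsafe or ambiguous wording".

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Assumed example rules: phrases that would make a report read as if an AI
# had granted approval or decided the next step.
AUTHORITY_PATTERNS = [
    r"\bapproved\b",
    r"\bauthori[sz]ed\b",
    r"\bproceed(ing)? (with|to)\b",
    r"\bnext step is to\b",
]


@dataclass
class RelayPackage:
    status: str                 # "PASS" or "BLOCKED"
    findings: List[str]         # matched phrases, empty when clean
    message: Optional[str]      # prepared relay message, None when blocked
    audit: dict = field(default_factory=dict)


def validate_report(report: str) -> List[str]:
    """Return every authority-implying phrase found in the report."""
    findings = []
    for pattern in AUTHORITY_PATTERNS:
        for match in re.finditer(pattern, report, flags=re.IGNORECASE):
            findings.append(match.group(0))
    return findings


def build_relay_package(workblock_id: str, report: str) -> RelayPackage:
    """Validate the report, prepare an information-only message, write an audit record."""
    findings = validate_report(report)
    status = "BLOCKED" if findings else "PASS"
    message = None
    if status == "PASS":
        message = (
            f"[INFORMATION ONLY] Workblock {workblock_id} report follows. "
            "This message grants no approval and opens no new scope.\n\n" + report
        )
    audit = {
        "workblock_id": workblock_id,
        "status": status,
        "findings": findings,
        "report_sha256": hashlib.sha256(report.encode("utf-8")).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        # The relay never sets this; only a separate human action could.
        "approved_by_human": False,
    }
    return RelayPackage(status, findings, message, audit)
```

The package is what the human would review in step 7. Nothing in it triggers a transfer or a follow-up action by itself.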

Safety model

The relay should make these boundaries explicit:

  • Human remains the only approval source.

  • Codex reports are information only.

  • Validation results are information only.

  • Prepared relay messages are information only.

  • Audit records are information only.

  • ChatGPT receipt/review is information only.

  • No AI approves another AI’s work.

  • A relay message does not open new scope.

  • A ChatGPT response does not become approval.

  • Live relay, if supported, should require strict opt-in and final human confirmation.

Suggested product components

Possible components could include:

  • Workblock Cards

  • Current State files

  • Structured Codex Final Reports

  • Prepared Relay Messages

  • Audit Records

  • Human Review Checkpoints

  • Assistant Receipt Review

  • visible status labels such as PASS, ERROR, BLOCKED, HOLD_NO_ACTION, PASS_WITH_WARNING

  • stale-state warnings

  • source-of-truth fields

  • final confirmation prompts for any live relay

  • abort phrase / one-action stop for live relay

  • local/manual mode first

  • optional native live relay only after strict permissioning
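As a rough illustration of the status labels and stale-state warnings, the sketch below models the labels as an enum and derives one from validator findings plus a state revision check. The function names, the revision strings, and the rule that only PASS and PASS_WITH_WARNING allow a message to be prepared are assumptions, not an existing design.

```python
from enum import Enum
from typing import List


class RelayStatus(Enum):
    PASS = "PASS"
    PASS_WITH_WARNING = "PASS_WITH_WARNING"
    HOLD_NO_ACTION = "HOLD_NO_ACTION"
    ERROR = "ERROR"
    BLOCKED = "BLOCKED"


# Assumption: only these statuses allow a relay message to be prepared at all.
TRANSFERABLE = {RelayStatus.PASS, RelayStatus.PASS_WITH_WARNING}


def classify(findings: List[str], report_state_rev: str, current_state_rev: str) -> RelayStatus:
    """Derive a visible status label for a report.

    findings: unsafe or ambiguous phrases reported by a validator.
    report_state_rev / current_state_rev: hypothetical revision markers of the
    Current State file the report was written against and the one on disk now.
    """
    if findings:
        return RelayStatus.BLOCKED
    if report_state_rev != current_state_rev:
        # Stale state is surfaced as a warning instead of being silently passed.
        return RelayStatus.PASS_WITH_WARNING
    return RelayStatus.PASS


def can_prepare_message(status: RelayStatus) -> bool:
    """A status label is information only; it never counts as approval."""
    return status in TRANSFERABLE
```

A label such as PASS here only means "clean enough to show to the human". It says nothing about whether the next step is approved.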

Why this should not just be browser automation

Browser automation may prove that a handoff is possible, but it is not the best safety model.

A native interface would be safer because it could:

  • avoid reading chat history

  • avoid browsing or UI-state ambiguity

  • lock the exact message being transferred

  • show the exact destination

  • require final human confirmation

  • create an audit trail

  • prevent follow-up actions

  • preserve the distinction between information and authority

In other words, this should not be “Codex controls ChatGPT through Chrome”.

It should be “Codex prepares structured workblock information, the human approves the relay, and ChatGPT receives it as information only.”
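One way to picture "lock the exact message" and "show the exact destination" is a digest over both, which the human confirms and which is re-checked at release time. This is only a sketch under assumptions: `LockedRelay`, `lock`, `confirm_and_release`, and the destination strings are hypothetical, and a native implementation might work quite differently.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LockedRelay:
    destination: str   # the exact destination shown to the human
    message: str       # the exact text shown to the human
    digest: str        # SHA-256 over destination and message


def lock(destination: str, message: str) -> LockedRelay:
    """Freeze destination and message together under one digest."""
    payload = f"{destination}\n{message}".encode("utf-8")
    return LockedRelay(destination, message, hashlib.sha256(payload).hexdigest())


def confirm_and_release(
    locked: LockedRelay, confirmed_digest: str, destination: str, message: str
) -> str:
    """Release the message only if the human confirmed this exact digest
    and neither destination nor message changed since the lock."""
    current = lock(destination, message).digest
    if confirmed_digest != locked.digest or current != locked.digest:
        raise PermissionError("relay content or destination changed after confirmation")
    return message
```

If anything differs between what the human saw and what would be sent, the release fails instead of quietly sending the new content.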

Prototype / validation evidence

I built and tested a local/manual prototype of this workflow.

The local prototype included:

  • a report validator

  • a prepared-message builder

  • an audit-record writer

  • local validation outputs

  • manual transfer readiness records

  • assistant receipt review records

I tested multiple cases:

  • clean PASS cases

  • intentionally unsafe / ambiguous reports that were blocked

  • remediation after ERROR

  • larger multi-file report handling

  • stale/conflicting project-state handling with warnings

  • manual transfer and assistant receipt as information-only

Summary of validation classes:

  • Trial 001 original: ERROR

  • Trial 001 corrected addendum: PASS

  • Trial 002 clean workflow: PASS + manual transfer / assistant receipt

  • Trial 003 non-relay docs sanity: PASS

  • Trial 004 intentional unsafe/ambiguous report: ERROR / BLOCKED

  • Trial 005 remediation: PASS

  • Trial 006 larger multi-file report: PASS

  • Trial 007 stale/conflicting project-state: PASS_WITH_WARNING

Prototype tests recorded:

  • validator tests: 51 passed

  • combined local prototype tests: 63 passed

This is not a live product integration. It is local/manual evidence that the workflow model can distinguish clean handoffs, unsafe handoffs, remediated handoffs, larger reports, and stale/conflicting state while preserving human authority.

Important limitation

I did not execute a live Codex-to-ChatGPT relay.

That was intentional.

A manual relay is safe but still too manual. A browser-based relay proof would be fragile unless officially supported and permissioned. The remaining gap appears to be product-level: a native Codex ↔ ChatGPT handoff interface with explicit human approval and auditability.

So the conclusion is not “use Chrome automation”.

The conclusion is that a native relay surface would be safer and more useful.

Related UX suggestion

A smaller related improvement: when ChatGPT/Codex produces copied text blocks or generated .txt files, the file name should ideally be derived from the block title, heading, task ID, or batch ID instead of generic names like text.txt.

For workblock-heavy workflows, this would reduce file chaos and make later retrieval much easier.
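A naming rule like this is small enough to sketch. The function below is a hypothetical example, not an existing ChatGPT or Codex behavior: it builds a file name from a block title and an optional task ID, and falls back to the generic name only when nothing usable is left.

```python
import re
import unicodedata
from typing import Optional


def derive_filename(title: str, task_id: Optional[str] = None,
                    ext: str = ".txt", max_len: int = 60) -> str:
    """Build a file name from a block title and an optional task or batch ID."""
    # Reduce accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    if task_id:
        tid = re.sub(r"[^A-Za-z0-9]+", "-", task_id).strip("-")
        slug = f"{tid}_{slug}" if slug else tid
    # Fall back to the generic name only when nothing usable remains.
    return (slug or "text") + ext


print(derive_filename("Trial 004: Unsafe / Ambiguous Report", task_id="WB-12"))
# prints: WB-12_trial-004-unsafe-ambiguous-report.txt
```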

Future extension

A future version could support bounded chain-work: multiple predefined workflow links inside a human-approved scope.

This should not mean unbounded agent autonomy.

Each link should still have:

  • a defined scope

  • a result

  • a review/closure point

  • an audit trail

  • a human authority boundary
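To illustrate what "bounded" could mean here, this sketch runs predefined links in order and stops at the first one the human does not close. `ChainLink`, `run_chain`, and the two callbacks are hypothetical names; the callbacks stand in for the actual Codex work and the human review step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChainLink:
    name: str
    scope: List[str]                 # paths this link may touch
    result: Optional[str] = None     # filled in when the link finishes
    closed_by_human: bool = False    # review/closure point


def run_chain(
    links: List[ChainLink],
    do_work: Callable[[ChainLink], str],
    human_closes: Callable[[ChainLink], bool],
) -> List[dict]:
    """Run links in order; stop at the first link the human does not close."""
    audit: List[dict] = []
    for link in links:
        link.result = do_work(link)
        link.closed_by_human = human_closes(link)
        audit.append(
            {
                "link": link.name,
                "scope": list(link.scope),
                "result": link.result,
                "closed_by_human": link.closed_by_human,
            }
        )
        if not link.closed_by_human:
            break  # authority boundary: nothing further runs without closure
    return audit
```

The chain can only be as long as the list the human approved up front, and each link still leaves its own audit entry.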

Short version

Manual relay is safe but too manual.
Browser relay is fragile unless officially supported.
Native human-approved relay would be the product-grade solution.

Information may flow.
Authority must not flow.

I see a risk in automating it: the risk of burning an incredible number of tokens.

But I also agree that allowing GPT to have “hands” by being able to use Codex would be good.

What I mean by that: I never hit the wall on the ChatGPT usage envelope, but I always hit the wall on Codex tokens. Why? Because Codex also does the thinking.

What I actually need is for GPT to have a larger potential context and do the thinking, with Codex tokens burned only when it actually performs Codex-related actions: reading and writing files.

Currently, Codex uses a version of GPT to think, but it burns tokens while thinking when GPT does not. Isn’t that ironic?

(And here I’m afraid that instead of improving things they will actually reverse engineer the idea to make the system worse by putting token limits on GPT …)

That is a nice-to-have feature. It can be solved now by implementing a custom MCP server that invokes the Codex CLI. Codex can then dispatch a notification, so you could ask your web ChatGPT assistant to fetch the latest handoff and any other data required to proceed with the flow.

The only issue: you’ll have to switch your ChatGPT to dev mode.

I’m a little confused. Were you replying to the creator of the discussion or to me?

If you were replying to me, do you mean that it is actually feasible already? I don’t know what dev mode is.

I was replying to the creator :slight_smile:

Dev Mode is a mode in the ChatGPT settings that allows adding custom-built MCP servers (and probably other features). It reduces safety, as these tools might not always be safe. So there is a risk if you use tooling that was built by somebody else.

Regarding the feature, the MCP can be either simple (just route the prompt to Codex) or extremely complex; it really depends. This post really gave me the idea. This has been possible for quite a long time already, but I never thought of building it :smiley: Technically, it can give your web ChatGPT access to your PC via the MCP server and Codex CLI (or any other CLI, to be honest).

Important note: when I think more about it, that might be considered abuse, precisely because there are no limits on web ChatGPT usage. But simple routing without heavy automation seems like a safe way to go.

It would be great to get a comment from OpenAI staff on that.

PS: I didn’t read the whole feature description from the creator, as it was AI-generated and there is no human summary. But just reading about the problem reminded me of my own pain point straight away.

Thanks for the comments — and yes, fair point: my original post is long, so here is the shorter human summary.

The main idea is not to give ChatGPT or Codex unrestricted “hands”, and not to create an automation loop.

What I am trying to describe is a safer handoff layer between planning/review and repo-local Codex work.

In practice:

  • Codex finishes a bounded workblock.

  • Codex creates a structured report.

  • A relay layer checks whether that report is safe and clear enough to hand off.

  • A prepared message is created as information only.

  • An audit record is created.

  • The human reviews the handoff.

  • ChatGPT receives the information for review only.

  • The human remains the only one who can approve next actions.

The core rule is:

Information may flow.
Authority must not flow.

I agree that parts of this might be approximated with custom MCP tooling or Codex CLI integration. But that is exactly where I see the risk: local PC access, unclear permission boundaries, tool safety, token usage, and possible abuse concerns.

That is why I think a native product-level workflow would be safer than a workaround. It could provide explicit status labels, audit records, final human confirmation, no chat-reading by default, one-action stop rules, and a clear difference between “information was transferred” and “work was approved”.

So the proposal is less about giving AI more autonomy, and more about making existing ChatGPT/Codex workflows safer, clearer, and easier to audit.