[Feature Request] Add “Semantic Clipboard” for Exact Phrase Recall in Short Conversations

Title: Add “Semantic Clipboard” for Exact Phrase Recall in Short Conversations
Short-term phrase recall for high-precision workflows—no hallucination, no paraphrasing.

Category: Core UX / Engagement Precision

Summary:

ChatGPT consistently fails to reuse exact phrases it just generated, even in short, 2–3 message conversations. This breaks workflows for users who need precise text manipulation, iteration, or recombination—especially in editing, scripting, UX writing, legal drafting, or any domain where verbatim continuity matters.

What’s needed is a Semantic Clipboard: a lightweight, ephemeral memory layer that allows verbatim recall and reuse of recent strings on command, without paraphrasing or mutation.

The Problem:

Users frequently request: “Now combine that with this,” or “Repeat the sentence exactly.” ChatGPT almost always paraphrases or subtly alters what it just wrote—even seconds later.

This undermines high-precision workflows and forces manual copy-paste intervention. And it makes the assistant feel imprecise and unreliable in tasks where high-fidelity workflows matters most.

Proposed Fix: Semantic Clipboard Layer

• Maintain a short-term buffer of the last 4–5 user and assistant messages, verbatim.

• Allow reference via simple phrasing, such as:

“Reuse that sentence exactly”
“Use your last line verbatim”
“Repeat what you just said, unchanged”
“Combine that exact line with the one before”

• Re-inject stored strings at prompt-assembly time to ensure literal reuse without mutation.

• Keep the feature ephemeral, invisible by default, but responsive to precision prompts.

This functionality would enable verbatim phrase reuse across recent prompts, restoring short-term conversational fidelity without hallucination or paraphrasing.

It supports editing, scripting, legal drafting, and precision workflows, where users expect reliable continuity. This is not about long-term memory—it’s about fixing a basic, high-friction short-term failure that breaks flow and user confidence.

Why This Matters:

The memory load remains minimalwell under 100 KB per session, even with 4–5 message pairs.

No model retraining needed—just pre-generation logic and basic token recall indexing.

• Users naturally work in short memory spans when combining outputs—this reflects real behavior.

• Fixing this unlocks higher-value workflows instantly, especially for serious users.

Technical Feasibility – Independently Confirmed:

A review of OpenAI’s architecture and industry best practices confirms this fix is both possible and lightweight:

  1. Prompt Preprocessing & Token Injection:

OpenAI’s infrastructure already supports system-level prompt prep and message injection. These mechanisms can be extended to include recent user or assistant strings for verbatim reuse.

  1. Short-Term Message Buffers:

Storing the last 4–5 user and assistant messages is consistent with established techniques used in frameworks like LangChain, AutoGPT, and other conversational memory pipelines. It requires no long-term persistence.

  1. No Core Model Change Required:

This feature can be deployed entirely at the infrastructure and prompt orchestration layer, not at the transformer level. Models already replicate exact text when prompted precisely—they just need help recalling what to reuse.

  1. Minimal Resource Cost:

Even with a billion users, ephemeral buffers would consume only ~100 TB at peak concurrency, and far less in real usage. The cost is insignificant compared to what’s already allocated for context processing.

Feasibility Note:

The assistant itself is 90% confident this feature is viable now using:

  1. Ephemeral buffer of last 4–5 message’s
  2. Token-level prompt injection
  3. Simple syntax resolution or context flagging

This does not appear as a core-model limitation—it’s hopefully a quickly solvable, infrastructure-level enhancement.

This is about precision control. And it’s one of the lowest-effort, highest-impact UX upgrades OpenAI could deploy—immediately and at scale.

.