Feature Request: Enhanced Hierarchical Memory with Granular Recall & Context-Aware Retrieval

:pushpin: Background

I’m a power user of ChatGPT: over the last six months (Dec 2024–May 2025) I’ve run ~180 distinct sessions, averaging 9 prompts per session (~1,600 prompts total). My workflows span multi-part YouTube scripts, modular e-learning courses, market analyses, and more. At the start of every new session I spend 2–3 minutes re-stating my preferred style (Indonesian language, “module+quiz” format, analogy-driven explanations), which adds up to roughly 360–540 minutes of repetitive work.


:construction: Problem Statement

  1. High-Level Only Memory:

    • ChatGPT’s long-term memory today stores only global preferences (language, tone, output format).
    • It does not persist session-specific details—custom vocabulary, project tags, unique examples—across sessions.
  2. Manual Re-Specification:

    • Users must re-declare persona, context, and formatting in each new chat, which is time-consuming and error-prone.
  3. Trade-Off Accepted:

    • A “total recall” approach was avoided to protect latency, but this leaves power users repeating boilerplate instructions in every new chat.

:bullseye: Desired Outcomes

  1. Total Recall + Selectivity:

    • Persist both macro-preferences (e.g. “use Bahasa Indonesia, module+quiz format”) and micro-details (e.g. “term X = custom definition”) automatically.
  2. Context-Aware Retrieval:

    • Dynamically retrieve only the top-k most relevant memories (weighted by recency, explicit tags, and semantic similarity) and inject them into each response; see the scoring sketch after this list.
  3. Performance Preservation:

    • Keep the additional end-to-end latency introduced by memory retrieval under 100 ms.
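
To make outcome 2 concrete, here is a minimal Python sketch of the kind of scoring function I have in mind. Everything in it is an illustrative assumption rather than a claim about ChatGPT’s actual memory internals: the `MemoryChunk` fields, the 0.6/0.25 weights, the 30-day recency half-life, and the flat tag/star boosts are all placeholders to be tuned.

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class MemoryChunk:
    text: str
    embedding: list[float]               # precomputed vector for this chunk
    created_at: float                    # unix timestamp
    tags: set[str] = field(default_factory=set)
    starred: bool = False                # set via the "pin" UI

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score(chunk: MemoryChunk, query_emb: list[float],
          active_tags: set[str], now: float,
          half_life_days: float = 30.0) -> float:
    sim = cosine(query_emb, chunk.embedding)          # semantic similarity
    age_days = (now - chunk.created_at) / 86_400
    recency = 0.5 ** (age_days / half_life_days)      # exponential decay
    tag_boost = 0.2 if chunk.tags & active_tags else 0.0
    star_boost = 0.15 if chunk.starred else 0.0
    return 0.6 * sim + 0.25 * recency + tag_boost + star_boost

def retrieve_top_k(chunks: list[MemoryChunk], query_emb: list[float],
                   active_tags: set[str], k: int = 5) -> list[MemoryChunk]:
    now = time.time()
    return sorted(chunks,
                  key=lambda c: score(c, query_emb, active_tags, now),
                  reverse=True)[:k]
```

Since scoring is one cheap pass over precomputed embeddings (plus a single embedding call for the incoming prompt), a sub-100 ms lookup budget seems realistic; swapping the linear scan for an approximate-nearest-neighbor index would keep it there as the store grows.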

:hammer_and_wrench: Proposed Architecture

| Component | Contents & Mechanics |
| --- | --- |
| Tier 1: Global Preferences | Primary language, default output style (e.g. “module+quiz,” “analogy”), default persona roles. |
| Tier 2: Project-Scoped Memory | Conversations grouped by explicit user tags or inferred clusters (e.g. “YouTube Psikologi Deduktif Series”); pointers to session summaries & key vocab. |
| Tier 3: Session Detail Cache | During an active session, index new keywords, unique examples, and user corrections; at session end, auto-summarize into concise bullet points. |
| Embedding-Based Retrieval | Compute vector embeddings of the incoming prompt + memory chunks; retrieve the top-k matches, prioritized by recency and user-starred status. |
| User-Driven Tagging UI | “Star” or “pin” any chat snippet for permanent memory; a memory dashboard for reviewing, editing, and pruning stored items. |
| Automated Pruning & Summaries | Periodically convert older session caches into high-level summaries and archive/discard raw transcripts to control storage growth. |
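
To illustrate how the tiers and mechanics above could fit together, here is a rough Python sketch. All names in it (`HierarchicalMemory`, `end_session`, the stub `summarize`) are hypothetical, chosen only to make the data model tangible:

```python
from dataclasses import dataclass, field

@dataclass
class GlobalPreferences:
    """Tier 1: preferences applied to every session."""
    language: str = "Bahasa Indonesia"
    output_style: str = "module+quiz"
    persona: str = "analogy-driven explainer"

@dataclass
class ProjectMemory:
    """Tier 2: one entry per user tag or inferred cluster."""
    tag: str                                                 # e.g. "YouTube Psikologi Deduktif Series"
    session_summaries: list[str] = field(default_factory=list)
    key_vocab: dict[str, str] = field(default_factory=dict)  # term -> custom definition

@dataclass
class SessionCache:
    """Tier 3: short-lived cache for the active session."""
    project_tag: str
    notes: list[str] = field(default_factory=list)   # new keywords, examples, corrections
    pinned: list[str] = field(default_factory=list)  # user-starred snippets

class HierarchicalMemory:
    def __init__(self) -> None:
        self.prefs = GlobalPreferences()
        self.projects: dict[str, ProjectMemory] = {}

    def pin(self, cache: SessionCache, snippet: str) -> None:
        # User-driven tagging: "star" a snippet for permanent memory.
        cache.pinned.append(snippet)

    def end_session(self, cache: SessionCache, summarize) -> None:
        # Automated pruning: fold the session cache into its project.
        # Notes are compressed into a summary, pinned snippets survive
        # verbatim, and the raw cache can then be discarded.
        project = self.projects.setdefault(
            cache.project_tag, ProjectMemory(tag=cache.project_tag))
        project.session_summaries.append(summarize(cache.notes))
        project.session_summaries.extend(cache.pinned)

# Usage (summarize() would be an LLM call in practice; a stub suffices here):
mem = HierarchicalMemory()
cache = SessionCache(project_tag="YouTube Psikologi Deduktif Series")
cache.notes.append("term X = custom definition")
mem.pin(cache, "open every module with a real-world analogy")
mem.end_session(cache, summarize=lambda notes: "; ".join(notes))
```

The key property is that raw transcripts never outlive the session: only a concise summary plus explicitly pinned snippets are promoted to Tier 2, which is what keeps storage growth and retrieval latency bounded.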

:white_check_mark: Benefits & Impact

  • Zero-Repetition Workflow: Eliminates manual restatement of style/context, cutting prompt length by ~50% and saving the 2–3 minutes of setup per session.
  • Consistent Voice Across Sessions: Ensures uniform tone and structure for serialized content (e-learning modules, multi-part articles).
  • Scalable Performance: Embedding retrieval + pruning prevents storage bloat and keeps lookup latency predictable.
  • Competitive Edge: Fine-grained long-term memory will differentiate ChatGPT from other LLM interfaces lacking seamless, user-customizable recall.

:rocket: Call to Action

  1. Prototype Hierarchical Memory under an internal beta.
  2. Solicit Power-User Feedback (e.g. through a dedicated “Memory Beta” program) to fine-tune tagging UI and retrieval thresholds.
  3. Measure Success: Track prompt-length reduction, user satisfaction, and latency impact in real-world trials.

I’m eager to collaborate on design discussions, provide additional use-case logs, and participate in any beta testing. Thank you for considering this enhancement to make ChatGPT truly “remember” what matters most.