Hi, I’m using ChatGPT to manage two long-running production workrooms
(Japan Shorts / US Shorts for YouTube script creation).
After several weeks of usage, both rooms became extremely slow.
Symptoms
- The more the structured workflow and message history accumulate, the slower the responses become.
- At first the delay was only a few seconds (3–10 sec).
- But today, each reply took over 10 minutes.
This is not typical latency — the model freezes as if it is
re-scanning or reconstructing the entire conversation context
before answering.
Why I believe this is a bug
It feels like the system is repeatedly “rehydrating” all past messages,
instead of compressing / anchoring / trimming old context.
As a result, the context buffer grows extremely heavy and the room becomes
nearly unusable for long-term projects.
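For contrast, here is the behavior I would expect, sketched against the API rather than ChatGPT itself (the model name, turn budget, and `ask` helper are placeholders I made up, not anything documented): the fixed rules ride along as a pinned system message, and old turns are trimmed, so the payload per call stays bounded no matter how old the room is.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder rules / budget; in my real rooms these would be the
# fixed workflow instructions that must survive every turn.
SYSTEM_RULES = "Fixed workroom rules and workflow instructions."
MAX_TURNS = 20  # hypothetical window: keep only the most recent messages

history: list[dict] = []

def ask(user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    # Anchor: the system rules are re-sent verbatim, never rebuilt.
    # Trim: only the last MAX_TURNS messages are included, so the
    # context per call stays roughly constant over weeks of use.
    messages = [{"role": "system", "content": SYSTEM_RULES}] + history[-MAX_TURNS:]
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```

If the product did something like this internally, response time should not scale with the age of the room.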
I am aware of “fast mode” (no web, no images, short replies),
but that only controls the output.
The problem is clearly internal context size, not rendering.
What I need (clarification request)
Is there a recommended method or internal setting for:
- Auto-pruning / compressing chat history while preserving the fixed rules / workflow instructions?
- Anchoring system-level instructions so the model doesn’t keep rebuilding them on every turn?
- A long-running project mode / context checkpoint, where a large chat doesn’t degrade over time? (A sketch of what I mean by a checkpoint follows this list.)
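To make the checkpoint idea concrete, here is roughly what I am asking for, again sketched against the API (`checkpoint`, its parameters, and the model name are my own inventions, not an existing feature): old turns are folded into one summary message that preserves the rules and style patterns, and only recent turns stay verbatim.

```python
from openai import OpenAI

client = OpenAI()

def checkpoint(history: list[dict], keep_recent: int = 10) -> list[dict]:
    """Hypothetical checkpoint: compress everything older than the last
    `keep_recent` messages into a single summary turn, so the room keeps
    its accumulated refinements without an ever-growing context."""
    old, recent = history[:-keep_recent], history[-keep_recent:]
    if not old:
        return history
    summary = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Summarize this conversation, preserving the "
                        "workflow rules, style patterns, and refinements."},
            {"role": "user",
             "content": "\n".join(m["content"] for m in old)},
        ],
    ).choices[0].message.content
    # The summary replaces the old turns as one compact anchor message.
    return [{"role": "assistant", "content": f"[Checkpoint] {summary}"}] + recent
```

Run periodically (say, every 50 turns), something like this would keep a production room both long-lived and fast.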
Why starting a new chat is not a good workaround
These are persistent production rooms, not casual chats.
If I reset the room, I lose all the accumulated workflow logic,
style patterns, and prior refinements.
I need a solution that allows:
“Long-running rooms + stable performance + anchored instructions.”
If there is a developer-side switch, option, or recommended workflow
to solve this (context trim / anchor mode / lightweight mode),
please advise.
Thank you.