Summary
A custom Remote MCP server (connected as a ChatGPT app via Developer Mode; OAuth 2.1 + DCR + PKCE) works fine in fresh conversations, but in long-lived or older conversations — including threads inside a Project — its tools stop being exposed to the model for that conversation. The model can name the tools but can’t call them, and no request reaches our server. A new chat restores everything. Is this expected, and is there a way to detect or recover from it?
Symptom (reproducible, same account, app connected)
- Fresh conversation: the model calls the tools, our server logs the authenticated `POST /mcp` and returns `200`. Works.
- Long/old conversation: the model says the tools are “not exposed as tools in this specific thread” and lists names it can’t call. Our server logs show zero inbound requests; the call never leaves ChatGPT.
- In very saturated threads the model sometimes fabricates a plausible tool result from context instead of saying it can’t reach the tool. Editing earlier messages can stop the fabrication but does not restore tool access.
- Starting a new conversation (even inside the same Project) immediately restores the tools.
Ruled out
- Not a server bug — verified via logs (no inbound request); the tools simply aren’t in the model’s manifest for that conversation.
- Not Project membership — a fresh thread inside the same Project works.
- Not OAuth/expiry — the app stays connected; fresh threads on the same account work the same minute.
The common variable looks like conversation length / age / context compression.
Questions
- Is this known/expected — does ChatGPT stop injecting a connected MCP app’s tool manifest once a conversation grows long, is summarized, or reaches some age?
- What’s the actual trigger (context-length threshold, summarization dropping connector state, thread created before the app was added, a per-conversation tool cap, anything Projects-specific)?
- Any supported way to re-attach/refresh the connector within an existing conversation, short of a new chat?
- Does a published/verified app behave differently from Developer Mode here?
- Is there a per-conversation tool-manifest size budget? Does a smaller `tools/list` survive longer?
- Is there any signal (server- or client-side) a developer can use to detect that the tools were dropped for a conversation, so we can fail safe instead of risking a fabricated result?
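
On the last question, here is a minimal sketch of one possible fail-safe for the fabricated-result case. It does not detect dropped tools (the server never sees a request in that situation). It only makes genuine results distinguishable from invented ones, assuming some component outside the model (a downstream service or a human checker) gets to inspect the result. The names `sign_result`, `verify_result`, and `RECEIPT_KEY` are hypothetical and not part of MCP or ChatGPT.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical secret; in practice this would come from server configuration.
RECEIPT_KEY = b"replace-with-a-real-secret"


def _canonical(body: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_result(tool_name: str, payload: dict,
                key: bytes = RECEIPT_KEY, now: Optional[float] = None) -> dict:
    """Wrap a tool result with an HMAC-SHA256 receipt over its canonical JSON."""
    issued_at = int(time.time() if now is None else now)
    body = {"tool": tool_name, "issued_at": issued_at, "payload": payload}
    receipt = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {**body, "receipt": receipt}


def verify_result(result: dict, key: bytes = RECEIPT_KEY) -> bool:
    """Return True only if the receipt matches the rest of the result."""
    if not isinstance(result, dict):
        return False
    claimed = result.get("receipt")
    if not isinstance(claimed, str):
        return False
    body = {k: v for k, v in result.items() if k != "receipt"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time; also tolerates non-ASCII input.
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))
```

A result the model invents from context would lack a valid receipt. A result copied verbatim from earlier in the thread would still verify, so a checker would also need to reject stale `issued_at` values.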
Environment: ChatGPT web + macOS app
Thanks! /Andreas