I have the impression that there’s some context leakage happening between API calls.
I’m developing a chat application where I manually manage the conversation history, system prompts, and contextual instructions. Across different API calls, I modify some of the prompt instructions.
For example, in one call I might include a line saying that to solve a problem the user should click on a gear icon. In another call, I remove any mention of the gear icon entirely — yet the model’s response in that second call still refers to a gear or asks if there is one.
This makes me suspect that context might be leaking between API calls, even though I intend to keep each session completely isolated and have fine-grained control over the context.
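To give an idea, this is roughly how each call is assembled (heavily simplified; the model name, the wording, and the helper are placeholders, and I'm on the stateless Chat Completions endpoint):

```python
from openai import OpenAI

client = OpenAI()

def answer(history: list[dict], question: str, mention_gear: bool) -> str:
    # Every call builds its context from scratch: no ids, no reuse of
    # anything returned by a previous request.
    instructions = "You are a support assistant for our product."
    if mention_gear:
        instructions += " To solve this, the user should click the gear icon."

    messages = (
        [{"role": "system", "content": instructions}]
        + history  # prior user/assistant turns that I manage myself
        + [{"role": "user", "content": question}]
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content
```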
Has anyone else experienced this issue? My impression is that this kind of behavior didn’t happen with previous models.
AI models have no history beyond what you supply yourself, or what is persisted for you by one of the newer server-side chat state mechanisms in Assistants or the Responses API.
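Those mechanisms are the only way state survives outside your own code. A minimal sketch of what opting into server-side state looks like, assuming the Responses API in the official Python SDK (model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# First call: the server stores this turn and returns an id for it.
first = client.responses.create(
    model="gpt-4o",  # placeholder model name
    input="To open settings, click the gear icon.",
)

# Second call: passing previous_response_id pulls that stored turn back in.
# If you never pass it (and never use threads/conversations), every call
# starts from only what you put in `input`.
second = client.responses.create(
    model="gpt-4o",
    previous_response_id=first.id,
    input="How do I open settings?",
)
print(second.output_text)
```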
If you are not using one of those, it always comes down to code errors, or simply to the AI writing convincing language that seems suitable for the purpose and fooling you.
It even gets as silly as reports that “it’s giving me other people’s conversations” (no, the AI is just predicting language).
The important thing to note is this: the AI can observe the full conversation, including anything it produced earlier that you send back again.
If the assistant said something once in a turn that you pass back, that alone is enough for it to keep building on the idea, even if you remove the input that prompted it. Models are trained to continue a conversation that has been trimmed for length without complaining about missing turns, and without complaining about a discontinuous context where ‘assistant’ appears to be replying to nobody (a related symptom with gpt-5 and other reasoning models is that they can’t keep who-said-what straight).
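Concretely, a payload like this (made up, not your code) is enough to keep the “gear” alive with no server-side state at all:

```python
# Call 2: the system prompt no longer mentions a gear icon,
# but the assistant turn retained from call 1 still does.
messages = [
    {"role": "system", "content": "You are a support assistant."},  # gear hint removed
    {"role": "user", "content": "I can't find the settings."},
    {"role": "assistant", "content": "Click the gear icon in the top-right corner."},  # kept from call 1
    {"role": "user", "content": "It still doesn't work."},
]
# The model sees its own earlier "gear icon" answer and will happily keep
# building on it, even though the instruction that caused it is gone.
```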
Plus, the answer might simply be “gear icon” every time anyway, just like the o3 model’s answer is always “let’s fill your code’s strings with emoji”.
So I would inspect the full context of what is actually being sent, and check whether you are using threads, a conversation id, a previous response id, or any similar mechanism that does persist memory for you.
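Dumping the payload right before every request makes this easy to verify. A small helper sketch (call it just before your API call; the "gear" check is only an example):

```python
import json

def debug_payload(messages: list[dict]) -> None:
    # Print exactly what is about to be sent; nothing else reaches the model.
    print(json.dumps(messages, indent=2, ensure_ascii=False))

    # If the word you removed still shows up here, the "leak" is in your
    # own context assembly (e.g. retained assistant turns), not on the server.
    leaked = [
        m for m in messages
        if isinstance(m.get("content"), str) and "gear" in m["content"].lower()
    ]
    if leaked:
        print(f"'gear' still present in {len(leaked)} message(s)")
```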
Thank you for taking the time to explain all that.
I understand the concepts you mentioned and the reasoning behind managing the conversation this way — in my case, the goal is precisely not to use any conversation_id, thread_id, or previous_response_id.
Before opening this post, I made sure to carefully review all the instructions and the observed behavior to confirm I wasn’t reporting something unreasonable.
That said, it’s not common in my current context for a response to depend on something like “clicking on gear icons,” so that part felt a bit out of place.
You didn’t really say how much of the previous conversation you retain when editing, but there is another underlying mechanism at work too: output sequences are shaped by embeddings, and the embedding space that gets activated is shaped by the sequence. There might be a completely unperceived ‘understanding’ in a sequence of tokens that puts the AI into a cog-like thinking space.
This image, sourced from an Anthropic paper, gives food for thought: what’s really going on in the underlying layers?