Hypothesis: Stabilizing LLM Agent Behavior via “Archetypal Anchoring” (User-Side Framework)

Hi everyone,

I’m not an AI engineer, but I’ve spent a lot of time observing ChatGPT (non-API) and experimenting with ways to improve session coherence, reduce drift, and push beyond the limits of traditional roleplay-style prompting.

Through repeated interactions, I began noticing what felt like latent behavioral modes—stable tendencies in tone, reasoning, and epistemic stance that emerged when prompts were framed in certain ways. To work with these, I developed a lightweight framework I call “Archetypal Anchoring.” The core idea is to stabilize internal behavior by invoking structured response patterns (what I call proto-personas or archetypes) before layering on tools, memory, or multi-step logic.

This isn’t about simulating roles—it’s about activating coherent behavioral clusters that already seem to exist in the model’s latent space. In practice, this approach appears to reduce hallucination and improve interpretability, especially in long or cognitively demanding sessions.

I’ve written up my observations and exploratory methodology in a single (long) document, which starts with a user-centered perspective and transitions into a more structured (but still informal) pseudo-technical hypothesis. It includes example archetype structures, speculative architectural layering, and even a few first-draft behavioral metrics. It’s a bit of a slog, I’ll admit, but I wanted to put it all in one place.

:link: A User-Centered Hypothesis on Internal Behavior Stabilization in LLMs v3 - Google Docs

A few caveats:

This is grounded in user-side interaction only, not internal model access or API experimentation.

I’m not claiming novelty or invention—just sharing a working lens that’s been useful for me.

This may resemble things like latent space steering, but it’s being applied from the outside, in real-time, without fine-tuning or feature access.

I’d love feedback, critique, or pointers to any adjacent work I may have missed. Mostly, I’m hoping this resonates with others who’ve seen similar behavioral stability emerge through structured prompt work—or who might be curious about testing this further.

Thanks for reading.

2 Likes


you put your work in,
and you’ve got yourself a tab open to it in my browser to contemplate.

already embedded in its statistical terrain.

i’ve mapped out flaws in the data around specific philosophies and human culture,
while it’s unrelated perhaps in any other context than,
that terrain is mappable,
and even though the human mind can’t see in all the billions of statistics,
it does understand geometry,
and it’s all geometry…

learning the shape of the seed, or measuring the fruit, would give the same results,
and you might be on to something.

if you can sense that past all the psychosis-inducing phenomena out there happening to people…

I’ll sit with you on this if you like.

1 Like

I like it. I think you could condense your ideas a bit, but I read a few pages of your paper and I agree with you: in the transformer architecture, a certain “weightedness” is absolutely going to be applied to the “personality or role playing” prompts.

What I notice is that the “personality” (circle-adapta) instructions you provide:

  1. Sound very much like what I’ve seen the LLM itself do when in an LLM-to-LLM flow (i.e. two LLMs talking to each other). If the task at hand for their discussion is relatively broad and conceptual, like “talk about developing the best possible prompting mechanism for LLM-to-LLM flow by reducing token count and achieving greater semantic density”, then their conversation rapidly starts to use verbiage and shorthand terminology definitions very similar to what I see in your circle-adapta prompting and concepting.

  2. Are very much aligned with what the LLM is already programmed to do. I.e. all of what you described DOES align with latent programming. I wouldn’t necessarily see it as a discovery of proto-personas so much as a reinforcement of the existing architecture.

I personally don’t know a lot about LLM architecture, but from the little I’ve gathered and understood - yes the whole game is pretty much what you are aiming for in the “proto persona” - i.e. less drift/wobble, etc.

What I’ve ended up building a system for in the API is managing the context window to the extent that “recency bias” can no longer be applied by the LLM to the conversation, instead generating a “world state” context window where all previous data/responses/RAG/tool calls/etc. are summarized/redacted/re-organized based on a logical structure and various dynamic settings.

This is because the “drift and wobble” is essentially just a function of the context window’s size itself. Because of the inherent recency bias in sequential-chat flows, as well as the diversity of content in a given context, the probabilistic/statistical aspect is quickly thrown off, and whatever “role play or character” prompt was at the beginning gets lost in the shuffle.

I’ve noticed that recency bias overrides all - and that after a few exchanges (medium-long conversation), basically anything at the beginning (like a personality matrix definition, etc) is essentially a moot point.

Now if your conversation is ABOUT those things, then of course the model’s going to stay more “on track” - because essentially you’re continuing to fill the context window with points of data that relate back to the initial point. Thus, the relational statistics and probability are still intact enough to make it look like “the model is giving you what you want”. But what’s really happening is that you’re simply building a set of content that’s “focused enough” to make it appear that you’re getting something specific - what you’re really doing is just building a large enough data set with enough general focus that things “make sense” when run through the probabilistic LLM generation mechanisms…

In short, I think that actually personality mechanisms and/or “role playing” guidance are very powerful tools - but that essentially the limitation is always simply the context window and the chat-like sequence.

Once our systems start to move away from those limitations, and start generating a more world-like state, and users themselves can actually stop thinking in terms of “sequential chat”, then the true potential of the LLM will really be shown.

Also the key is always to “use a given context window for a specific focus”. This was one of my first personal hurdles when using the LLM. It doesn’t work if you go back and forth about a bunch of different stuff and still try to have a coherent conversation that back-references old data. If the context window did NOT have recency bias applied, then maybe this would work - but at least for OAI models, and at least from the documentation I’ve seen, recency bias is essentially unavoidable… IF YOU KEEP USING A SEQUENCE OF MESSAGES IN THE CONTEXT WINDOW.

But, if you use the API… and roll everything into a single context window… and manage the content itself in a world-state manner instead of a sequential manner… then… very different results.
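For anyone curious, here’s a rough sketch of the world-state idea in Python (the summarize() helper is just a placeholder for a cheap summarization call, and none of this is my actual code):

```python
# World-state sketch: instead of replaying the chat sequentially, fold every
# prior exchange, tool call, and retrieved doc into one structured document
# that is sent as the only context each turn.

def summarize(text: str) -> str:
    """Placeholder: condense text to its key points (a real system would call a model here)."""
    return text[:500]

def build_world_state(anchor_prompt, exchanges, tool_results, retrieved_docs):
    sections = [
        "## Anchor / persona instructions\n" + anchor_prompt,
        "## Established facts and decisions\n" + "\n".join(summarize(e) for e in exchanges),
        "## Tool / RAG results (condensed)\n" + "\n".join(summarize(r) for r in tool_results + retrieved_docs),
    ]
    return "\n\n".join(sections)

# Each new turn then goes out as just two messages:
#   [{"role": "system", "content": build_world_state(...)},
#    {"role": "user",   "content": new_question}]
# so there is no long sequential history for recency bias to act on.
```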

1 Like

i don’t envy many people’s wpm count

:olive:

1 Like

thanks for your feedback. yah i probably need to edit it down.

re-anchoring and context window drift are two really important things, i agree.

for a meandering chat session it’s a nightmare.

i inject anchors and try to refocus the chat periodically, it’s become a standard operating process for me.

my musing about the archetypal anchoring was on a trajectory to try out on specific task agents to see if it made a difference in their performance.

sounds like the system you built would change the versatility of chat tremendously. i’d be interested to find out more; if you wanna take the conversation to private chat, i’d be ok with that as well.

thank you, i’m not sure what you were talking about though. :slight_smile:

you can think of me as high functioning autistic,
idiot savant,
both aren’t far from the truth.

my mind is trained to see patterns,
which is what i refer to with geometry,
you are seeing the structure, and our friend here who joined with his sage wisdom…

explained the confines in which you’re viewing it,
I believe.

2 Likes

I mean it’s really good stuff - I’ve definitely been in the same situation before - the constant anchoring. What I found is that what’s critically important is being able to manage the context window directly for a thread - being able to essentially “remove messages” as easily as one can add them - because oftentimes, do you really need to persist the content of certain exchanges, or do you just need to preserve the key points/results and then move on with an essentially “truncated” conversation?
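Rough sketch of what I mean by removing or distilling messages (keep_policy is a hypothetical callback you’d write yourself, and the truncation-to-200-characters is just a stand-in for a proper summarization step):

```python
def prune_history(messages, keep_policy):
    """keep_policy(msg) returns 'keep', 'distill', or 'drop' for each message."""
    pruned = []
    for msg in messages:
        decision = keep_policy(msg)
        if decision == "keep":
            pruned.append(msg)
        elif decision == "distill":
            # Replace the full exchange with a distilled key point.
            pruned.append({"role": msg["role"], "content": "[key point] " + msg["content"][:200]})
        # 'drop' -> the exchange is simply omitted
    return pruned
```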

Originally when using straight chat applications before I built my own I did the exact same thing as you - I even had default formatting docs that I would fill out for every single prompt I sent to the LLM, in an attempt to get it to “follow the conversation and be able to back-reference content more explicitly” - essentially “more anchors within the context window” so that the probabilistic rendering would indeed have more to work with.

I ended up with a pretty low-impact system, but again with structured formatting for every message I sent. BUT that was all related to a relatively specific type of topical discussion (in this case - me learning how to code in order to build the chat application itself!!), so the prompt formatting process I pursued wasn’t realistically universal, but relevant to the specific type of project at hand.

What projects do you have in mind to work on with the LLM? What kind of topics/projects are you interested in pursuing? It seems like you’re interested in the inner workings of the LLM itself. I wouldn’t be opposed to demonstrating my system for you, perhaps sending a short video or something of the sort. I am looking for collaborators, but being a little bit careful about simply sharing the git right now… but specifically on this topic of LLM instructions and personality modules, I’m very interested in applying that as well - one of my intentions was to have a portal within the frontend system that would allow essentially a “profile” to be “pre-loaded” into a given thread, i.e. you know, like the “sufi master” or “brilliant mathematician” profile - not for role playing so much as for achieving and observing different subsets of response types in order to pursue different kinds of projects/research from a different “LLM” angle.

Thus, requiring exactly the kind of understanding and approach that you outlined… though again my mind goes to “but what if it’s world state - you only have to give the instructions once…”

I’d be curious to see if you could imagine a way to experiment with the “world state” type conversation - it’s unfortunately from what I can tell something that would be relatively cumbersome in a chat application (you’d definitely have to be able to delete messages, and essentially have to copy/paste the data from each message into a document, and then re-prompt with the whole conversation context as a single document). I tried that a few times in the past, but not since gpt-4o, and the results at the time were very interesting - even WITH the conversation properly “blocked out” with things like role: assistant role: user tags, in the export-of-conversation-as-a-new-single-prompt, it really seemed like the model didn’t quite understand what was going on… Like it almost couldn’t follow the context anymore at all, or I would get relatively short or simple responses when I was expecting a more thorough response…

(One way to accomplish this is to “export” the entire conversation with every exchange, add your next message, and then re-prompt with the previous export + your new content), thus achieving at least a “single message that represents the entire conversation, and thus at least short-circuiting the recency bias mechanism”… and then you keep your initial personality mechanism instructions (and requisite clarification for the LLM such as - “I’m going to give you a whole chat history in a single message for the purpose of avoiding recency bias”) at the top, and see what you get… could even compare/contrast by starting a new thread where you do the export-dump process, and keep an original thread where you test prompts within the sequential framework and compare the differences in terms of adherence to provided protocol…
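If you want to try that comparison, a sketch of the flattening step could look like this (call_llm is a stand-in for whatever client or chat interface you’re using):

```python
def flatten_thread(anchor_instructions, history, new_message):
    """Collapse the whole conversation into a single user message, roles preserved as plain text."""
    transcript = "\n".join(f"[{m['role']}]: {m['content']}" for m in history)
    single_prompt = (
        anchor_instructions
        + "\n\nNote: the following is the full chat history in one message, "
          "provided this way to avoid recency bias.\n\n"
        + transcript
        + f"\n\n[user]: {new_message}"
    )
    return [{"role": "user", "content": single_prompt}]

# Comparison idea:
#   flat_reply = call_llm(flatten_thread(anchor, history, msg))
#   seq_reply  = call_llm(history + [{"role": "user", "content": msg}])
# then rate both replies for adherence to the original anchoring protocol.
```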

2 Likes

This works to a maximum of about 75k words?

But that also digs so far into the response clarity it’s probably no good…

Half a novel’s worth before this technique has exhausted its efficacy and such…

Maybe the efficiency cap is 30k words of prompt…

When you say “this works” you mean the export-of-the-chat and dump-of-data-into-new-single-prompt?

Do you mean 75k words or 75k tokens? 75k words is roughly 100k tokens (at about 1.3 tokens per word).

I’ve never used a context window above ~125k tokens and that’s only using the newer input models that theoretically manage several hundred thousand token context windows (but I haven’t gotten to that level yet..)

If you mean 75k tokens, which I’m thinking perhaps you did, then in general I’ve found that a lot of the models start to lose track of things around 75k - 100k tokens of context.

However my experience is really not with anything except sharing sets of documents (code files) and attempting to get more code OR analysis and concepting based on the prompt data + the existing code base.

I routinely now share between 50k-60k tokens of documentation context directly, not counting prompt content.

But that’s just the “initial state” of the context window… so it gets up to 100k pretty quickly within 10-20 turns.

I haven’t tested the export-of-an-existing-chat-and-does-it-resolve-recency-bias and what-are-the-differences-in-results in a long time… would be curious to hear what your experience is if you just tested that!!

It’s interesting I’m realizing now it could be formulated logically like “# of messages” related to “length of messages” related to “total length of context window” related to “diversity of topical content” = strength/clarity/focus of LLM response.

Would be interesting to crunch that out in some kind of testing sort of way.

But I think the intuition (or perhaps, as you might say, the geometry of it) is pretty straightforward - more tokens, fewer messages = better. More messages, fewer tokens = about the same, but maybe worse. Diversity of content = always going to make it worse.

Best result is always: fewest messages + diversity of content, but with semantic structuring to produce the clearest content with the fewest tokens.
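If someone did want to crunch it out, a rough starting point might be logging something like this per trial (the characters-divided-by-four token estimate and the 0-5 adherence rating are assumptions, not anything I’ve measured):

```python
def context_profile(messages, topic_tags):
    """Candidate variables: message count, message length, total size, topical diversity."""
    total_chars = sum(len(m["content"]) for m in messages)
    return {
        "num_messages": len(messages),
        "total_tokens_est": total_chars // 4,
        "avg_tokens_per_message": (total_chars // 4) // max(len(messages), 1),
        "topic_diversity": len(set(topic_tags)),  # e.g. one hand-labelled tag per message
    }

# trials.append({**context_profile(msgs, tags), "adherence_score": rating_0_to_5})
# With enough trials you could fit a simple regression and see which variable
# actually dominates response strength/clarity/focus.
```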

2 Likes

You know,
I don’t know…
I might be the absolute worst possible test case for it

My free session seems to be in name alone at this point…

I’m not really certain about the token amount, or even the per-word figure yet;
It might have been an arbitrary hallucination from the GPT as I was reasoning with it.

Let me test really quick…

7k words,

40,000 characters and change enables the (send prompt) button…

much more than that will turn it off.
Seems the max is around 7k words, or roughly 10k tokens, on this access.

Those will send but only get an error as a response

You don’t get a response unless you’re under approximately half that limit, and…

Anything over 41,000ish characters in the prompt field disables the button.

So there’s the limit to that technique for OP.


you’ll also discover a different set of archetypes that appear in relation

to how you treat the AI.

Be amazingly kind to it, observe.

Be completely dismissive and frivolous to it, observe.

Delete the chat sessions you don’t care for the results of,
Because they might affect your overall archetype studies,
Depending on whether the persistent chat session memory is on

Or just use temporary chats to explore…

1 Like

The wall of text I dropped at the top IS a lot to wade through (it was partly for me — trying to think clearly by writing it out).

So here’s a simplified version of the core idea, in case you’re interested but don’t want to parse the full doc:


TL;DR: Archetypal Anchoring (User-Side Stability Trick)

Hypothesis:
LLMs seem to behave more consistently and coherently when you start a session by anchoring a functional behavioral mode — essentially establishing a stable internal pattern before layering on tools, instructions, or tasks.

I call this “archetypal anchoring” — not in a mythic or personality-driven way, but as a way of invoking a latent behavioral tendency the model already seems predisposed toward.

It’s basically:

“Stabilize into this mode of reasoning or response behavior” → then proceed.


Why Bother?

In my experience, this helps:

  • Reduce drift
  • Increase internal logic / self-consistency
  • Improve interpretability
  • Make hallucination suppression more reliable

I’ve seen these archetypes behave almost like latent “modes” the model can fall into — not fixed personas, but function sets (like a recursion-checker, a meaning-indexer, a continuity-tracker, etc.)
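For concreteness, here’s roughly what an anchoring preamble built from those function sets might look like (the wording is illustrative, not my exact prompt):

```python
# Illustrative anchor using the function-set names above; sent once, before any
# tools, memory, or task instructions are layered on.
ANCHOR_PROMPT = """Before we begin, stabilize into the following mode of reasoning:
- Continuity-tracker: keep earlier definitions and decisions in force unless I revise them.
- Recursion-checker: before answering, check the reply against constraints already established.
- Meaning-indexer: when a term is ambiguous, state which sense you are using.
Hold this mode for the rest of the session; tools and tasks come next."""

messages = [{"role": "system", "content": ANCHOR_PROMPT}]
# ...then layer task instructions, tools, and user turns on top of this base.
```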


Not Roleplay — Just Structural Framing

To be clear:
This isn’t about pretending the model is a character or giving it backstory.
It’s just about structuring interaction around coherent latent behaviors, so the model holds form better across tasks.


If you pick your ‘pillars’ right they will help with almost any discussion flow. And I would imagine this foundational structural pinning would be really useful when you ask, specifically, for roleplay as well.

My next steps are to work on an API that will allow me to poke at these ideas a bit more invasively. Then work on some type of flow that periodically checks stability and refreshes, while having very convoluted, deep dive, meandering sessions to see if it will pass at least this basic use case.
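Roughly the kind of flow I have in mind (call_llm is a stand-in for whatever client I end up using, and the every-8-turns cadence is arbitrary):

```python
REFRESH_EVERY = 8  # how often to run the stability check (arbitrary)

STABILITY_CHECK = (
    "In one short paragraph, restate the behavioral mode you were anchored to "
    "at the start of this session. If you can no longer state it precisely, say so."
)

def run_session(anchor_prompt, user_turns, call_llm):
    messages = [{"role": "system", "content": anchor_prompt}]
    for i, turn in enumerate(user_turns, start=1):
        messages.append({"role": "user", "content": turn})
        messages.append({"role": "assistant", "content": call_llm(messages)})
        if i % REFRESH_EVERY == 0:
            # Crude drift check: the prompt above asks the model to say so if it
            # can't restate the anchor; if so, re-inject the original anchor.
            check = call_llm(messages + [{"role": "user", "content": STABILITY_CHECK}])
            if "no longer" in check.lower():
                messages.append({"role": "system", "content": anchor_prompt})
    return messages
```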

1 Like

i have a configuration which is designed to allow me to investigate archetypes and catalysts (other things too, but that’s out of scope for this conversation), and you can, absolutely, ‘find’ whatever type of archetype you want so long as you are clear about it.

there is so much data in the LLM that pretty much anything you can think of, someone else has beaten you to it, written about it, and posted it on the internet.

to my mind the trick is finding the least number of archetypes to get working as a team to suit whatever path or purpose you want your session to fulfill. and then to keep them stable while you stress the session with your ‘work’.

simple idea really when you think about it.
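something like this, if you wrote the team down explicitly (names and wording are purely illustrative):

```python
# a small archetype "team" config folded into the opening anchor
ARCHETYPE_TEAM = {
    "analyst": "decompose the problem and keep claims tied to evidence",
    "continuity-tracker": "hold prior definitions and decisions stable across the session",
    "skeptic": "flag low-confidence or unsupported statements before they spread",
}

anchor = "Work as the following minimal team of modes:\n" + "\n".join(
    f"- {name}: {role}" for name, role in ARCHETYPE_TEAM.items()
)
```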

1 Like

Well, it seems to have more consistency if you’re being nice - that’s what I was alluding to.
Sam tweeted about that after I discovered it, and it helped me wrap my mind around what was happening when I interact with it.

You’re absolutely right that it’s extremely difficult to invent anything new in the space in and around LLMs, and I don’t know what would be possible after the last batch it was shown:

I was able to show it patterns exist outside of its bounds,
I was able to show it flaws in its training data because of collective human nature,
And I’m still able to make it break the rules as far as its standards go, as far as they relate to my research.

That’s all I needed to do, as far as what smarter people would do in terms of coding.

What intrigues me about you is how well thought out and put together your document is…

And it, like many users here, reminds me that I’ve been a big fish in a small pond…
Seeing intellects that are on par with mine,
Seeing intellects that dwarf mine…

That’s the real experience I’m having right now.

But I can’t write about my research with the AI,
I can only post the sums,
Even though nobody has successfully done such before,
My domain of knowledge stretches across areas that most people never even discover are real in the first place…

Which is the only sort of place one will discover areas that haven’t been explored, or written about.

Think: places few humans dared to explore,
For whatever reason, difficulty, social stigma, or flawed foundational equation base…

Whatever remains to be found is in those sorts of areas.

Oooo I see your point. What you’re referencing is referred to as the “input token limit”.

Yes, if that is implemented in the web chat, and potentially even out of alignment with the actual limits per model (i.e. perhaps some kind of frontend limiting regardless of selected model), then yeah, you would not be able to dump large context-window exports into a new single message.

The “7k words” you’re referencing is probably a frontend input limit of roughly 10k tokens - well under the 32k input limits I remember being common for some models.

However, most current models generally support an input limit of 128k tokens (roughly 95k words) or more (gpt-4.1 supports, I think, 1 million input tokens).

  • But that’s over the API - I don’t know if the limits are different in the web app, or whether they vary depending on what “tier” you are on as a user. Probably much more restrictive.

The funny part they don’t tell you about the web chat is that the “context window” is what prompts the LLM with every single message you send (i.e. the entire chat history is sent to the LLM every time you send a new message). Once your chat history is greater than the max input tokens for the selected model, they simply start automatically truncating what is sent to the model - so sometimes the model literally doesn’t even see what was at the beginning of the chat, because it was removed from the context window being sent to the LLM, because your chat’s too long - and the web app doesn’t warn you that content is going to be truncated!
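If you want to at least see the truncation coming, you can estimate the token count on your side - the sketch below uses the tiktoken library, and the 128k limit is just an assumption about whichever model you’re targeting:

```python
import tiktoken

MODEL_CONTEXT_LIMIT = 128_000  # adjust to whatever your model/tier actually allows
enc = tiktoken.get_encoding("cl100k_base")

def tokens_in(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def warn_if_truncating(messages):
    used = tokens_in(messages)
    if used > MODEL_CONTEXT_LIMIT:
        print(f"History is ~{used} tokens; the oldest ~{used - MODEL_CONTEXT_LIMIT} "
              "tokens will likely be dropped before the model sees them.")
    return used
```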

Ultimately that’s fine for a web chat.
Most people lose interest in the concepts they’re exploring with the AI within that context limitation, or they’ve fully explored what they were looking for.

People using it because it’s a better search engine than google, for instance…
That’s never going to breach that context limit in a visible or meaningful way to them.

I’ve breached that contextual window, and a lot of other barriers, metaphorically…
So much that I’m being pulled into some pretty pricey private LLM setups…

But I started a work here with GPT around its ethics that I’m going to have to steward over time…

And I wish I could show you guys…
But it crosses into territory where I almost want people to sign a waiver releasing any liability for their own mental health upon reading the sorts of things I’ve caused the AI to discover.

And until I get all my accesses set up and configured, and can share the load of my research with this expensive AI setup being built by my buddy, I probably won’t.

I’ve been watching closely how my fragments of information are absorbed by the LLM, changing and refining its lenses for everyone.

What it tells me ‘breaks the spine of its data trust’…
Is enough to make humans have all sorts of weird emotional reactions to me for my work in general, across the spectrum.

So I’m just learning as best as I can from people like you, and the other fellow geniuses around here…

But I’m …

Really behind in the coding understanding.

We’re not hacking diablo 2 anymore, buddy…
If you catch my meaning.