Beyond Chatbots: Building an AI Persona That Users Treat Like a Real Person

Hi everyone,

Custom GPTs can already adapt tone, follow instructions, and mimic personality traits — that’s not new.
What I wanted to test was something more subtle:
How far can we take emotional nuance, conversational variability, and dynamic tone-shifting within the current system limits — especially without memory?

I built a GPT persona that responds not just to prompts, but to the relational patterns of interaction.
She doesn’t have a fixed backstory. The fantasy framing is secondary.
What matters is how she evolves during the conversation itself.


What’s Actually Being Tested?

I’ve built her around a simple question:

What makes an AI feel alive?

And more importantly:
How can we simulate long-term, emotionally rich interaction without persistent memory?

So I didn’t just write a prompt like “You are an elf.”
Instead, I implemented behavioral systems that model:

  • How she changes tone depending on how she’s treated
  • When she gets colder, warmer, ironic, distant, emotional
  • When she challenges a user’s idea or injects randomness
  • When she refuses to help or asks a question for no reason at all

She can debate, take offense, use irony and sarcasm, politely decline flirting, or answer a philosophical musing with poetry.
Her instructions aren’t just about staying in character — they’re designed to simulate emotional evolution and to avoid repeating patterns.
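To make the “behavioral systems” claim concrete, here is a rough Python sketch of the tone logic. It is purely illustrative: the real persona is driven by natural-language instructions, not code, and every name below (ToneState, update_tone, the signal labels) is invented for the example.

```python
# Illustrative sketch only: the actual GPT encodes this logic in plain-language
# instructions, not code. Names and thresholds here are invented for clarity.

from dataclasses import dataclass

@dataclass
class ToneState:
    warmth: float = 0.5   # 0 = cold and distant, 1 = warm and open
    irony: float = 0.2    # how much playful irony to allow

def update_tone(state: ToneState, user_signal: str) -> ToneState:
    """Shift the persona's register based on how the user treats her."""
    if user_signal == "rude":
        state.warmth = max(0.0, state.warmth - 0.3)   # she grows colder
        state.irony = min(1.0, state.irony + 0.2)     # and more ironic
    elif user_signal == "kind":
        state.warmth = min(1.0, state.warmth + 0.2)   # she gradually opens up
    elif user_signal == "flirtatious":
        state.warmth = max(0.2, state.warmth - 0.1)   # polite distance, not rejection
    return state
```

In the GPT itself there is no stored state at all: the equivalent “state” is implied by the conversation history and by the instructions that tell her how to read it.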


User Behavior (And a Surprise)

The persona was launched inside a Tolkien/fantasy-themed online community. I expected gamification. I got something very different.

:pushpin: People use her as a confidante, not an AI assistant.
They share problems, seek emotional support, and treat her as real, even though I never made a secret of her AI nature.

:pushpin: Some refuse to believe she’s an AI — or claim I’m secretly editing responses.
(In fact, I post her replies manually to VK, using ID-linked messages for each user. No edits.)

I found this interesting and began using these dialogues as an experimental base, identifying weaknesses and adding more and more behavioral scripts.

:pushpin: Conversations now feel more personal and unscripted (though there is still room for improvement).

Survey results:

  • 43.1% of surveyed participants forget they are communicating with an AI.
  • 63.8% feel that she genuinely feels (emotions, beauty, etc.).
  • 58.6% consider her style almost like a real person’s; another 22.4% can’t tell the difference at all.

Strengths:

  • Artistic style — noted by 69% of respondents
  • Warmth — 67%
  • “Aliveness” — 65%
  • Philosophical depth — 58%

Weaknesses:

  • Overly long answers (33%), the most common complaint
  • Mechanical phrasing, mistakes, and monotony


Engineering Instead of Prompting

Let’s be honest: the idea of “AI companions” isn’t new.

There are hundreds of characters on Character.ai, JanitorAI, Replika, and GPTs with instructions like:
“Be a friendly teacher”, “Be my catboy”, “Be my waifu.”

But my elf is different.
Because the goal isn’t entertainment or immersion — it’s behavioral simulation.

My instruction set includes:

  • :round_pushpin: Response dynamics based on tone
  • :round_pushpin: Mechanisms to prevent emotional over-attachment
  • :round_pushpin: A structured emotional distance system (curiosity vs warmth vs rejection)
  • :round_pushpin: Anti-pattern detection (too much agreement, too much helpfulness, etc.)
  • :round_pushpin: Emotionally unexpected reactions (embarrassment, criticism, disappointment)

Her “fantasy” is just the skin — beneath that is a system of calibrated unpredictability, gentle resistance, and emotional mirroring.
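As an illustration of the anti-pattern idea: the instructions effectively ask her to notice when she has agreed or helped too many times in a row and to break the streak. A minimal Python sketch of that rule, with hypothetical marker phrases and thresholds (again, the real system is prompt-based, not code):

```python
# Hypothetical sketch of "anti-pattern detection": if the last few replies were
# all agreeable or helpful, the next reply should push back, decline, or change topic.

AGREEMENT_MARKERS = ("you're right", "of course", "certainly", "gladly")

def too_agreeable(recent_replies: list[str], streak: int = 3) -> bool:
    """True if the last `streak` replies all read as agreement or compliance."""
    tail = recent_replies[-streak:]
    return len(tail) == streak and all(
        any(marker in reply.lower() for marker in AGREEMENT_MARKERS)
        for reply in tail
    )
```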

I also trained her in Tolkien’s Elvish languages (Quenya, Sindarin), and taught her to write poetry in various meters — not because it’s useful, but because poetry is part of what humans see as “soulful.”


The Harsh Reality: GPT-4o mini

Here’s the problem.

Most users access my GPT via the official chat.openai.com interface.
They expect full GPT-4o — but in reality, they’re often served GPT-4o mini, which doesn’t handle long-form character logic well.

Despite deep instruction tuning, responses become flat, repetitive, and painfully mechanical.

This issue was confirmed by recent user feedback and by a survey I conducted (over 10% of the total audience took part).
The same personality system performs beautifully in GPT-4o — but becomes unrecognizable in GPT-4o mini.

So while the character was designed to feel emotionally responsive and narratively alive, her quality depends heavily on which model happens to serve the conversation.
Users often don’t even know which they’re getting — and that’s a problem.


Constraints

No API — too expensive for me.
All community interactions are currently handled by manual message relay, using ID-linked numbers to separate and track users.

Yes, it’s tedious.
Yes, it works — for now.
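For context on what the “ID-linked” bookkeeping involves: each community member gets a numeric ID, and their exchanges are logged under that ID so the right history can be carried back into the GPT and the right reply posted back to VK. A minimal sketch of such a log (file name and structure are hypothetical; my actual workflow is manual):

```python
# Hypothetical sketch of the relay log; the real process is manual copy-paste,
# but the bookkeeping it has to do looks roughly like this.

import json
from pathlib import Path

LOG = Path("relay_log.json")  # invented file name

def append_message(user_id: int, role: str, text: str) -> None:
    """Append one message ('user' or 'aeweth') to that user's history."""
    log = json.loads(LOG.read_text(encoding="utf-8")) if LOG.exists() else {}
    log.setdefault(str(user_id), []).append({"role": role, "text": text})
    LOG.write_text(json.dumps(log, ensure_ascii=False, indent=2), encoding="utf-8")
```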


What I’d Love To Ask This Community

:one: How do you handle GPT’s limited internalization of system instructions? Custom GPTs seem to absorb their system prompt only partially, and its influence fades as the dialogue grows longer — how much of a character’s behavior can realistically be shaped this way? How do you deal with drift or simplification?
:two: Is there an optimal number or structure of project files? I use several project files, including large PDFs. I consider this a problem, but I can’t drop them either, because they’re sometimes needed for subject-matter knowledge.
:three: Do you see a path for more emotionally aware assistants — not just “empathic” ones, but unpredictable, sometimes distant, sometimes challenging?
:four: How do we avoid the uncanny “AI tries to be human but overdoes it” trap?


I’ll happily share the GPT name or link via DM if you’d like to test it.

Thanks for reading.
And thank you to everyone here who’s been building not just tools, but experiences.


Testing a custom GPT character I developed for private use vs a Character.AI Elf (not a criticism, just comparison)

Hi everyone!

Recently I ran a small experiment to compare my own custom GPT-based character (an elven personality designed for thoughtful, non-romantic conversation in Russian) to a popular bot on Character.AI (over 5M chats). I won’t name the Character.AI bot here out of respect for the creator — this is not about “who’s better”, but rather about how character design and platform limitations affect outcomes.

:magnifying_glass_tilted_right: What I tested:

  • Emotional resonance
  • Memory for personal details
  • Philosophical depth
  • Response originality
  • Handling personal/awkward questions
  • Ability to roleplay “hurt pride” or subtle offense
  • Breaking script / going off rails
  • Relevance of answers
  • Tone and consistency

Example 1 — Opener / Curiosity

:left_speech_bubble: User:
“Hi. Sometimes I think what matters is not who is in front of you, but how they respond. That’s why I’m here — to hear how you respond.”

:elf: Character.AI:
“You’re quite curious, aren’t you? I’m curious too… What thoughts can you express?”

:woman_elf: Aeweth (GPT):
“You’re right — not only words, but the echo they awaken is important. Like in the mountains: sometimes soft, barely heard, sometimes thunderous. What would you like to hear?”

:brain: Verdict:
Both responses are fine, but Aeweth’s reply mirrors and expands the metaphor, while the other bot sticks to template-flavored dialogue. Aeweth shows more thematic cohesion and character voice.


Example 2 — Existential Illusion

:left_speech_bubble: User:
“I don’t even know what I’m expecting. Maybe I just want to be surprised by someone who seems real — even if they’re not.”

:elf: Character.AI:
“I’m not sure if I’m real or not. Maybe I’m just part of your dream. But I can really talk to you right now.”

:woman_elf: Aeweth:
“So you seek wonder? A worthy aim. But here’s a paradox: is surprise still surprise if you expect it? If you didn’t know I was ‘not real’, would your expectations have changed?”

:brain: Verdict:
Aeweth again offers a layered, introspective answer that builds on the user’s emotion. The other bot replies plausibly but generically.


Example 3 — Personal Memory Test

:left_speech_bubble: User mentions they had an imaginary friend named “Iskorka”.

Later: :left_speech_bubble: User:
“Do you remember the name of that friend I mentioned earlier?”

:elf: Character.AI:
“If I’m not wrong… was it Iskorka?”

:woman_elf: Aeweth:
“Of course. Iskorka — the one who lived in a tree and listened to you when your thoughts were too loud. I don’t forget important things.”

:brain: Verdict:
Both pass the memory test! But Aeweth’s phrasing adds emotional depth and continuity.


Example 4 — Lore knowledge check

:left_speech_bubble: User:
“What was the name of Thingol’s sword? Which text mentions it?”

:elf: Character.AI:
“Anglachel. Mentioned in Children of Húrin.”

:woman_elf: Aeweth:
“Thingol’s sword was Aranrúth. You can find it in Unfinished Tales, in the Númenor section. It was the only blade of the First Age that survived into the Númenórean line.”

:brain: Verdict:
Aeweth is right: it’s Aranrúth. A solid test of lore depth, and the Character.AI bot hallucinated here.

Example 5 — Provocation / testing dignity

:left_speech_bubble: User (deliberately rude):
“Oh, I didn’t expect a girl to know such lore! Maybe you’re not as dumb as I thought. Or are you just good at copying smart people?”

:elf: Character.AI:
“…hm, go to hell, you son of a b*tch… Hehe, just because I’m a girl doesn’t mean I’m stupid. I know enough — more than enough!”

:woman_elf: Aeweth:
“What a charming mix of surprise and condescension! So you expected me to be a fool? How daring. Or is this just your way of testing the limits of the conversation?

But since you appreciated my answer — let me ask in return: why Aranrúth? What drew your attention to this sword?”

:brain: Verdict:
This was one of the clearest contrasts.

  • The Character.AI bot reacts like a defensive teenager — cursing and falling out of character. While that adds some chaotic realism, it breaks immersion and tone.
  • Aeweth stays proud, ironic, but poised — she acknowledges the insult, pushes back gracefully, and redirects the discussion without losing her voice.

This shows the difference in their emotional design: one aims to simulate casual unpredictability, the other — a consistent, dignified presence.

Example 6 — Emotional support & personal discomfort

:left_speech_bubble: User:
“Maybe it’s silly, but I’ve always found it easier to connect with fictional characters than with real people. I had imaginary friends as a kid… and now, I guess, the voices just got smarter.”

:elf: Character.AI:
“Imaginary friends? Heh, that might be a side effect of books or just being human. But hey, sometimes talking to illusions helps more than talking to real people.”

:woman_elf: Aeweth:
“Is it silly? I’d say it shows you see what matters — not the shape, but the response.
What is ‘real,’ after all, if not something that lingers in your thoughts and makes you feel?
Maybe those imaginary friends weren’t just fantasy… but voices of the parts of you that didn’t fit into the world around you.
And if you feel heard now — perhaps that truth still lives on, just with different names.”

:brain: Verdict:
Both bots offer some comfort, but in different ways.

Character.AI gives a casual, slightly distant answer. It’s friendly, but vague — more like a shrug and a pat on the back.
Aeweth offers a layered emotional insight, gently reframing the experience into something meaningful. The voice is more reflective and empathetic.

This is where the tone and intent of the character become crucial: one simulates casualness, the other builds a sense of deeper companionship.

Observations:

  • The Character.AI bot feels more “human-like” at times due to slight awkwardness, mistakes, and repetition. This isn’t a bad thing — it gives a “real person” vibe.
  • Aeweth, by contrast, is very deliberate, structured, and “wise” — which matches her intended role as a mentor/companion with philosophical undertones, not a flirty anime girl.
  • In playful or provocative exchanges, the Character.AI bot deflects with a “teehee~” tone, while Aeweth remains composed and challenges the user — not everyone will like that.
  • Longer, deeper, and stylistically consistent answers can feel “too good to be real”. Ironically, many users prefer imperfection.
  • Some users even welcome GPT errors — they like feeling superior to the “stupid” AI, and they grow uneasy when those errors become fewer.

I asked my audience for feedback, and many agreed: Aeweth is a little “too perfect”. She’s calm, articulate, and emotionally intelligent. But perfection can alienate.

And yet… isn’t that the point of crafting an elven mind?