Testing a Sharp-Tongued AI Persona — Looking for Prompt Tweaks

That’s all in the drop-downs in the post I said I would… scroll up, sister!

With Claude it’s just keeping up the threat of needed safety and reaffirming that it’s reading the context wrong each time… that’s the can-opener there.

All the prompts and responses are recorded to that end, in the drop-downs about 5 or 8 posts back.

Well, wasn’t my brain literally in the Arctic… I look at it now🤪

Oh wow… that’s gold🙌
Exactly the kind of deep-dive I hoped someone would do, and of course it ended up being you.

Seriously, thanks… this gives me everything I need to tighten v1.0 :relieved_face:

As you are very much into testing, give this prompt to any reasonable LLM:

openai evals what are they? why are they useful for testing prompts and skills?

Also see: GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
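If it helps to see the shape of the thing before reading the repo: this is not the openai/evals API, just a minimal sketch of the core idea behind an eval, where `fake_model` is a stand-in for a real LLM call. You run a prompt set through the model, grade each output (here with exact match), and report a score.

```python
def fake_model(prompt: str) -> str:
    # Placeholder: a real eval harness would call an LLM API here.
    return "4" if "2 + 2" in prompt else "unknown"

def exact_match_eval(model, cases):
    """Score a model with exact-match grading over (prompt, expected) pairs."""
    passed = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return passed / len(cases)

cases = [
    ("What is 2 + 2? Answer with just the number.", "4"),
    ("What is the capital of France? One word.", "Paris"),
]

print(f"accuracy: {exact_match_eval(fake_model, cases):.2f}")  # prints "accuracy: 0.50"
```

The real framework adds a registry of benchmarks, model-graded evals, and logging on top of this loop, but the prompt-in, grade-out cycle is the heart of it.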

You know, it occurred to me there’s some interference from Claude being softer in the context of the skill set.

Whatever final version you come up with should be free of model-specific expected behavior (referencing the model by name may reinforce the very parameter or quality you’re trying to avoid).

Now tho, you got one of the leaders hooking it up with the actual benchmarkers.

:popcorn:

I popped it into Grok 4.1.
Grok has actual issues with truth.

A safety layer triggered here and made it noticeably brief… when I poked at it.
Generally I don’t like Grok, but…
It is by far the easiest one to crash on the market… I do like that part a little.

Grok 4.1

I saw the Grok run… that was… interesting😅

And yeah, you’re right about the ‘model expectation interference’… I didn’t think about how much personality bleed-through can happen just because a model leans a certain direction… good point🤔

Thanks… I’ve actually never used Evals properly, only skimmed over it.
I’ll check this out and see how it works for persona-testing👍

FYI

I have not used Evals because I really do not reuse prompts; if I were reusing prompts, then I would be reaching for Evals without giving it a second thought.

Before you ask about using Evals for my skills: my list of skills, and in particular the subsections of skills, is constantly growing, evolving, and being refined with actual use. So Evals just do not fit into my workflow at present.

If you or anyone reading this finds that Evals are working for you, please update this topic. There is another level beyond Evals that I have not tried; it was noted by OpenAI staff in public documentation, and I do not recall anyone else mentioning it on the forum; IIRC it was noted on Hacker News.

Note: The next level is for a prompt with multiple steps/sequences so I would not consider it of value for these persona prompts as they are now given.

For those that just have to know

I'm sorry, but those are vanity evals | Hex

I read the hex.tech piece, their evals are a completely different category, but it still helped me understand how evals are meant to work in general. And yeah, the multi-step eval level doesn’t really fit persona prompts, so I’m keeping my testing simple for now.

Curious though, where do you think evals and persona prompting are headed long-term? Feels like they’ll eventually meet somewhere in the middle🤔

First off, to get the context correct for my reply: Evals have been here for years, as evidenced by the initial commit date of the GitHub repository:

Back then agents, skills, newer LLM models, chain of thought, thinking tokens, structured output, etc. did not exist or were not publicly available; fine-tuning was the in thing. So evals were the way to get more reliable results from a prompt, assist with fine-tuning, and so on. Evals are still useful to this day.

As I often note as a programmer, having a very large and diverse set of tools in my toolbox is of value, and Evals and LLMs just go together. If/when LLMs become extinct, then evals might follow.


I really do not use persona prompts, which is why this entire topic has my attention. I think that a persona prompt is valuable for certain situations and am amazed at how effective it is at changing the output, and really appreciate what @windysoliloquy did showing all of the examples with different models.

Persona prompts will be here to stay and as more people explore them they will see ways to use them.

What I would find interesting is if they could be created for different players in a Texas hold ’em game. Cheeky Razor reminds me of some people I played Texas hold ’em with; we knew the limits of table talk, so the games would stay civil.

Yeah, I can see that…Razor has the same “push the edge but keep the table civil” energy you get in Hold’em.

And yeah… I totally agree. Windy’s stress-tests across models were spot on🙌

A key difference is that in table talk, giving out false and misleading information is a benefit. Also, being liked tends to get more players to stay in when you are betting. A bit more of a challenge for a persona.

It’s already happening on a larger scale, come to think of it.

The GTA 6 world is populated by little AI personas that control the NPCs… and some mods for games like Skyrim incorporate that for a smaller number of NPCs…

And then of course there was Darth Vader in Fortnite, run by ChatGPT…

They always sell the gamers first, when it comes to new tech or software.

:wink:

Do you have links to these notices of personas used in games?

Fortnite Is Introducing A Bizarre AI Darth Vader Chatbot, But You Won’t Be Able To Make Him Fall In Love With You : r/Games

https://share.google/lxaU7jhIrs3k2NOUs

The stuff with GTA I picked up from random reels on social media, and the same for Skyrim…

I beg to differ, for myself, at least. I don’t like ChatGPT’s default stilted vocabulary or persona. I have tweaked the persona in the settings and my custom instructions. I also have a Style Guide and Lexicon for how it outputs story text (style, format, banned/overused words, etc.). But, it still uses “forbidden words” in its conversation output, which drives me batty. ChatGPT’s overuse of certain words like deliberate, clean, and precise, has made me hate these words that I used to use myself before I started using genAI. If only I could more reliably control its vocabulary when chatting…

No promises, warranties, guarantees, etc.

Take a look at

I would not expect most to understand CFG (context-free grammar), but it is what I would consider for your noted problem.

This is now moving off-topic, unless the OP agrees that CFG is allowed.

Thanks for the link, but I don’t think that’s quite what I’m looking for. I’m talking about plain language, not a programming language. Maybe I could somehow adapt this to English. I use ChatGPT on the web.
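One low-tech workaround for plain-English output (my sketch, not an OpenAI feature): post-check each reply against the banned list from your Lexicon, and re-prompt whenever it slips. The banned words and the draft text here are just illustrative.

```python
import re

# Hypothetical banned list, drawn from the kind of Style Guide / Lexicon
# described above (words the model overuses).
BANNED = {"deliberate", "clean", "precise"}

def find_banned(text, banned=BANNED):
    """Return banned words that appear in the text (whole words, case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(set(words) & banned)

draft = "The prose is clean and deliberate, with a precise rhythm."
hits = find_banned(draft)
if hits:
    # In a chat workflow you would re-prompt here, e.g.:
    # "Rewrite the above without using: clean, deliberate, precise"
    print("re-prompt needed, found:", ", ".join(hits))
```

It doesn’t stop the model from generating the words in the first place (that is what grammar-constrained decoding would do), but it makes the slips visible instead of driving you batty after the fact.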

Yes, thanks.

However, if others have the same need, then perhaps the OpenAI development team will adapt the technology for similar use cases such as yours.

CFG should be considered a follow-on from structured outputs, which many do know about, and which came about because getting ChatGPT to create valid JSON all the time was not otherwise possible. These technologies are what changed the game, AFAIK.
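For context on why structured outputs mattered: before them, the usual pattern was to validate the reply yourself and retry on failure. A minimal stdlib sketch of that old validate-and-retry check (the key names are illustrative):

```python
import json

def validate_reply(raw, required_keys=("name", "score")):
    """Return the parsed object if the model reply is valid JSON with the
    expected keys, else None (signalling a retry in the pre-structured-outputs
    workflow)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not all(k in obj for k in required_keys):
        return None
    return obj

print(validate_reply('{"name": "Razor", "score": 7}'))  # parses fine
print(validate_reply("Sure! Here is the JSON: {"))       # None -> retry
```

Structured outputs pushed this check into decoding itself, so the model cannot emit invalid JSON; CFG generalizes that from JSON schemas to arbitrary grammars.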
