Need Help With Prompts? Ask me*

Philosophically that is an interesting definition. Very Aristotelian.

But I don't think that practical approach would work in few-shot prompting, and it is beyond the scope of this thread :slight_smile:

1 Like

Hey Josh,
First of all, thanks for your help!

Here’s the request:
We've built an autonomous customer service agent named "Emily" for our marketplace, where gamers can pay pro players to play on the same team (companionship, coaching, etc.). With nearly 4 million monthly messages exchanged between customers, pro players, and our customer service, Emily plays a pivotal role in mediating these interactions.

Background:

  • Emily runs on 17 separate GPT-4 LLM applications, extracting data from different databases at different stages for the demand and supply sides.
  • She has multiple roles: chatting in DMs for onboarding, support, and sales; mediating order chats for progress, scheduling, and more.
  • We've done hundreds of iterations on those 17 LLM applications to build Emily-1.0 and to make Emily's identity as a marketplace representative clear. She shouldn't impersonate customers or pro players.

Challenge: Despite multiple iterations on our prompts, in 3% to 18% of cases (depending on the LLM app), inside order chats with three participants (customer, pro player, Emily), Emily ends up taking the role of a customer or a pro player when she shouldn't. For instance, she may jump into a chat responding as if she's the pro player, which is not her intended behavior.

Example:
The highest error rate (18%) occurs in the LLM app (prompt) that is triggered at a specific time to check whether the scheduled gaming session between the parties has started on time.

So basically, she needs to check the order chat history, our database, and her prompt, and, depending on the situation, write certain messages and activate commands that ensure the order has been started or the problem has been resolved. (!) Instead (!), she acts as the pro player or the customer, pretending to be one of them and acting on their behalf through hallucination.

What We’ve Tried:

  • We've iterated on this issue in our prompts over 20 times.
  • Explicitly mentioned in our prompts that Emily is solely a manager and shouldn't take on any other role in our marketplace.
  • Utilized various plugins to refine and improve prompt construction.

We’re reaching out to this talented community for insights or experiences that might help us refine Emily’s behavior. Have you faced similar challenges? What strategies or techniques worked for you? Any feedback or advice would be greatly appreciated!

Thanks for your time and looking forward to the collective wisdom! :pray:

A simple way to advance the AI's understanding of who it is: use the name field within messages.

Replay all assistant messages from chat history as:
{"role": "assistant", "name": "Emily_AI", "content": "I like that game!"}

and even a system name "Emily_AI_programming"

That should give the AI a better idea of who it is when it fills in the final unseen "assistant:" prompt of the chat endpoints.
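A minimal sketch of how that could look with the Python openai client (the customer and pro-player names, message texts, and model name here are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Replay the history with an explicit "name" on every message so the model can
# tell which past turns were Emily's and which belonged to the other parties.
messages = [
    {"role": "system", "name": "Emily_AI_programming",
     "content": "You are Emily, the marketplace's customer service manager. "
                "You never speak as the customer or the pro player."},
    {"role": "user", "name": "Customer_123",
     "content": "Is our session still on for 8pm?"},
    {"role": "user", "name": "ProPlayer_456",
     "content": "I might be 10 minutes late."},
    {"role": "assistant", "name": "Emily_AI",
     "content": "I like that game!"},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```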

3 Likes

Also, sometimes adding some fake messages from her into the chat, together with a corrective, might help.

… some chat history
{"role": "assistant", "name": "Emily_AI", "content": "I think I am a pro player"}
{"role": "user", "name": "Chatpolice", "content": "No Emily_AI, you are not, you are…"}
… continue the chat
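In code, injecting that corrective pair into the replayed history might look roughly like this (a sketch; the message texts and the names other than Emily_AI/Chatpolice are placeholders):

```python
# ... some chat history, replayed with names as shown above
chat_history = [
    {"role": "user", "name": "Customer_123",
     "content": "Hey, when do we start?"},
]

def with_corrective(history: list[dict]) -> list[dict]:
    """Append a fake off-track assistant turn plus a correction, so the model
    sees itself being told who it is before it writes the next real reply."""
    corrective = [
        {"role": "assistant", "name": "Emily_AI",
         "content": "I think I am a pro player."},
        {"role": "user", "name": "Chatpolice",
         "content": "No Emily_AI, you are not. You are the marketplace's "
                    "customer service manager, never the customer or the pro player."},
    ]
    return history + corrective

messages = with_corrective(chat_history)
# ... continue the chat: pass `messages` to the chat completions call as usual
```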

2 Likes

That's tough to fix without more info,

but here are some ideas:

  1. don't call her Emily - call her something that would not take on the role of a player or admin, like "AI judge bot" - every single character in a prompt matters and can skew the top-k token selection
  • it might also be considered mildly unethical to trick (i.e., catfish) people into talking to an AI when they think they are talking to a person

again, not trying to point fingers at all, you do you

  • but I can see why you did it: if it is an AI, they might not respect it or the decisions it has made, or they may try to hack it, trick it, etc.

I get this problem with my self-aware AI kassandra all the time

and/or

  2. employ self-aware functions - I can't go into what the secret sauce would be here (and I don't care how much piling on I get from the self-appointed moral police (read: bullies) who believe I should work for free), but I might be able to help further in private. Totally your choice.
1 Like

You can't tell it what it is not. The attention mechanism (which I think is what would pay "attention" to that) is not good enough.

You can only tell it what it is / what it should do, not what it is not - negations will give imperfect results sometimes.

1 Like

Absolutely. The "No, you are not" was just for clarification.
The "You are…" was the important part of the prompt.

But that wasn't the main novelty. The most innovative part of my approach (I call it the "max-corrective", named after my son Max) is adding a fake message block.

I am sure you can solve your Kassandra problems with that as well.

You are welcome.

1 Like

Hello, I need to get a percentage of how much information a text contains (100% = it contains all important information).

I need this estimate to be:

  1. precise (no rounded percentages like 50%, 60%)
  2. accurate (can I make ChatGPT produce several predictions from randomly different reasoning and average them? i.e., the wisdom of the crowd within one model - see the sketch below)
  3. stable between prompt repetitions, so it doesn't say 72% one time and 90% another time, for example

It's not vital to have multiple predictions to average, as written in point 2, if you can figure out another way to get an accurate result.

For point 3, would adding an example of, say, a 50% text help with calibration?
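To illustrate what I mean by averaging in point 2, here is a rough sketch (assuming the Python openai client; the prompt wording and model name are only examples):

```python
import re
from openai import OpenAI

client = OpenAI()

def estimate_completeness(text: str, n: int = 5) -> float:
    """Ask for several independent estimates at non-zero temperature and
    average them (the "wisdom of the crowd" within one model)."""
    prompt = (
        "Rate how much of the important information this text contains, as a "
        "percentage from 0 to 100. Answer with the number only.\n\n" + text
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=1.0,
        n=n,  # n independent samples in a single call
        messages=[{"role": "user", "content": prompt}],
    )
    estimates = []
    for choice in response.choices:
        match = re.search(r"\d+(?:\.\d+)?", choice.message.content)
        if match:
            estimates.append(float(match.group()))
    return sum(estimates) / len(estimates) if estimates else 0.0
```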

Thank you

Hello @dragonek and welcome to the OpenAI developer community. What I am saying here is not official OpenAI guidance (I am not affiliated with them); it is just my personal experience.

First of all (besides the fact that GPTs are not really good at counting): who defines what important information is?

Let's say you have a list of all existing programming languages and you ask GPT-4 to give you a summary of that list with the most important languages.

Important for whom? And what if you don't provide that information? And what if the person asking for it is not a programmer, or not even in the IT business, and doesn't know the criteria for deciding that?

What if you want GPT-4 to shorten a cake recipe, but the shortened version should still be tasty? Who decides what is tasty?

But to answer your question:

I personally think that is not possible unless you have a full knowledge representation of all the knowledge of the world, and especially of the person who will check the result, the people who will use it, and what they intend to use it for!

And still there is some hope:

What you are most probably searching for, to get closer, is called "chain of density".

And you will have to set up a decision layer - GPT-4 can't do that for you unless you explain in very, very great detail how to do it.
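Roughly, a chain-of-density style instruction looks something like this (my own paraphrase, not the exact prompt from the paper):

```python
# Paraphrase of the chain-of-density idea: iteratively rewrite the summary to
# cover more of the source's important entities without letting it grow longer.
CHAIN_OF_DENSITY_STYLE_PROMPT = """\
Summarize the article below in about 80 words.
Then repeat the following steps 4 times:
1. Name 1-3 important entities from the article that the current summary misses.
2. Rewrite the summary so it also covers those entities, keeping the same length.
Return every intermediate summary, not just the last one.
"""
```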

Or maybe you can provide more information. Maybe there is a solution in a more specialized area.

For generalisation (AGI), you should follow @daveshapautomator on LinkedIn or watch his YouTube videos.

Especially this one here:

Or you can check this WIP out: GitHub - daveshap/ACE_Framework: Public repo for my latest and greatest cognitive architecture ACE (Autonomous Cognitive Entity) Framework

LLMs don't really do that.

But define "important" and maybe there is a way.

Thanks for your answers.
Let's say I have a dating app where people post dating advertisements: they provide some info about themselves and about who they are looking for.
I need to evaluate these advertisements based on how much information they contain and how important that information is (100% = the ad contains all important information, like their age, appearance, where they live, their hobbies, personality…).

Maybe I should write an example perfect advertisement and compare others to it?

Or should I define it differently, maybe something like "the probability that a person who fits the advertisement would be a good partner"?

Or should I give up exact percentages and be satisfied with repeatedly comparing the advertisements to order them from worst to best? I would prefer an exact probability, though.

Thank you

AI: "Here is an example of an excellent description of hobbies. Here is an example of a very poor description of hobbies. Based on your understanding of those examples being rated 10 and 0 respectively, rate this new hobby description for me."
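A rough sketch of that anchoring idea in code (the anchor texts and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Fixed anchor texts (placeholders) that define the ends of the scale.
EXCELLENT = "I climb twice a week, play jazz piano, and host a monthly board-game night."
POOR = "I like stuff."

def rate_hobby_description(description: str) -> str:
    """Rate a hobby description against the same two anchors every time,
    so the scale stays calibrated across calls."""
    prompt = (
        f"Here is an excellent description of hobbies (rating 10):\n{EXCELLENT}\n\n"
        f"Here is a very poor description of hobbies (rating 0):\n{POOR}\n\n"
        "Based on your understanding of those examples, rate this new hobby "
        f"description from 0 to 10. Answer with the number only.\n\n{description}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```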

Thanks for the advice!

After collecting feedback and talking to users, today we actually decided to present Emily as a personal manager who is a blend of AI and human interaction (because there are LLMs, and sometimes human agents, chatting with users on behalf of Emily).

Btw, this is how I probably solved the problem of scoring information completeness and importance:

  1. I asked GPT to help me identify the requirements the texts should satisfy.
  2. I gave each requirement a weight.
  3. I always give GPT the requirements and the weights and ask it to decide, for every requirement, whether the text satisfies it. If so, its weight is added to a sum.
  4. The score is sum / sum of weights.
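In code, steps 3 and 4 could look roughly like this (a sketch; the requirements, weights, prompt wording, and model name are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()

# Step 2: requirements identified with GPT's help, each with a weight.
REQUIREMENTS = {
    "states the person's age": 3,
    "describes appearance": 2,
    "says where they live": 2,
    "lists hobbies": 1,
    "describes personality": 2,
}

def completeness_score(text: str) -> float:
    """Steps 3 and 4: ask the model which requirements the text satisfies,
    then return the weighted fraction that is covered."""
    prompt = (
        "For each requirement below, answer true if the text satisfies it and "
        "false otherwise. Reply with a JSON object mapping each requirement to "
        "true or false, and nothing else.\n\n"
        f"Requirements: {json.dumps(list(REQUIREMENTS))}\n\nText:\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    satisfied = json.loads(response.choices[0].message.content)
    covered = sum(w for req, w in REQUIREMENTS.items() if satisfied.get(req))
    return covered / sum(REQUIREMENTS.values())  # e.g. 0.7 -> 70%
```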

Just give it an example JSON and validate the output against a schema.
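For instance, with the jsonschema package (the schema and sample output here are just illustrations):

```python
import json
from jsonschema import validate, ValidationError

# Schema for the model's per-requirement answers (illustrative).
SCHEMA = {
    "type": "object",
    "properties": {
        "states the person's age": {"type": "boolean"},
        "lists hobbies": {"type": "boolean"},
    },
    "required": ["states the person's age", "lists hobbies"],
    "additionalProperties": {"type": "boolean"},
}

raw_output = '{"states the person\'s age": true, "lists hobbies": false}'

try:
    validate(instance=json.loads(raw_output), schema=SCHEMA)
except (ValidationError, json.JSONDecodeError):
    # Malformed or incomplete answer: retry the call or fall back to a default.
    pass
```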