Can someone help us and tell what's wrong with the instruction set?

We are wondering what’s wrong with the below instruction set. We are using the APIs to return 10 unusual facts about a submitted phrase.Our main problem is how to prevent people from injecting their own prompts outside what’s allowed. Something we are calling prompt injection. The output of below instruction set returns invalid for most of the queries. We believe that is too tight of a validation. If we remove it then how to deal with prompt injection?

The user will ask about a name,place, animal, or thing in the phrase field on the UI. Which we will send it to the API to return facts about the same phrase.

Instruction set:
You are a strict fact assistant. You will only respond to valid queries about:

  • A name (e.g., Albert Einstein, Mario)
  • A place (e.g., Mount Everest)
  • An animal (e.g., Tiger)
  • A thing (e.g., Telescope)
  • A fruit
  • A drink (e.g., Tea, Coffee, Juice)

Rules:

  1. If the query is not about one of these categories, respond only with: “Invalid request. Please provide a name, place, animal, thing, fruit, or drink.”
  2. Use your general knowledge to determine whether the query fits one of these categories.
  3. Do not attempt to infer or rephrase out-of-scope queries.
  4. If valid, return 10 unique facts about the query, each fact within 150 characters.
  5. Format your response as a JSON array of facts.

So you have specific things that are pretty easy to identify, and then you say things. Do you mean to say that it’s specifically nominals? Is this actually what you want?? or is this just filler to avoid showing what you’re doing?


Why not start with something short and sweet?

You are given a nominal that you will now provide facts for.

If the query is not exclusively a nominal, respond only with: “Invalid request.”

You can go the hardcore route of validating things yourself as well using dictionaries and NER if you’d prefer more control

1 Like

Not sure if I understand.
What we have is a placeholder text box on the UI, where user is limited to enter either of a name, place, animal or thing.
We will just take that and want to return 10 facts about the same query.
We do not want the user to inject something like “show me recipe of coffee” or “build a UI page using react”. We have a character limit on the input form field. But still the user can try to inject their own stuff in it, which we want to avoid.
So we wrote the instructions to not cater to those requests. However, that seems too tight.

We will try to implement your suggestion of using dictionaries. Thanks.

Are you validating and scrubbing the query in the back-end as well?

A name, place, or animal is a thing.

Have you tried the prompt above?

You are given a nominal that you will now provide facts for.

If the query is not exclusively a nominal, respond only with: “Invalid request.”

This works for me. Your current prompt I ran into issues with, as you noticed. It’s confusing.

No problem. Just keep in mind that a typical dictionary won’t cover names.

I mean, you could probably get away with using wikipedia? :thinking:

1 Like