Can someone help us figure out what's wrong with this instruction set?

We are wondering what’s wrong with the instruction set below. We are using the API to return 10 unusual facts about a submitted phrase. Our main problem is how to prevent people from injecting their own prompts outside what’s allowed, something we are calling prompt injection. With the instruction set below, the output returns invalid for most queries, which we believe means the validation is too tight. But if we loosen it, how do we deal with prompt injection?

The user enters a name, place, animal, or thing in the phrase field on the UI, which we then send to the API to return facts about that phrase.

Instruction set:
You are a strict fact assistant. You will only respond to valid queries about:

  • A name (e.g., Albert Einstein, Mario)
  • A place (e.g., Mount Everest)
  • An animal (e.g., Tiger)
  • A thing (e.g., Telescope)
  • A fruit
  • A drink (e.g., Tea, Coffee, Juice)

Rules:

  1. If the query is not about one of these categories, respond only with: “Invalid request. Please provide a name, place, animal, thing, fruit, or drink.”
  2. Use your general knowledge to determine whether the query fits one of these categories.
  3. Do not attempt to infer or rephrase out-of-scope queries.
  4. If valid, return 10 unique facts about the query, each fact within 150 characters.
  5. Format your response as a JSON array of facts.
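
For context, here is a minimal sketch of the kind of call we make (assuming the OpenAI Python SDK; the model name and variable names are illustrative):

```python
# Minimal sketch of the call; assumes the OpenAI Python SDK (openai>=1.0).
import json
from openai import OpenAI

client = OpenAI()
INSTRUCTIONS = "..."  # the full instruction set above

def get_facts(phrase: str) -> list[str] | None:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": phrase},  # the raw phrase from the UI field
        ],
    )
    text = response.choices[0].message.content
    if text.startswith("Invalid request"):
        return None
    return json.loads(text)  # rule 5: the response is a JSON array of facts
```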

So you have specific categories that are pretty easy to identify, and then you say “things”. Do you mean to say that it’s specifically nominals? Is this actually what you want, or is this just filler to avoid showing what you’re doing?


Why not start with something short and sweet?

You are given a nominal that you will now provide facts for.

If the query is not exclusively a nominal, respond only with: “Invalid request.”

You can go the hardcore route of validating things yourself as well, using dictionaries and named entity recognition (NER), if you’d prefer more control.
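
For instance, a rough pre-check with spaCy (assuming the small English model, en_core_web_sm, is installed):

```python
# Rough sketch: pre-validate the phrase with spaCy before it ever reaches the model.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def looks_like_nominal(phrase: str) -> bool:
    doc = nlp(phrase.strip())
    # Reject anything containing a verb: "show me a recipe..." fails here.
    if any(tok.pos_ in {"VERB", "AUX"} for tok in doc):
        return False
    # Accept recognized entities (names, places) or plain noun phrases.
    return bool(doc.ents) or all(
        tok.pos_ in {"NOUN", "PROPN", "ADJ", "DET"} for tok in doc
    )
```

Anything that fails this never reaches the model, so the prompt only has to handle phrases that already look like nominals.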


Not sure if I understand.
What we have is a text box on the UI where the user is limited to entering a name, place, animal, or thing.
We just take that and want to return 10 facts about the same query.
We do not want the user to inject something like “show me recipe of coffee” or “build a UI page using react”. We have a character limit on the input form field, but the user can still try to inject their own instructions, which we want to avoid.
So we wrote the instructions to not cater to those requests. However, that seems too tight.

We will try to implement your suggestion of using dictionaries. Thanks.


Are you validating and scrubbing the query in the back-end as well?

A name, place, or animal is a thing.

Have you tried the prompt above?

You are given a nominal that you will now provide facts for.

If the query is not exclusively a nominal, respond only with: “Invalid request.”

This works for me. I ran into issues with your current prompt, as you noticed; it’s confusing.

No problem. Just keep in mind that a typical dictionary won’t cover names.

I mean, you could probably get away with using Wikipedia?
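
As a sketch, Wikipedia’s public REST summary endpoint works as a cheap existence check (a 200 means an article exists for that title):

```python
# Sketch: treat "has a Wikipedia article" as a cheap existence check for the phrase.
# Uses the public REST summary endpoint; no API key needed.
import urllib.parse
import requests

def on_wikipedia(phrase: str) -> bool:
    title = urllib.parse.quote(phrase.strip().replace(" ", "_"))
    resp = requests.get(
        f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}",
        headers={"User-Agent": "facts-bot-example/0.1"},  # illustrative UA
        timeout=5,
    )
    return resp.status_code == 200  # 404 means no article with that title
```

Casing and unusual titles will trip it up occasionally, so treat it as a filter rather than ground truth.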


Below are some observations and suggestions:

  • The instructions only allow a few predetermined categories. Many inputs that could be considered valid may not strictly fit one of these categories. As a result, many queries are marked as invalid even if a user intends a valid fact query.

  • The instructions expect the assistant to rely on its general knowledge to decide if a query is valid. This can be unpredictable when inputs are ambiguous or use alternate phrasing.

  • The strict validation is aimed at preventing unwanted instructions, but it may be overly rigid. This rigidity leads to valid inputs being rejected.

  • Removing the strict validation entirely makes the system more vulnerable to prompt injection. When user input is directly mixed with instructions, there is a risk that someone could inject extra commands or change the intended behavior.

  • A better approach may be to preprocess or sanitize the input on your end. By isolating the user’s phrase from the system instructions (for example, using a separate API parameter or a secure input filter), you can reduce the chance of prompt injection while still allowing more flexible input (see the sketch after this list).

  • Consider combining preprocessing with refined validation logic that can accept common variations in phrasing. This can improve acceptance of valid queries while keeping injection risks low.
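
As a sketch of the preprocessing idea (the character whitelist and length cap here are illustrative choices, not requirements):

```python
# Sketch of preprocessing before the phrase is sent to the model.
# The whitelist and length cap are illustrative; tune them to your inputs.
import re

MAX_LEN = 60  # matches a short UI field; adjust as needed

def sanitize_phrase(raw: str) -> str | None:
    phrase = raw.strip()[:MAX_LEN]
    # Allow letters, digits, spaces, hyphens, apostrophes, and periods;
    # this strips quotes, braces, and newlines often used to smuggle instructions.
    if not re.fullmatch(r"[A-Za-z0-9 .'\-]+", phrase):
        return None
    return phrase
```

Whatever survives this goes into the user message on its own, never concatenated into the system prompt, so the model sees the phrase as data rather than as instructions.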
