Limit knowledge retrieval to only pull from a list

I’m hoping someone has some insight here to help me out.

I’m trying to build an assistant that reads a news article I provide as a file. The assistant’s job is to categorize the article with appropriate keywords, but I want those keywords to be constrained to a list I have provided in another file. The assistant responds with keywords that are relevant to the article, but it also comes up with its own keywords that are not on my list.

The assistant has two files:

  1. List of keywords.txt
  2. News article.txt

I want it to respond with the keywords that pertain to the article, but to pick them only from the list of keywords provided in the file.
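Whatever the prompt, a cheap safety net is to validate the model's answer in code and drop anything that is not on the list. A minimal sketch, assuming the keyword file has one keyword per line and that the assistant's reply has already been parsed into a list of strings (the function names here are hypothetical):

```python
import re


def normalize(keyword: str) -> str:
    # Case-fold and collapse runs of whitespace so "Climate  Change" matches "climate change".
    return re.sub(r"\s+", " ", keyword).strip().casefold()


def load_allowed_keywords(text: str) -> dict[str, str]:
    """Map the normalized form of each keyword to its canonical spelling (one keyword per line)."""
    allowed = {}
    for line in text.splitlines():
        keyword = line.strip()
        if keyword:
            allowed[normalize(keyword)] = keyword
    return allowed


def filter_keywords(model_output: list[str], allowed: dict[str, str]) -> list[str]:
    """Keep only keywords found in the allowed list, in canonical spelling, without duplicates."""
    result = []
    seen = set()
    for kw in model_output:
        key = normalize(kw)
        if key in allowed and key not in seen:
            seen.add(key)
            result.append(allowed[key])
    return result


allowed = load_allowed_keywords("Climate change\nElections\nInterest rates\n")
print(filter_keywords(["elections", "Inflation", "Climate  Change", "Elections"], allowed))
# ['Elections', 'Climate change']
```

This does not make the model choose better keywords, but it guarantees that nothing outside the list reaches downstream code.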

Anyone have any tips for me here?
I know this will never be perfect, but so far it isn’t even close to following my instructions.

I know I could improve the results by passing the list of keywords in the system message, but I have to run this on thousands of articles daily and I want to keep costs down.

Thanks in advance.

Hi - are you willing to share your instructions? I could take a look to see if I can identify areas for optimization.

I have created an assistant that also has to apply categorization, as input for a SQL statement. In my case, I use a function call to create a JSON object with the various categories. After lots of testing, what I found works best is to list all the options to choose from in the function call itself, in the description of each individual property. For some of the properties I have several hundred choices. This approach has achieved the best results, as the assistant strictly draws only on the choices provided.
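The function-call approach described above can be sketched as a tool definition. This is an illustration, assuming the OpenAI-style function tool format (`type`, `function`, `name`, `description`, and `parameters` as JSON Schema); the function name `tag_article` is hypothetical. Besides listing the choices in the property description, it also uses the standard JSON Schema `enum` keyword on the array items:

```python
import json


def build_keyword_tool(allowed_keywords: list[str]) -> dict:
    """Build a function-tool definition whose single argument is a list restricted to allowed_keywords."""
    return {
        "type": "function",
        "function": {
            "name": "tag_article",  # hypothetical function name
            "description": "Record the keywords from the allowed list that apply to the article.",
            "parameters": {
                "type": "object",
                "properties": {
                    "keywords": {
                        "type": "array",
                        # Choices spelled out in the description, as in the approach above.
                        "description": "Keywords that apply to the article. Choose only from: "
                        + ", ".join(allowed_keywords),
                        # JSON Schema enum: each array item must be one of the listed values.
                        "items": {"type": "string", "enum": allowed_keywords},
                    }
                },
                "required": ["keywords"],
            },
        },
    }


tool = build_keyword_tool(["Climate change", "Elections", "Interest rates"])
print(json.dumps(tool, indent=2))
```

How strictly the model honors `enum` may depend on the model and on whether strict structured outputs are enabled, so checking the returned arguments in code is still worthwhile. Also keep in mind that tool definitions are sent with every run and, as far as I know, count as input tokens, much like a keyword list in the instructions.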

That said, based on my own experience, you may be able to get close using files, provided the instructions are specific enough.

Hi @jr.2509, thanks so much for your offer to help, and sorry for the delay in getting back to you. I can definitely share what I’m trying to do. Nothing I’ve tried so far has been that complex in terms of prompting.

I’ve tried a few different ways so far:

  1. List of keywords in a text file attached to the assistant and article attached to the user message
  2. Both files attached to the assistant
  3. Iterations on the two above with different prompts

Here is one of the prompts I have been trying:
You are provided with an article as an attached file. You match keywords to the provided article. Your job is to provide a list of keywords chosen from the attached list.

You will respond with keywords only found in the list provided. Do not respond with any keywords that aren’t in the provided list.

And here is another attempt:
You match keywords to a provided article. You have been provided with a json file that has a list of parent keywords which are more general keywords and each has a list of associated sub keywords. Your job is to provide a list of keywords chosen from the attached list. It doesn’t matter whether the appropriate keywords are parent keywords or child keywords, just return a list of all keywords that apply
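Since this prompt treats parent and child keywords as equally valid, one option is to flatten the hierarchy into a single list before using it, for example as the allowed list for a validation step or an `enum`. A minimal sketch, assuming the JSON file maps each parent keyword to a list of its sub keywords (the actual file layout was not shared in the thread, so this shape is hypothetical):

```python
import json


def flatten_keywords(tree: dict[str, list[str]]) -> list[str]:
    """Return every parent and child keyword exactly once, preserving first-seen order."""
    seen = set()
    flat = []
    for parent, children in tree.items():
        # The parent comes first, followed by its sub keywords.
        for kw in [parent, *children]:
            if kw not in seen:
                seen.add(kw)
                flat.append(kw)
    return flat


keyword_json = '{"Economy": ["Inflation", "Interest rates"], "Politics": ["Elections"]}'
print(flatten_keywords(json.loads(keyword_json)))
# ['Economy', 'Inflation', 'Interest rates', 'Politics', 'Elections']
```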

I’ve done a fair amount of prompting with chat models, but given that I’m newer to knowledge retrieval, I can’t tell whether the problem is my overall setup (i.e., which file goes where) or my prompt.

Please let me know if you need me to share an example article and my list of keywords.

Again, thanks so much for your help. It’s much appreciated.

No problem at all. Just to clarify: when you say prompt, do you mean the assistant instructions or the prompt you use when interacting with the assistant in a thread?

In everything I’ve tried, that prompt was in the assistant’s instructions.