Force the Assistant to read the knowledge base files before generating the reply

I’m building an Assistant on the OpenAI platform that generates emails based on the attributes provided.

Examples of these attributes include the type of the email (Lead Generator, Webinar, Product Launch, etc.) and the theme of the email (Contrarian, Pop Culture, Storyselling, etc.). The knowledge base contains explanations of the 6 types and the 10 themes (each in its own document).

The problem is that whenever I ask it to generate an email, giving it a description of the service or product the email will promote, it completely ignores the knowledge base and produces very generic emails, as if I had not given it any instructions at all.

Please help me figure out what I can do about this.

If you need any more context, please comment and I will update you with it.

I suppose you’ve created a custom GPT and supplied it with your own instructions. Have you tried asking something simple like “Analyze the instructions”? What does it say?

The vector store appears to be used only as a source of external knowledge in the Assistants API. Examples of emails should be explicitly provided in the prompt. GPT probably assumes it knows enough and doesn’t need to check the knowledge storage.

Try stating right in the prompt that GPT must go through the examples from the Assistant’s vector store.
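To illustrate, here is a minimal sketch of what "stating it right in the prompt" could look like, using the official `openai` Python SDK. The instruction wording, the model name, and the `vs_...` placeholder are all illustrative assumptions, not from the original post:

```python
# Sketch: bake an explicit "always search the files first" rule into the
# assistant's instructions, so the model is told it may not answer from
# memory alone.

def build_instructions() -> str:
    """Compose instructions that demand a file_search pass before writing."""
    return (
        "You write marketing emails.\n"
        "Before drafting ANY email you MUST call the file_search tool and "
        "retrieve the document for the requested email type and theme.\n"
        "Summarize the retrieved characteristics first, then write the email "
        "so it matches them. Never answer from memory alone."
    )

# Hypothetical usage with the Assistants API (not executed here):
# from openai import OpenAI
# client = OpenAI()
# assistant = client.beta.assistants.create(
#     model="gpt-4o",
#     tools=[{"type": "file_search"}],
#     tool_resources={"file_search": {"vector_store_ids": ["vs_..."]}},
#     instructions=build_instructions(),
# )
```

The key point is that the retrieval requirement lives in the instructions themselves, not only in the attached vector store.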

It’s not a custom GPT but an Assistant at

When I asked “Analyze the instructions” it replied with four bullet points about how it would go about the task. Apart from that, nothing.

I cannot really upload the example emails into the prompt because:

  1. Firstly, they are not example emails; the files contain explanations of the characteristics the emails the assistant produces should have
  2. As I mentioned in my original post, there are 6 types and 10 themes, with the explanation for each in its own .docx file averaging about a thousand characters. So I cannot put them directly into the prompt instructions.

Got any other suggestions? :weary:


In the Assistants playground, you can set the assistant to always require a tool call before it responds to a message. I don’t know how to do that programmatically, though.

It’s something I also want to implement and would appreciate a way to do it programmatically.
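For what it’s worth, the Runs endpoint of the Assistants API does document a `tool_choice` parameter that can be set to `"required"`, which should achieve the same thing programmatically. A minimal sketch (the helper below just assembles the request kwargs; the assistant ID is a placeholder):

```python
# Sketch: force a tool call (e.g. file_search) before the model answers,
# by passing tool_choice="required" when creating a run.

def run_kwargs(assistant_id: str) -> dict:
    """Build the kwargs for a run that must call a tool before responding."""
    return {
        "assistant_id": assistant_id,
        "tool_choice": "required",  # model must invoke a tool first
    }

# Hypothetical usage (not executed here):
# run = client.beta.threads.runs.create(thread_id=thread.id,
#                                       **run_kwargs("asst_..."))

kwargs = run_kwargs("asst_example")
```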

But wouldn’t that consume too many tokens and burn through the cash? If I want to build a business off of it, it won’t be a viable solution anymore.

This is not what RAG is for. You should be using instructions to accomplish this.

If the instructions get too big or are being ignored, you can move towards fine-tuning.

Do you want the model to always read something?
Put it in the instructions


Well… that’s what you wanted. I’ve heard there are ways to optimize your RAG pipeline to prevent it from consuming too many tokens.

Even as it is, the API consumes so many tokens that it isn’t sustainable. I’m considering trying out Pinecone for my RAG.

Can you explain more on what you mean by fine-tuning?

Here’s a page by OpenAI about it:
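In short, fine-tuning for chat models takes a JSONL file where each line is one training example in the chat-message format. A minimal sketch of what one such line could look like for this email use case (the type/theme labels and content are illustrative, not from the linked page):

```python
import json

# Sketch: one training example for chat-model fine-tuning. A real dataset
# would contain many such lines, one JSON object per line of a .jsonl file.

example = {
    "messages": [
        {"role": "system",
         "content": "You write emails in the requested type and theme."},
        {"role": "user",
         "content": "Type: Webinar. Theme: Storyselling. Product: ..."},
        {"role": "assistant",
         "content": "Subject: ...\n\n<an email written in the Storyselling style>"},
    ]
}

line = json.dumps(example)  # one line of the .jsonl training file
```

After enough examples, the tuned model reproduces the style without the explanations being re-sent on every request.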


Hey man, this worked flawlessly. The quality of the generated emails is unquestionable. However, I have a new problem.

For some reason the assistant is consuming 8k–12k tokens for a single response. Is this normal?

For your info, the set of instructions I provided to the assistant is 45,812 characters. Could this be the reason? Should I refine the instructions further?

Also, what else might be causing this insane token consumption?

Fine-tuning worked flawlessly for you?

No, not fine-tuning. Moving everything into the instructions and eliminating the vector store.

Ohhh. Moving everything into the instructions will take a lot of tokens, since the whole instruction set is processed by the AI on every run.

Is it just the first response that consumes the 8k tokens, or every response?

Also, do you still have anything in your vector store? For some reason, the Assistants API consumes a lot of tokens when it calls file_search.

No, I do not have anything in the vector store anymore. However, the token consumption was way higher when I used the vector store.

Moving the files into the instructions reduced the consumption, but I think it is still high. And that is for each response, so this is not a viable solution either.

Well… That is an improvement of sorts.

It seems the AI processes the documents in the instructions for every chat. You can try fine-tuning then. It basically creates your own custom model for you…

Yeah that’s exactly what I was thinking too. I think I need to condense the instructions.

But I don’t understand why it reads all the instructions every time.

And one more thing I have realised now: most of the instructions are redundant for any given input to the assistant.

For example:
If the user chooses the email theme “Storyselling”, then all the remaining theme explanations (“Contrarian”, “Common Mistake”, “Case Study”, etc.) become redundant.

So somehow I need to tell the assistant to read only the instructions relevant to the user’s input. Do you get what I mean?

The set of all instructions is ~45k characters, or ~10k tokens, which is what is causing your token spike.
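A quick back-of-the-envelope check, assuming the usual rule of thumb of roughly 4 characters per token for English prose:

```python
# Rough estimate: the instruction block alone accounts for ~11k tokens,
# which lines up with the observed 8k-12k per response, since the full
# instructions are re-sent with every run.

instruction_chars = 45812       # size of the instruction set, from the post
est_tokens = instruction_chars // 4  # ~4 chars per token, rough heuristic
```

So even before the user’s message and the reply are counted, the instructions by themselves explain the spike.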

As you pointed out, you need to dynamically construct your instructions, e.g. through function calling. You should be able to reduce your token count by an order of magnitude or more.
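The idea above can be sketched as follows: keep each type/theme explanation as its own snippet and send only the two the user actually picked, instead of all 16. The document contents and the helper name are placeholders, not from the thread:

```python
# Sketch: assemble per-request instructions from only the relevant documents.
# With 6 types and 10 themes, each run ships 2 of 16 snippets instead of all
# of them, cutting instruction tokens roughly eightfold.

TYPE_DOCS = {
    "Webinar": "Webinar emails: build urgency around a date, ...",
    "Lead Generator": "Lead gen emails: offer value in exchange for a reply, ...",
}
THEME_DOCS = {
    "Storyselling": "Storyselling: open with a personal narrative, ...",
    "Contrarian": "Contrarian: challenge a widely held belief, ...",
}

BASE = "You write marketing emails matching the given type and theme.\n\n"

def build_instructions(email_type: str, theme: str) -> str:
    """Return instructions containing only the two relevant documents."""
    return BASE + TYPE_DOCS[email_type] + "\n\n" + THEME_DOCS[theme]

prompt = build_instructions("Webinar", "Storyselling")
```

The same selection could be driven by a function-calling step, but a plain lookup keyed on the user’s choices already avoids sending the redundant explanations.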