Force the Assistant to read the knowledge base files before generating the reply

chaithzx · June 4, 2024, 12:28pm

I’m making an assistant in OpenAI platform so that it generates the emails with the attributes provided.

Examples of these attributes includes the type of the email (Lead Generator, Webinar, Product Launch) etc and the theme of the email (Contrarian, Pop Culture, Storyselling etc). The knowledge contains the 6 of the types and 10 of the themes explained (each in it’s own document).

Now the problem is whenever I am asking it to generate the email with the description of the service or the product that the email will be promoting, it completely ignores the knowledge and generates the emails in a very generic nature as if I did not provide it any instructions.

Please help me in this regard and what I can do about it.

If you need any more context, please comment and I will update you with it.

fabrizio.gaucci · June 4, 2024, 12:45pm

I suppose you’ve created a custom GPT and supplied it with your own instructions. Have you tried to ask something easy like “Analyze the instructions”? what would it say?

STdio · June 4, 2024, 12:51pm

Vector Store appears to be used only as a source of external knowledge in the Assistant API. Examples of emails should be explicitly provided in a prompt. GPT probably assumes it knows enough and doesn’t need to check the Knowledge storage.

Try to explain right into the prompt that GPT must go through examples from the Assistant vector store.

chaithzx · June 4, 2024, 1:19pm

It’s not a custom GPT but an Assistant at https://platform.openai.com/playground/assistants

When I asked “Analyze the Instructions” it replied with four bullet points of how it would be going about it. Apart from that nothing.

chaithzx · June 4, 2024, 1:25pm

I cannot really upload the example emails into the prompt because:

Firstly, they are not example emails but the files contains the explanation about the characteristics of the emails that the assistant should be spitting out
As I mentioned in my original post, the there are 6 types and 10 themes. So explanation for each of them as in a single .docx file. Each file consists an average of thousand characters. So I cannot put them in the prompt instructions directly.

Got any other suggestions?

Olyray · June 4, 2024, 1:28pm

In the assistants playground, you can set the assistant to always require a tool call before it responds to a message. I don’t know how to do that programmatically though.

It’s something I also want to implement and would appreciate a way to do it programmatically.

chaithzx · June 4, 2024, 1:43pm

But wouldn’t that consume too many tokens and burn the cash? If I want to make a business off of it, it won’t be a viable solution anymore.

anon10827405 · June 4, 2024, 1:50pm

This is not what RAG is for. You should be using instructions to accomplish this.

If the instructions get too big or are being ignored you can move towards fine-tuning

Do you want the model to always read something?
Put it in the instructions

Olyray · June 4, 2024, 2:03pm

Well… That’s what you wanted. Heard there are ways to optimize your RAG pipeline to prevent it from consuming too much tokens.

Even as it is, the API consumes so much tokens that it isn’t sustainable. I’m considering trying out Pinecone for my RAG.

Olyray · June 4, 2024, 2:03pm

Can you explain more on what you mean by fine-tuning?

anon10827405 · June 4, 2024, 2:09pm

Here’s a page by OpenAI page about it:

https://platform.openai.com/docs/guides/fine-tuning

chaithzx · June 5, 2024, 12:10pm

Hey man, this worked flawlessly. The quality of the emails generated are unquestionable. However, I have a new problem.

For some reason the assistant is consuming 8k - 12k tokens for a single response. Is this even ideal?

For your info, the set of instructions I provided to the assistant is 45812 characters. Could this be a reason for that? Should I refine the instructions better?

Also what other things may be causing this insane amount of token consumption?

Olyray · June 5, 2024, 12:37pm

Fine-tuning worked flawlessly for you?

chaithzx · June 5, 2024, 12:49pm

No not fine-tuning. Moving everything into instructions part and eliminating the vector store part.

Olyray · June 5, 2024, 12:52pm

Ohhh. Moving everything to the instructions should take a lot of tokens as it is going to be processed by the AI.

Is it just for the first response that it consumes the 8k tokens? Or for every response?

Also, do you still have anything in your vector store? Because for some reason, the assistants API consumes a lot of tokens when it calls file_search.

chaithzx · June 5, 2024, 1:05pm

No I do not have anything in the vector store anymore. However the tokens consumption were way higher when I used vector store.

Moving the files into instructions part reduced the consumption however I think it is still high. And that is for each response. So this is not a viable solution as well.

Olyray · June 5, 2024, 1:08pm

Well… That is an improvement of sorts.

Seems like the AI is querying the documents in the instructions for every chat. You can try fine-tuning then. It basically creates your own custom model for you…

chaithzx · June 5, 2024, 1:22pm

Yeah that’s exactly what I was thinking too. I think I need to condense the instructions.

But I don’t understand why it reads all the instructions every time.

And one more thing I have realised now is that most of the instructions become redundant based on the input given to the assistant.

For example:
If the user choose the email theme to be “Storyselling”, then all the remaining theme explanations like “Contrarian, common mistake, case study” become redundant.

So somehow I need to tell the assistant to read the instructions based on the user’s input. Did you get what I mean?

icdev2dev · June 5, 2024, 1:39pm

The set of all instructions is ~45k characters/~10k tokens. Which is what is causing your token spike.

As you pointed out, you need to dynamically construct your instruction through function calling. You should be abl to reduce your token count by an order of magnitude or more.

Olyray · June 5, 2024, 1:43pm

I get. File search is supposed to handle this well with vector embeddings and search. It’s supposed to create a vector from the user query and use it to search through the vector database for something similar and then use that for the AI’s context.

Openai doesn’t seem to optimize this well. People have said pinecone might be better at this, which is what I’m currently looking at.

Topic		Replies	Views
How do you maintain historical context in repeat API calls? API	29	93286	December 23, 2023
CLOSED Separate ChatCompletion API calls for 'system' and 'user' API	19	3643	September 20, 2023
ChatGPT - Do we get exactly 40 messages per 3 hours? GPT builders feedback	27	31739	January 5, 2024
Build your own AI assistant in 10 lines of code - Python Documentation gpt-4 , gpt-35-turbo , chat-completion , python , tutorial	13	66735	December 12, 2023
Stateles & sending previous replies, to create a thread API gpt-4	7	3436	December 19, 2023

Force the Assistant to read the knowledge base files before generating the reply

Related topics