Best practices to save money when calling the Assistants API

Hi, I am building a system using the Assistants API.
I use it to read PDF files and chat with the data.

I upload the files to OpenAI and attach them to the assistant:

    assistant = get_openai_client().beta.assistants.create(
        name="Disclosure Gpt4 1106 preview",
        tools=[{"type": "retrieval"}],
        # ... (remaining arguments were cut off in the original post)
    )

The PDF files are 50 to 100 pages in total, and I find that my API calls use a lot of tokens. Any suggestions to reduce the cost? Thanks!

It looks like the contents of all the files are being converted to tokens and sent to GPT. Because of this, every call is billed for tokens covering all of the files.

The simplest way to reduce this would be to skip the built-in file retrieval system, use a semantic matcher to extract the relevant passages yourself, and feed only those passages into the input to GPT.
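To make the idea concrete, here is a rough sketch in plain Python of the "retrieve first, then prompt" flow. It assumes you have already extracted the PDF text into a string. The names `chunk_text`, `keyword_overlap`, and `build_prompt` are made up for illustration, and `keyword_overlap` is only a crude placeholder for a real semantic matcher (embeddings would normally go there).

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 200) -> List[str]:
    """Split extracted PDF text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def keyword_overlap(query: str, chunk: str) -> float:
    """Placeholder scorer (an assumption, not a real semantic matcher):
    the fraction of query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def build_prompt(
    question: str,
    chunks: List[str],
    top_k: int = 3,
    score: Callable[[str, str], float] = keyword_overlap,
) -> str:
    """Keep only the top_k best-scoring chunks and put them in the prompt."""
    ranked = sorted(chunks, key=lambda ch: score(question, ch), reverse=True)
    context = "\n---\n".join(ranked[:top_k])
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The returned string is what you would send as the user message in an ordinary chat completion call, so each request carries only a few chunks rather than the whole document set.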

Can you share the tools and techniques for doing what you suggest? Do you suggest using LangChain with a vector DB, or some other techniques and tools?

LangChain would be a good way to go about this, especially if the files keep changing and you have to regenerate the embeddings frequently.

However, if the files are static, you could use a vector database like Pinecone to store the embeddings long term.

For similarity search once you have the embeddings, cosine similarity is the way to go.
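For reference, cosine similarity is just the dot product of two vectors divided by the product of their lengths. Below is a small standard-library sketch of how ranking by it could look. The helper names `cosine_similarity` and `top_matches` are illustrative, and the vectors are assumed to come from whatever embedding model you use.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A vector database such as Pinecone does this ranking for you at scale, but for a few hundred chunks a loop like this over embeddings held in memory is usually enough.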