It looks like all the files are being converted to tokens and sent to GPT, so every call pays for the token count of every file.
The simplest way to reduce this is to skip the built-in file retrieval and do the retrieval yourself: use a semantic matcher to pull out only the relevant passages and feed those into the prompt you send to GPT.
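As a minimal sketch of "retrieve first, then prompt": build the prompt from only the matched chunks instead of all files. The helper name `build_prompt` and the sample chunk text here are illustrative, not part of any library.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine only the retrieved chunks (not every file) into the GPT input."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Pretend these chunks came back from your semantic matcher.
chunks = ["Refunds are processed within 5 business days."]
prompt = build_prompt("How long do refunds take?", chunks)
```

Only `prompt` goes to the model, so the token cost scales with the matched chunks rather than with the whole file set.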
LangChain would be a good way to go about this. If the files keep changing and you have to regenerate embeddings frequently, it is a good fit.
However, if the files are static, you could store the embeddings long term in a vector database like Pinecone.
For similarity, once you have embeddings, cosine similarity is the way to go.