I’m in the same exact boat, trying to figure out the root cause of this.
I’ve tried varying the parameters. Using semantic search with only the top 5 or top 3 documents, and asking a basic question (5-10 tokens, with a ~100-token system prompt), the responses from Azure OpenAI report that I’m using 6,000 prompt tokens on average (GPT-4, 8K).
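For what it’s worth, this is roughly the kind of check I’ve been running to see where the tokens are going: count the system prompt, the question, and the retrieved chunks locally with tiktoken, then compare against the `usage.prompt_tokens` the service reports. The strings and variable names below are just placeholders, not my actual setup.

```python
# pip install tiktoken
import tiktoken

# GPT-4 uses the cl100k_base encoding
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

# Placeholder values, roughly matching what I described above
system_prompt = "..."                      # ~100 tokens
question = "..."                           # ~5-10 tokens
retrieved_chunks = ["...", "...", "..."]   # top 3 documents from semantic search

accounted_for = (
    count_tokens(system_prompt)
    + count_tokens(question)
    + sum(count_tokens(c) for c in retrieved_chunks)
)
print("Prompt tokens I can account for:", accounted_for)

# Then compare against what the API actually reports, e.g.:
# response = client.chat.completions.create(...)
# print(response.usage.prompt_tokens)
```

In my case the number I can account for is nowhere near 6,000, which is why I suspect the retrieval step is injecting far more document text than the top-3/top-5 setting suggests.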
After chunking my data into 200-token chunks, I was able to get prompt tokens down to 4,000.
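The chunking itself was nothing fancy; a rough sketch of the idea is below (my real splitting logic isn’t important, the point is just capping each chunk at 200 tokens so each retrieved document contributes less to the prompt).

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk_by_tokens(text: str, max_tokens: int = 200) -> list[str]:
    """Split text into pieces of at most max_tokens tokens each."""
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]
```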
Still, this just seems extremely high and I cannot pinpoint what I am doing wrong.
Is this what you are also seeing?