I fully understand your disappointment; I've felt it myself, more than once, when something doesn't work the way I think it does (or should).
Weirdly, I consider your disappointment here a success, because it means you now understand the system well enough to be disappointed with it. So, chalk one up for the user education portion of our program?
But it's not all bad: you just learned that Assistants isn't a magical fix-all for developing your AI agent.
The system is genuinely good across a very wide array of use cases; it's just not always great at keeping costs reliably predictable.
And, all is not lost!
You always have the option to replace the retrieval tool with something better suited to your needs.
This is absolutely more work, but it can also bring immeasurable value to your agent, both by giving you direct control over costs (you decide exactly how many context tokens get sent to the expensive models) and by letting you experiment with and explore more advanced RAG techniques designed around your exact data and use case.
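To make that concrete, here's the kind of minimal sketch I mean, using the current OpenAI Python SDK. The model names, the crude character-based token budget, and the helper functions are just placeholders I made up for illustration, not anything from the Assistants docs: you embed your chunks once, rank them against the question, and only send as much context as your budget allows.

```python
# Minimal do-it-yourself retrieval sketch (assumes openai>=1.x).
# Model names, chunk sizes, and the budget are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

EMBED_MODEL = "text-embedding-3-small"  # assumption: use whichever embedding model you prefer
CHAT_MODEL = "gpt-4o-mini"              # assumption: use whichever answer model you prefer
MAX_CONTEXT_CHARS = 6000                # rough budget (~1,500 tokens at ~4 chars/token)

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [d.embedding for d in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def retrieve(question: str, chunks: list[str], chunk_vectors: list[list[float]]) -> str:
    """Rank chunks by similarity to the question, then pack them until the budget is spent."""
    q_vec = embed([question])[0]
    ranked = sorted(
        zip(chunks, chunk_vectors),
        key=lambda cv: cosine(q_vec, cv[1]),
        reverse=True,
    )
    context, used = [], 0
    for chunk, _ in ranked:
        if used + len(chunk) > MAX_CONTEXT_CHARS:
            break
        context.append(chunk)
        used += len(chunk)
    return "\n\n".join(context)

def answer(question: str, chunks: list[str], chunk_vectors: list[list[float]]) -> str:
    context = retrieve(question, chunks, chunk_vectors)
    resp = client.chat.completions.create(
        model=CHAT_MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

The point isn't this exact code; it's that the token budget is now a number you own, instead of whatever the built-in retrieval decides to stuff into the thread.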
The OpenAI RAG solution is generally very good, but it's also deliberately broad: it's meant to be good enough, even pretty good, for just about everyone and almost every type of data. It isn't tailored to you and your data.
Even if you weren't concerned about costs, implementing your own RAG solution (that is, using one of the many approaches described in the research) would be one of the first things I would suggest to anyone looking to elevate their assistant.
This would almost certainly not be a zero-cost solution, but it would probably be much cheaper than the current iteration of RAG in Assistants (especially if the assistant needs to retrieve data on every call). It's also a great learning experience, since you get to prove exactly what, and how much, the models need in context to generate great answers.
I also think it will give you a great deal of insight into your data and how it's actually being used, which may lead you to ways of reducing the amount of external data you need to keep available. E.g., do I need to include this whole 6-page document, or can an executive summary capture enough critical information that the model can almost always infer the rest of anything important?
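If you want to test that idea cheaply, a sketch along these lines (same SDK assumptions as above; the prompt wording and model name are just examples I picked) lets you generate summaries once, offline, and retrieve those instead of the full documents:

```python
# Tiny sketch of the executive-summary idea (assumes openai>=1.x).
from openai import OpenAI

client = OpenAI()

def summarize_for_retrieval(document_text: str, model: str = "gpt-4o-mini") -> str:
    """Produce a compact summary to store as the retrievable chunk."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": ("Write a one-page executive summary that preserves every fact "
                         "a reader would need in order to answer questions about this document.")},
            {"role": "user", "content": document_text},
        ],
    )
    return resp.choices[0].message.content

# Store the summary as the chunk you retrieve against; keep the full document
# around only for the cases where the summary clearly isn't enough.
```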
So, yes, I understand it's terribly disappointing and no doubt frustrating that it's not the perfect (and inexpensive) turnkey solution you were hoping for. But if you're anything like me or many of the others here, and you derive great joy and satisfaction from learning and mastering something new, I hope that in a short time you'll see this as a tremendous opportunity to level up yourself and your product.
Again, I’m sorry there isn’t an easy fix, some setting you just missed, that will bring retrieval costs back down to earth.
I’m excited to see what you do though, and I wish you all the luck in the world!