File retrieval prompts are eating up a lot of budget


I uploaded a JSON file with 96,828 symbols (107,038 bytes) to the assistant and my prompts are eating up a lot of budget, roughly $0.50 per prompt. I thought the price for the retrieval tool is $0.20 per assistant per day, and I’m using just one assistant with a file size way below 1 GB. Are tokens counted for symbols in a file too?

I’ve also run into some issues during usage, like extremely long response times and the reply text getting cut off. Could this be causing a problem, like the prompt somehow getting stuck in a loop? If so, what’s the best way to test or check if this is the issue? And if it is, how can I avoid these errors?

There are multiple things:

  1. the $0.20 only applies to storage. it has nothing to do with retrieval
  2. what the assistant retrieves becomes part of your prompt, and you are charged for it in tokens.
  3. a new moderation layer has been added to all chat models (even the old ones; I haven’t tried instruct yet) that will terminate a generation with "finish_reason": "content_filter" and leave you with a half-finished response without throwing an error. I don’t know if that’s the issue you’re facing.
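
One way to check whether a cut-off reply comes from the filter or from a token limit is to look at the finish reason on the response. The sketch below assumes you have the response as a plain dict shaped like a Chat Completions payload (`choices[0]["finish_reason"]`); the helper name `abnormal_finish_reason` is made up for illustration, and the Assistants API may expose this differently.

```python
from typing import Optional


def abnormal_finish_reason(response: dict) -> Optional[str]:
    """Return the finish reason if it is anything other than a normal "stop".

    Assumes a Chat Completions-style dict; adapt the lookups to whatever
    your client library actually returns.
    """
    choices = response.get("choices") or []
    if not choices:
        return "no_choices"
    reason = choices[0].get("finish_reason")
    if reason == "stop":
        return None  # the model ended the reply on its own
    # "length" means a token limit was hit; "content_filter" means the
    # output was cut by moderation. Anything else is passed through as-is.
    return reason or "unknown"


example = {"choices": [{"finish_reason": "content_filter", "message": {"content": "half a rep"}}]}
print(abnormal_finish_reason(example))
```

Logging this value for every request makes it easier to tell a filtered reply apart from one that simply ran out of tokens.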

So yeah… :confused:

you can go here to see what constitutes a token:

from experience, there aren’t any freebies when it comes to tokens, whether it’s chat structural symbols, function calling, or hidden injected prompts. There’s no reason to assume it would be different here.

That’s just the cost of having the files available.

Every time your assistant calls the retrieval tool it loads tokens into context that the model can use to fulfil your request.

If every time you prompt the model, the retrieval tool fills the context with 50 pages’ worth of text, it’s no different than if you had typed those 50 pages yourself. They’re all input tokens that are tallied and billed.
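
To make the arithmetic concrete, here is a rough sketch of how retrieved context turns into cost. The per-1K-token prices and the token counts are placeholders I picked for illustration, not actual rates; plug in the numbers from the pricing page and your own usage data.

```python
def estimate_prompt_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1000 * usd_per_1k_input
    output_cost = output_tokens / 1000 * usd_per_1k_output
    return input_cost + output_cost


# Placeholder figures: 50,000 input tokens (mostly retrieved text),
# 500 output tokens, and made-up prices of $0.01 / $0.03 per 1K tokens.
cost = estimate_prompt_cost(50_000, 500, 0.01, 0.03)
print(f"estimated cost per prompt: ${cost:.3f}")
```

At those placeholder prices, a bill of around $0.50 per prompt would mean roughly 50,000 input tokens per request, which is the kind of volume a retrieval tool can load without you noticing.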

That’s one of the downsides of abdicating the responsibility of managing the context window: you have zero control over the end cost.

@elmstedt @Diet The answers are disappointing, but thank you anyway :smiley:

Fully understand your disappointment. I’ve felt it myself, more than once, when something doesn’t work the way I think it does (or should).[1]

Weirdly, I consider your disappointment here a success because it means you understand the system well enough now to be able to be disappointed with it… :rofl:

So, :tada: for the user education portion of our program?

But it’s not all bad: you just learned that Assistants isn’t a magical fix-all for developing your AI agent.

The system itself is very good across a very wide array of use cases, just not always good at keeping costs reliably predictable.

And, all is not lost!

You always have the option to replace the retrieval tool with something better suited to your needs.

This is absolutely more work, but it can also bring immeasurable value to your agent: it gives you the direct ability to manage costs by strictly controlling the number of context tokens sent to the expensive models, and it lets you experiment with and explore more advanced RAG techniques designed around your exact data and use case.
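
As a minimal sketch of what “strictly controlling the number of context tokens” could look like, the snippet below packs the highest-scoring chunks into a fixed token budget. Everything here is an assumption for illustration: the `Chunk` type, the `pack_context` name, the relevance scores (which would come from your own retriever), and especially the 4-characters-per-token estimate, which is only a crude rule of thumb and should be replaced with a real tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    text: str
    score: float  # relevance from your own retriever; higher is better


def rough_token_count(text: str) -> int:
    # Crude assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(chunks: List[Chunk], max_tokens: int) -> Tuple[List[Chunk], int]:
    """Greedily keep the best-scoring chunks that fit within max_tokens."""
    selected: List[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = rough_token_count(chunk.text)
        if used + cost > max_tokens:
            continue  # too big for the remaining budget; try smaller chunks
        selected.append(chunk)
        used += cost
    return selected, used


chunks = [Chunk("a" * 400, 0.9), Chunk("b" * 400, 0.8), Chunk("c" * 40, 0.5)]
selected, used = pack_context(chunks, max_tokens=120)
print(len(selected), "chunks,", used, "estimated tokens")
```

The point of the hard cap is that the input side of every request has a known upper bound, whatever the retriever finds.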

The OpenAI RAG solution is generally very good, but it’s also fairly broad: it seems meant to be good enough, or even pretty good, for just about everyone and almost all types of data. It’s not tailored to you and your data.

Even if you weren’t concerned about costs, implementing your own RAG solution (that is, using one of the many approaches described in the research) would be one of the first things I would suggest to anyone looking to elevate their assistant.

This would almost certainly not be a zero-cost solution, but it is probably much cheaper than the current iteration of RAG in assistants (especially if the assistant needs to retrieve data on every call). It’s also a great learning experience as you probe exactly what and how much the models need in context to generate great answers.

I also think it will give you a great deal of understanding of your data and how it is actually being used, which may lead you to insights into ways you can reduce the amount of external data you need to have available. E.g. do I need to include this whole 6-page document, or can an executive summary encapsulate enough critical data that the model can almost always infer the rest of anything important?

So, yes, I understand it’s terribly disappointing and no doubt frustrating that it’s not the perfect (and inexpensive) turnkey solution you were hoping for. But, if you’re anything like me or many of the others here and you derive great joy and satisfaction from learning and mastering something new, I hope in a short time you’ll see this as a tremendous opportunity to level-up yourself and your product.

Again, I’m sorry there isn’t an easy fix, some setting you just missed, that will bring retrieval costs back down to earth.

I’m excited to see what you do though, and I wish you all the luck in the world!

  1. Honestly, this is a daily source of disappointment for me.