OpenAI team, thanks for the work you’re doing

I’m thrilled with today’s news.

I have a few questions about the Assistants API pricing structure that don’t seem to be documented anywhere:

  • When are we charged? Is it upon initiating a run or when adding a message to the thread?
  • How are tokens calculated? Are we charged for the entire thread on each conversation turn (i.e., run)?
  • What about token calculation for a long thread that may have been truncated in the background?
  • How does token calculation work with knowledge retrieval?
  • How can I estimate the number of tokens before each run?

The Assistants API takes on a lot of the backend work, but the pricing benefits are unclear. Developers lose some control over token usage, which could even lead to unexpected costs.


You are charged when you initiate the run and request a response from the Assistant, not when you add a message to the thread.

You can find this information in the documentation. You are charged for the complete conversation on each run.
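To see why this matters for long conversations, here is a minimal sketch (with hypothetical message sizes) of how billed input tokens can grow when the whole thread is re-sent on every run:

```python
# Sketch: input-token growth when every run re-sends the entire thread.
# The per-turn message sizes below are hypothetical round numbers.

def thread_input_tokens(message_tokens):
    """Total input tokens billed if each run sends the whole thread so far."""
    total = 0
    thread = 0
    for msg in message_tokens:
        thread += msg   # the new message joins the thread
        total += thread # the full thread is sent as input on this run
    return total

# Five turns of ~200 tokens each:
turns = [200, 200, 200, 200, 200]
print(thread_input_tokens(turns))  # 200+400+600+800+1000 = 3000
print(sum(turns))                  # only 1000 if each turn were billed alone
```

So a five-turn conversation can cost roughly three times as many input tokens as the messages themselves contain.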

It depends on the size of the information. It may be fed to the model in full as tokens, or it may be chunked and converted into embeddings; in that case you can only estimate a maximum cost based on the size of the document. The model will iterate through the chunks until it has found a suitable answer.
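Given that uncertainty, one conservative approach is to bound the retrieval cost from above. The sketch below assumes (hypothetically) the worst case, where the entire document is fed to the model as input tokens; the price figure is illustrative, not a real rate:

```python
# Sketch: an upper bound on retrieval cost, assuming the worst case where
# the whole document ends up in the model's input. Price is illustrative.

def max_retrieval_cost(doc_tokens, price_per_1k_input):
    """Worst-case cost if every token of the document is billed as input."""
    return doc_tokens / 1000 * price_per_1k_input

# A ~50k-token document at a hypothetical $0.01 per 1k input tokens:
print(max_retrieval_cost(50_000, 0.01))  # 0.5 (dollars, worst case)
```

The actual cost with chunked retrieval should be lower, since only the matching chunks reach the model.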

You can use a library like tiktoken to count tokens, and also try averaging the costs of your previous runs against the knowledge base. In practice it’s better to set a usage limit up front and then retrieve the calculated costs afterwards.
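A minimal sketch of pre-run estimation: it uses tiktoken’s `cl100k_base` encoding when the library is installed, and otherwise falls back to a rough ~4-characters-per-token heuristic (an approximation, not an exact count):

```python
# Sketch: estimate a thread's token count before starting a run.
# Uses tiktoken if available; otherwise a rough ~4 chars/token heuristic.

def estimate_tokens(messages):
    text = "\n".join(messages)
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        # Crude fallback: English text averages roughly 4 characters per token.
        return max(1, len(text) // 4)

msgs = [
    "What is the pricing for the Assistants API?",
    "It depends on the model and the length of the thread.",
]
print(estimate_tokens(msgs))
```

Note this only covers your own messages; instructions, tool outputs, and retrieved chunks add tokens you cannot count locally, which is why averaging past runs helps.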

This is a downside to using Assistants.