Why are my context tokens used so quickly?

The cost associated with the autonomous nature of agents has been an ongoing concern.

This was identified on the day of release:

And more the same week:

In the two months since, no controls have been implemented, other than your choice of model and its maximum context length.

The solution for budget control is to continue using the chat completions endpoint. It has the inconvenience that you must unpack “files” into text the AI can understand, but the advantage that you can make injected knowledge (search “RAG”) highly relevant to the current user input, instead of relying on repeated external calls to internal tools. You also control exactly how much chat history is actually sent.
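As a rough illustration of that approach, here is a minimal sketch of budget-controlled prompt assembly: retrieved knowledge is injected into the system message, and older chat turns are dropped once a history token budget is exhausted. All names are hypothetical, and the token estimate is a crude character heuristic; swap in a real tokenizer for production use.

```python
# Hypothetical sketch: assemble a chat completions request yourself,
# so you decide how many tokens of knowledge and history are sent.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Replace with a real
    # tokenizer (e.g. tiktoken) for accurate budgeting.
    return max(1, len(text) // 4)

def build_messages(system: str, rag_chunks: list[str], history: list[dict],
                   user_input: str, history_budget: int = 1000) -> list[dict]:
    # Inject only the retrieved chunks relevant to the current input,
    # rather than relying on repeated internal tool calls.
    knowledge = "\n\n".join(rag_chunks)
    messages = [{"role": "system",
                 "content": system + "\n\nKnowledge:\n" + knowledge}]

    # Walk history from newest to oldest, keeping turns until the
    # token budget runs out; older turns are silently dropped.
    kept, used = [], 0
    for turn in reversed(history):
        cost = estimate_tokens(turn["content"])
        if used + cost > history_budget:
            break
        kept.append(turn)
        used += cost
    messages.extend(reversed(kept))  # restore chronological order

    messages.append({"role": "user", "content": user_input})
    return messages
```

The resulting list is what you would pass as `messages` to a chat completions call; nothing outside this function can silently grow the prompt.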
