Are used tokens counted when a request starts or when it ends?


I am trying to implement a strategy for scheduling user API requests that maximizes throughput without hitting OpenAI’s rate limits. This matters because some of my queries take quite some time and I have multiple people interacting with the service.

I couldn’t find this information anywhere in the docs or on this forum. Does somebody know whether “used tokens” are counted against the rate limit at the start, when I send the request to the API, or at the end, when I receive the response from OpenAI?

As I understand it, for requests that produce a completion, the completion tokens can only be counted once they have been generated, so it would make sense to count them at the end. However, it could also be a two-step process, with prompt tokens counted at the beginning and completion tokens counted at the end.

For embeddings, all tokens can be counted at the start, since there is no generated output.
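
Until the actual behaviour is confirmed, one conservative option is to track the budget on the client side so that it stays safe under either assumption: reserve the worst case (prompt tokens plus the maximum completion length) when a request starts, then replace that estimate with the real usage once the response arrives. Below is a minimal sketch of that idea. `TokenBudget`, `try_reserve` and `reconcile` are hypothetical names made up for illustration, not part of any OpenAI library, and the 60-second sliding window is an assumption about how the limit is measured.

```python
import time
from collections import deque


class TokenBudget:
    """Client-side sliding-window token budget (hypothetical helper)."""

    def __init__(self, tokens_per_minute, window_seconds=60.0, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.window = window_seconds  # assumed window length, not confirmed
        self.clock = clock
        self.entries = deque()  # [timestamp, tokens] pairs, oldest first

    def _used(self):
        # Drop entries that have aged out of the window, then sum the rest.
        now = self.clock()
        while self.entries and now - self.entries[0][0] >= self.window:
            self.entries.popleft()
        return sum(tokens for _, tokens in self.entries)

    def try_reserve(self, prompt_tokens, max_completion_tokens):
        """Reserve the worst case at request start; return a handle or None."""
        worst_case = prompt_tokens + max_completion_tokens
        if self._used() + worst_case > self.limit:
            return None  # caller should wait or queue the request
        entry = [self.clock(), worst_case]
        self.entries.append(entry)
        return entry

    def reconcile(self, entry, actual_total_tokens):
        """When the response arrives, replace the estimate with real usage.

        Only safe if the server counts actual usage; if it counts the
        worst case up front, skip this call and keep the reservation.
        """
        entry[1] = actual_total_tokens
```

A caller would invoke `try_reserve` before sending a request, wait and retry if it returns `None`, and pass the total token count reported in the response to `reconcile`. The trade-off is some wasted headroom while long requests are in flight, in exchange for not depending on when the server does its counting.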

Thank you for the answer. I will try counting tokens at the end of the request and hope I don’t keep getting errors :slight_smile: