I am trying to implement an efficient request-scheduling strategy for the API that maximizes throughput without hitting OpenAI’s rate limits. This matters because some of my queries take quite some time and I have multiple people interacting with the service.
I couldn’t find this information anywhere in the docs or on this forum. Does anybody know whether “used tokens” are counted against the rate limit at the start, when I send the request to the API, or at the end, when I receive the response from OpenAI?
As I understand it, completion tokens can only be counted once they have been generated, so it would make sense to count them at the end. However, it could also be a two-step process, with prompt tokens counted at the beginning and completion tokens counted at the end.
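
To make the two-step hypothesis concrete, here is a minimal sketch of how a client-side budget could track it. This is only an illustration of the idea in the question, not a description of how OpenAI actually counts tokens: the `TokenBudget` class, its method names, and the 60-second sliding window are all assumptions made up for this example.

```python
import time


class TokenBudget:
    """Client-side tokens-per-minute tracker (hypothetical helper, not an OpenAI API)."""

    def __init__(self, tokens_per_minute, clock=time.monotonic):
        self.capacity = tokens_per_minute
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.events = []    # (timestamp, tokens) charges, pruned to the last 60 s

    def used(self):
        """Return the tokens charged within the last 60 seconds."""
        cutoff = self.clock() - 60.0
        self.events = [(t, n) for t, n in self.events if t > cutoff]
        return sum(n for _, n in self.events)

    def try_start(self, prompt_tokens):
        """Step 1 (assumed): charge prompt tokens when the request is sent."""
        if self.used() + prompt_tokens > self.capacity:
            return False  # caller should wait or queue the request
        self.events.append((self.clock(), prompt_tokens))
        return True

    def finish(self, completion_tokens):
        """Step 2 (assumed): charge completion tokens when the response arrives."""
        self.events.append((self.clock(), completion_tokens))


# Usage: one request with 300 prompt tokens and 200 completion tokens.
budget = TokenBudget(tokens_per_minute=1000)
if budget.try_start(prompt_tokens=300):
    budget.finish(completion_tokens=200)
print(budget.used())
```

If tokens turn out to be counted entirely at the start or entirely at the end, the same structure would still work by moving the whole charge into `try_start` or `finish`.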