Feature Request: Add Token Usage Transparency and Real-Time Release in Batch API

Dear OpenAI Support,

I would like to highlight a significant issue in the management of batch jobs via API that directly affects workflow stability and user trust.

Currently, when using POST /v1/batches, a job may be rejected due to exceeding the 90,000 enqueued token limit (total input + max output tokens).
However:

  • The error message does not provide the actual estimated token count (neither total nor per line).
  • There is no pre-validation tool or endpoint to help estimate token usage beforehand.
  • The token validation logic seems opaque and cannot be replicated by the user.
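
For context, a minimal sketch of where that feedback is missing today, assuming the official Python SDK (the file name is a placeholder, and exactly how the rejection surfaces may differ): after a rejection, nothing returned with the batch tells you how many tokens were counted.

```python
# Minimal sketch (Python SDK assumed): submit a batch and inspect the
# rejection details. Today the error object carries no token counts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL batch input (one request per line).
batch_file = client.files.create(
    file=open("batch_input.jsonl", "rb"),
    purpose="batch",
)

# Create the batch job.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Check the batch; if the enqueued-token limit is exceeded the job is
# rejected, but the errors contain no estimated token totals to act on.
batch = client.batches.retrieve(batch.id)
print(batch.status)   # e.g. "failed"
print(batch.errors)   # no per-line or total token estimate here
```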

This results in:

  • Workflow disruptions that cannot be debugged easily.
  • Developers being forced to apply overly conservative estimates (e.g., characters / 3) just to avoid rejection.
  • Suboptimal batch construction, with significant underuse of allowed token capacity.

#### A concrete case

In a recent batch:

  • My input file had 4,973 characters.
  • Using a conservative formula, I estimated ~1,657 tokens.
  • But the real token count in prompt_tokens was only 879.
  • That’s 5.65 characters per token, far from the expected 3:1 ratio.

This discrepancy shows that the current lack of feedback leads to substantial inefficiency.
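
The arithmetic, spelled out (Python shown only for illustration; the 879 figure is the prompt_tokens value reported back by the API):

```python
# The numbers from the batch above, spelled out.
chars = 4973              # characters in the input file
estimate = chars // 3     # conservative characters / 3 estimate -> 1657
actual = 879              # prompt_tokens reported by the API

print(estimate - actual)  # 778 tokens of capacity reserved for nothing
print(chars / actual)     # about 5.66 characters per token, nowhere near 3
```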

### Suggested improvements

I respectfully request the following improvements to the batch API:

  1. Add a visible estimated token count during batch submission:
     • Either globally for the batch
     • Or per job line
  2. In case of rejection, return:
     • Estimated total enqueued tokens
     • Line-by-line token estimates
     • A breakdown of which lines caused the overage
  3. Provide an API endpoint to estimate token usage (/v1/token-estimate or similar), usable outside batch mode (a sketch of what such a call could look like follows this list).
  4. Once a batch completes or fails, expose a live value (e.g., currently_enqueued_tokens) to let users know:
     • How many enqueued tokens are still “reserved”
     • When those tokens will be released
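
To make points 3 and 4 concrete, here is a rough sketch of what such a pre-validation call and its response could look like. To be explicit: the /v1/token-estimate path, the field names, and the currently_enqueued_tokens value do not exist today; they are shown only to illustrate the shape of the proposal.

```python
# Hypothetical only: neither this endpoint nor these fields exist today.
# The sketch just illustrates the shape of the proposed pre-validation call.
import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]

resp = requests.post(
    "https://api.openai.com/v1/token-estimate",    # proposed path, not real
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input_file_id": "file-abc123"},         # placeholder file id
)

# A useful response could look something like this:
# {
#   "estimated_enqueued_tokens": 84210,
#   "per_line": [{"custom_id": "job-1", "estimated_tokens": 912}, "..."],
#   "currently_enqueued_tokens": 61000,
#   "enqueued_token_limit": 90000
# }
print(resp.status_code, resp.text)
```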

### Additional concern: Token release delay is opaque

It is clear that the system keeps track of how many input tokens are currently “enqueued”, since it blocks new batch submissions based on this invisible quota.
The error message even includes an internal “customer code”, suggesting this value is tracked at account level.

However, after a batch completes (i.e. reaches status = completed), there is no way to know when the previously enqueued tokens are released.

It seems that a background process (perhaps scheduled) eventually clears this quota — but the timing is unknown and undocumented. This adds further unpredictability to batch scheduling.

I suggest that the API should:

  • Subtract enqueued tokens as soon as a batch completes
  • Expose the remaining quota via API in real time
  • Or at least provide a reliable release timing
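
Until something like that exists, the only option is to poll the batch status and retry new submissions blindly with a backoff, since the release timing cannot be queried. A rough workaround sketch, assuming the official Python SDK (the polling interval, the backoff values, and the assumption that the rejection surfaces as an API error are all mine):

```python
# Workaround sketch (Python SDK assumed): wait for an existing batch to
# finish, then retry the next submission with a blind backoff, because the
# moment at which enqueued tokens are released cannot be queried.
import time
import openai

client = openai.OpenAI()

def wait_for_batch(batch_id: str, poll_seconds: int = 60) -> str:
    """Poll until the batch leaves the queue one way or another."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch.status
        time.sleep(poll_seconds)

def submit_with_backoff(input_file_id: str, max_attempts: int = 5):
    """Retry the submission blindly; the freed-up quota is not visible."""
    delay = 120  # arbitrary starting delay, doubled after each rejection
    for _ in range(max_attempts):
        try:
            return client.batches.create(
                input_file_id=input_file_id,
                endpoint="/v1/chat/completions",
                completion_window="24h",
            )
        except openai.APIStatusError:
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("batch could not be enqueued after retries")
```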

### Final thoughts

As a developer building an industrial-scale application on top of the OpenAI API, I believe reliability and visibility are essential.
The current behavior — rejecting a batch for token reasons without showing the numbers or letting me know when I can safely retry — is not acceptable in a production environment.

I kindly ask the OpenAI team to consider this issue seriously, and improve the transparency and predictability of batch token management.

Thank you for your attention and the great work you do.
Sergio Bonfiglio

Here is the fault, as seen in the message you quoted.

You can use the tiktoken library to measure language tokens exactly yourself.

You can then count the encoded language tokens across the combined messages, adding the usual overhead of roughly 4 tokens per message and 3 per call.
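
For example, a rough count for chat messages could look like this (assuming the o200k_base encoding used by gpt-4o-class models; check which encoding your target model actually uses, and treat the per-message overhead as an approximation):

```python
# Rough prompt-token count for chat messages, assuming the o200k_base
# encoding (gpt-4o-class models; older models use cl100k_base).
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def count_message_tokens(messages: list[dict]) -> int:
    """Encoded content plus approximate per-message and per-call overhead."""
    total = 0
    for message in messages:
        total += 4  # approximate overhead per message (role, separators)
        for value in message.values():
            total += len(enc.encode(value))
    return total + 3  # approximate overhead per call (reply priming)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Estimate my tokens, please."},
]
print(count_message_tokens(messages))
```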

The other fault is that OpenAI uses an estimator, not a true token encoder, for rate limiting, and it can be off by around 20% in typical use. This is where your improvement suggestion would help: by surfacing the results of those inaccurate calculations, such as the ones reflected in the x-ratelimit headers.
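
For instance, those headers can be read off any normal (non-batch) response; a quick sketch with requests (the header names are those documented for the standard API, and nothing equivalent is exposed for the batch enqueued-token quota):

```python
# Inspect the rate limiter's own token accounting via response headers on a
# standard (non-batch) request; the batch enqueued-token quota has no
# equivalent header today.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 1,
    },
)

for name in (
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
):
    print(name, resp.headers.get(name))

# Comparing the drop in "remaining tokens" with a tiktoken count of the same
# request shows how far the estimator used for rate limiting is off.
```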

Sorry, but your reply doesn’t make any sense.
First, you presume we’re using Python. We are not: we’re using Delphi, and we do not want to build a hybrid solution spanning two languages (the death of portability and code readability).
Second, I don’t understand why I should reinvent the wheel. If a computation is already performed to “limit” the enqueued tokens, why not pass that data on to the user? This behavior is neither transparent nor professional.
Third, it is ridiculous to calculate the tokens with a methodology different from the server-side one, because it does not solve the problem.
The server-side calculation always wins, so we are “on the road again”: our estimate may collide with the server’s calculation once more, and the workflow breaks again.
This problem must be solved in the way I described in my post: expose the data in a transparent, complete way.
That is professional. The rest is just talk.