I rely heavily on batching to perform some experiments.
I am currently in tier 2, but sometimes, seemingly at random, some batches fail for exceeding the token quota.
I wrote some code to estimate the input tokens of my batches and split them when required: input tokens are computed with tiktoken and include the system message and the JSON schema for structured output. I also deliberately stay below 50% of my quota, but I still get failures. A simplified sketch of the splitting logic is below.
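For reference, here is roughly what my code does (a minimal sketch, not my exact implementation: `QUOTA_TOKENS` and `SAFETY_FRACTION` are placeholder values, and each request is assumed to be one parsed line of the standard batch JSONL format with plain-string message content):

```python
import json
import tiktoken

# Placeholder values, not my real limits.
QUOTA_TOKENS = 2_000_000   # hypothetical tier-2 enqueued-token quota
SAFETY_FRACTION = 0.5      # stay below 50% of the quota

enc = tiktoken.get_encoding("o200k_base")  # encoding used by 4o-family models

def request_tokens(request: dict) -> int:
    """Estimate input tokens for one batch request, counting every
    message (system message included) plus the JSON schema used for
    structured output, since the schema is billed as input too."""
    body = request["body"]
    total = sum(len(enc.encode(m["content"])) for m in body["messages"])
    schema = body.get("response_format", {}).get("json_schema")
    if schema is not None:
        total += len(enc.encode(json.dumps(schema)))
    return total

def split_batches(requests: list[dict]) -> list[list[dict]]:
    """Greedily pack requests into sub-batches under the safety limit."""
    limit = int(QUOTA_TOKENS * SAFETY_FRACTION)
    batches: list[list[dict]] = []
    current: list[dict] = []
    current_tokens = 0
    for req in requests:
        t = request_tokens(req)
        if current and current_tokens + t > limit:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(req)
        current_tokens += t
    if current:
        batches.append(current)
    return batches
```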
Many posts on this forum just say "try later, it could be past batches still in the way", but I made sure to check: everything in my batch list was old batches in the failed or cancelled state (not "cancelling").
I even tried waiting for about an hour and then launching batches that stay below 25% of my quota, still to no avail. I'm starting to believe there must be a bug.
Random thought: could the server-side token estimator be thrown off by some particular character encoding?
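If anyone wants to probe that hypothesis, here is a quick local check I could run (the sample strings are arbitrary, and I'm assuming `o200k_base` is the relevant encoding): if the server-side estimate diverged from these counts for some character set, that could explain the gap.

```python
import tiktoken

# Compare local token counts across character sets.
enc = tiktoken.get_encoding("o200k_base")

samples = {
    "ascii": "The quick brown fox jumps over the lazy dog. " * 10,
    "accented": "Žluťoučký kůň úpěl ďábelské ódy. " * 10,
    "cjk": "美しい花が庭に咲いています。" * 10,
    "emoji": "🚀🔥💡✨🎉 " * 10,
}
for name, text in samples.items():
    tokens = len(enc.encode(text))
    print(f"{name:>9}: {len(text):4d} chars -> {tokens:4d} tokens")
```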