Do I get charged for failed or pending LLM API requests?

Hi folks,

I’m using the openai Python library, but my requests are sent to the Gemini API via the OpenAI-compatible endpoint:
https://generativelanguage.googleapis.com/v1beta/openai/....

In this setup, I’m wondering:

  1. If a request to the LLM API fails (e.g., due to a timeout, quota limits, or malformed input), will it still consume tokens or incur charges?
  2. Are pending requests (e.g., stuck or slow responses) charged?
  3. What are some common failure modes I should handle (e.g., 429, 500, etc.)?
  4. Since I’m using the openai library for Gemini, are billing/logging behaviors in the same format as OpenAI’s, or should I refer to Google’s billing documentation?

Any guidance or examples of best practices for error handling and understanding when costs are incurred would be greatly appreciated!

Thanks!


You’d have to go over to Google’s documentation to read about billing.

You’d generally be billed for the input tokens plus whatever generation the model actually produced. Whether you are billed at all depends on whether the call actually reached an AI model, and on the reason for the failure.
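One practical way to see what a successful call consumed is to log the usage block the endpoint returns. A minimal sketch, assuming the Gemini OpenAI-compatible endpoint from your post; the model name and the GEMINI_API_KEY variable are placeholders, not confirmed values:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],  # placeholder env var name
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",  # placeholder model name; substitute your own
    messages=[{"role": "user", "content": "Hello"}],
)

# The billable quantities: input (prompt) tokens and output (completion) tokens.
print("prompt tokens:    ", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:     ", response.usage.total_tokens)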

You’d have to profile the different error types. There are a few you might want to retry manually.
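As a starting point, here’s a sketch of profiling failures by catching the exception classes the openai library raises. How Gemini-side errors map onto these classes is an assumption you should verify against real traffic:

import openai

def profiled_call(client, **kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except openai.APITimeoutError as e:
        # The request never completed; no generation was returned to you.
        print("timeout:", e)
    except openai.RateLimitError as e:
        # 429: quota or rate limit; usually safe to retry after a delay.
        print("rate limited (429):", e)
    except openai.APIStatusError as e:
        # Any other non-2xx status (400 malformed input, 500 server error, ...).
        print(f"status {e.status_code}:", e.message)
    except openai.APIConnectionError as e:
        # Network-level failure; the request may never have reached a model.
        print("connection error:", e)
    return None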

A caution is that the OpenAI library itself has an internal retry mechanism. You’ll probably want to disable that so that issues are not hidden:

import os
from openai import OpenAI

# Point the client at the Gemini OpenAI-compatible endpoint and disable
# the library's automatic retries for all requests:
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],  # assumed env var name
    max_retries=0,  # default is 2
)
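With automatic retries off, you can retry only the transient failures yourself. A minimal sketch with simple exponential backoff for 429s, 5xx errors, and timeouts; which failures are actually safe to retry is your call to verify:

import time
import openai

def call_with_retries(client, max_attempts=3, **kwargs):
    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError,
                openai.InternalServerError,
                openai.APITimeoutError) as e:
            if attempt == max_attempts:
                raise
            delay = 2 ** attempt  # simple exponential backoff
            print(f"attempt {attempt} failed ({type(e).__name__}); "
                  f"retrying in {delay}s")
            time.sleep(delay)

Non-transient errors (e.g., a 400 for malformed input) propagate immediately, which is usually what you want, since retrying them just burns quota.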