Do I get charged for failed or pending LLM API requests?

Hi folks,

I’m using the openai Python library, but my requests are sent to the Gemini API via the OpenAI-compatible endpoint:
https://generativelanguage.googleapis.com/v1beta/openai/....

In this setup, I’m wondering:

  1. If a request to the LLM API fails (e.g., due to a timeout, quota limits, or malformed input), will it still consume tokens or incur charges?
  2. Are pending requests (e.g., stuck or slow responses) charged?
  3. What are some common failure modes I should handle (e.g., 429, 500, etc.)?
  4. Since I’m using the openai library for Gemini, are billing/logging behaviors in the same format as OpenAI’s, or should I refer to Google’s billing documentation?

Any guidance or examples of best practices for error handling and understanding when costs are incurred would be greatly appreciated!

Thanks!


You’d have to go over to Google’s documentation to read about billing.

You’d generally be billed for the input tokens plus whatever generation the model actually produced. Whether you are billed at all depends on whether the call actually reached an AI model, and on the reason for the failure.
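One practical way to see what a successful call consumed is to log the usage block the endpoint returns. A minimal sketch, assuming the Gemini OpenAI-compatible endpoint from your post; the model name and the GEMINI_API_KEY variable are placeholders, not confirmed values:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],  # placeholder env var name
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",  # placeholder model name; substitute your own
    messages=[{"role": "user", "content": "Hello"}],
)

# The billable quantities: input (prompt) tokens and output (completion) tokens.
print("prompt tokens:    ", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:     ", response.usage.total_tokens)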

You’d have to profile the different error types. There are a few you might want to retry manually.
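As a starting point, here’s a sketch of profiling failures by catching the exception classes the openai library raises. How Gemini-side errors map onto these classes is an assumption you should verify against real traffic:

import openai

def profiled_call(client, **kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except openai.APITimeoutError as e:
        # The request never completed; no generation was returned to you.
        print("timeout:", e)
    except openai.RateLimitError as e:
        # 429: quota or rate limit; usually safe to retry after a delay.
        print("rate limited (429):", e)
    except openai.APIStatusError as e:
        # Any other non-2xx status (400 malformed input, 500 server error, ...).
        print(f"status {e.status_code}:", e.message)
    except openai.APIConnectionError as e:
        # Network-level failure; the request may never have reached a model.
        print("connection error:", e)
    return None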

A caution is that the OpenAI library itself has an internal retry mechanism. You’ll probably want to disable that so that issues are not hidden:

import os
from openai import OpenAI

# Point the client at the Gemini OpenAI-compatible endpoint and disable
# the library's automatic retries for all requests:
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],  # assumed env var name
    max_retries=0,  # default is 2
)
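With automatic retries off, you can retry only the transient failures yourself. A minimal sketch with simple exponential backoff for 429s, 5xx errors, and timeouts; which failures are actually safe to retry is your call to verify:

import time
import openai

def call_with_retries(client, max_attempts=3, **kwargs):
    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(**kwargs)
        except (openai.RateLimitError,
                openai.InternalServerError,
                openai.APITimeoutError) as e:
            if attempt == max_attempts:
                raise
            delay = 2 ** attempt  # simple exponential backoff
            print(f"attempt {attempt} failed ({type(e).__name__}); "
                  f"retrying in {delay}s")
            time.sleep(delay)

Non-transient errors (e.g., a 400 for malformed input) propagate immediately, which is usually what you want, since retrying them just burns quota.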