Hi folks,
I’m using the openai Python library, but my requests are being sent to the Gemini API via its OpenAI-compatible endpoint: https://generativelanguage.googleapis.com/v1beta/openai/.
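For context, here's roughly how my client is configured (the env var name and model are just what I happen to use):

```python
import os

from openai import OpenAI

# Standard OpenAI client, pointed at Gemini's OpenAI-compatible base URL.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],  # a Gemini key, not an OpenAI key
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```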
In this setup, I’m wondering:
- If a request to the LLM API fails (e.g., due to timeout, quota limits, malformed input), will it still consume tokens or incur charges?
- Are pending requests (e.g., stuck or slow responses) charged?
- What are some common failure modes I should handle (e.g., 429 rate limits, 500 server errors, timeouts)? I've pasted my current retry wrapper below this list.
- Since I’m using the openai library for Gemini, are billing/logging behaviors in the same format as OpenAI’s, or should I refer to Google’s billing documentation?
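For reference, here's the retry wrapper I'm using at the moment. It's a minimal sketch (the backoff schedule, the 30s timeout, and the `chat_with_retries` helper name are just my own choices), and I'd love to know whether this is the right set of exceptions to catch:

```python
import os
import time

from openai import APIConnectionError, APITimeoutError, OpenAI, RateLimitError

# Same client setup as above.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

def chat_with_retries(messages, max_retries=5):
    """Retry transient failures (429s, timeouts, dropped connections) with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gemini-2.0-flash",
                messages=messages,
                timeout=30,  # client-side timeout in seconds
            )
        except (RateLimitError, APITimeoutError, APIConnectionError) as exc:
            # These look transient, so back off and retry; anything else
            # (e.g., a 400 for malformed input) propagates immediately.
            wait = 2 ** attempt
            print(f"{type(exc).__name__}, retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError(f"Request still failing after {max_retries} attempts")
```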
Any guidance on error-handling best practices, and on when exactly costs are incurred, would be greatly appreciated!
Thanks!