Requests to chat models (gpt-3.5-turbo or gpt-4) can fail in two different ways:
Issue on OpenAI's side - this is the case where the model is overloaded with other requests.
Client-side request timeout - here, for example, I as a customer make a call with a request timeout of 10 seconds. The moment 10 seconds have passed, the request fails.
Typically you will be charged for everything. The billing system has no way to tell whether the message was interrupted by something external to the system that sends it out: network issues, lag, other external factors. Clearly, if the model itself fails or errors, that is not counted; but if the model has performed the required task and the compute has been used… yes, you are charged for it.
I understand what you mean, though. Typical error rates should be measured in your implementation and accounted for in your costings. I know this rate will fluctuate, but you can build in a small percentage to cover those failures, say 1%.
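As a sketch, padding an estimated spend with a small error buffer (the 1% figure above is illustrative, and the function name is mine) might look like:

```python
def budget_with_error_buffer(estimated_cost_usd: float, error_rate: float = 0.01) -> float:
    """Pad an estimated API spend to cover requests that are charged
    but fail on the client side (e.g. timeouts)."""
    return estimated_cost_usd * (1 + error_rate)

# Example: a projected $500/month spend, padded by 1%
print(budget_with_error_buffer(500.0))  # 505.0
```

The right rate is whatever your own measured timeout/failure rate turns out to be.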
OpenAI doesn’t yet provide costs segregated by API key on their billing dashboard. Say I have 4 tools that call OpenAI, and each of the 4 tools has its own API key; I cannot tell which tool incurred how much OpenAI cost based on API keys.
So what I do is:
For each tool, I calculate cost based on prompt tokens + completion tokens + the model in use.
But in cases where the model fails, I have no idea of the “completion tokens”. So my cost calculation would be inaccurate if OpenAI has charged me for that failed request on their side.
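A minimal sketch of that per-tool calculation, reading the token counts that come back in the API response's `usage` field. The per-1K-token prices below are illustrative assumptions, not authoritative; check OpenAI's current pricing page:

```python
# Illustrative per-1K-token prices in USD; verify against OpenAI's pricing page.
PRICES = {
    "gpt-3.5-turbo": {"prompt": 0.0015, "completion": 0.002},
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request, computed from the `usage` counts in the
    API response. Prompt and completion tokens are priced separately."""
    p = PRICES[model]
    return (prompt_tokens / 1000) * p["prompt"] + (completion_tokens / 1000) * p["completion"]

# e.g. a gpt-4 call with 1000 prompt tokens and 500 completion tokens
print(round(request_cost("gpt-4", 1000, 500), 4))  # 0.06
```

The gap described above remains: a request that times out client-side never yields a `usage` object, so its completion tokens are unknowable from this side.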
Yes, I see your issue. You can create separate organisations within your account and use those to track usage, by implementing the following in your calling code:
```javascript
import { Configuration, OpenAIApi } from "openai";

const configuration = new Configuration({
  organization: "org-M7cflNCqTZcPZIOV2a9QrRUe",
  apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);
```
or, in Python:
```python
import os
import openai

openai.organization = "org-M7cflNCqTZcPZIOV2a9QrRUe"
openai.api_key = os.getenv("OPENAI_API_KEY")
```
You can create multiple organisations within an OpenAI account; GPT-4 API access is granted per organisation, so it would need to be requested for each.
If you get a “model is overloaded” error, which typically comes back within a second, you will not be charged.
If you get a “request timed out” error, which typically takes a long time, you will be charged. My understanding is that they send the request to the model, so it costs them money even if you time out. The only case where this wouldn’t hold is a local network malfunction causing the timeout, so that the request never even reaches the OpenAI gateway.
The solution to this is to make requests with much longer timeout values, so you get fewer timeout failures and actually wait for the completion instead.
(And, yes, waiting a minute for a completion isn’t great and makes interactive experiences bad, especially in use cases where you can’t just stream the response back directly but have to wait for the full thing for whatever reason.)
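A minimal sketch of the longer-timeout-plus-retry approach. The retry wrapper is generic (the helper name is mine); the commented-out call assumes the pre-1.0 `openai` Python SDK, whose `ChatCompletion.create` accepts a `request_timeout` parameter:

```python
import time

def call_with_retries(fn, max_attempts=3, backoff_seconds=1.0):
    """Call `fn`, retrying on transient errors (overload, timeout)
    with exponentially increasing waits between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(backoff_seconds * (2 ** attempt))

# With the pre-1.0 SDK, you would pass a generous request_timeout, e.g.:
# call_with_retries(lambda: openai.ChatCompletion.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hello"}],
#     request_timeout=120,  # wait up to two minutes instead of failing early
# ))
```

Retrying an overloaded-model error is cheap (no charge, per the above); retrying after a client-side timeout is the expensive case this thread is about, which is why raising the timeout itself is the main lever.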