Hidden GPT 4 Turbo charge for requests?

Hello, I fail to understand the pricing model for gpt-4.
Documentation says $0.03 per 1k input tokens and $0.06 per 1k output tokens.
I tested with several requests, which in total are { inputTokens: 3746, outputTokens: 1743 }. The price is supposed to be $0.22.
But when I look in the dashboard, it says the price is $0.38. It adds GPT 4 Turbo.
So it seems like it also charges:
(inputTokens + outputTokens) / 1000 * $0.03 (gpt-4-1106-preview output price), which in my case is $0.16.
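
The arithmetic above can be checked with a short sketch. It uses only the prices and token counts quoted in this post; the variable and function names are made up for illustration, and the "extra" charge is just the formula guessed above, not a documented billing rule.

```python
# Prices in USD per 1,000 tokens, as quoted in this post.
GPT4_INPUT_PER_1K = 0.03
GPT4_OUTPUT_PER_1K = 0.06
TURBO_OUTPUT_PER_1K = 0.03  # gpt-4-1106-preview output price


def cost(tokens, price_per_1k):
    """Cost in USD for a token count at a given per-1k price."""
    return tokens / 1000 * price_per_1k


input_tokens = 3746
output_tokens = 1743

# What gpt-4 alone should cost for these requests.
expected = cost(input_tokens, GPT4_INPUT_PER_1K) + cost(output_tokens, GPT4_OUTPUT_PER_1K)

# The guessed extra charge: all tokens billed again at the Turbo output price.
extra = cost(input_tokens + output_tokens, TURBO_OUTPUT_PER_1K)

print(f"expected gpt-4 cost: ${expected:.2f}")          # $0.22
print(f"apparent extra:      ${extra:.2f}")             # $0.16
print(f"sum:                 ${expected + extra:.2f}")  # $0.38
```

The sum matches the $0.38 shown in the dashboard, which is why it looks like a second charge on the same tokens.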

Is there a hidden charge for Turbo on top of regular gpt-4?

Hi and welcome to the Developer Forum!

From the playground and from code you can specify which model you wish to use. It seems that you (or someone with your API key) have made some calls with standard GPT-4 and some with GPT-4 Turbo.

I am using only gpt-4 through the openai lib, hence the confusion. How come it adds gpt-4 turbo on top? There are no places in my code where I use gpt-4 turbo.

Someone is using GPT-4 Turbo. Is your API key in an application that is public? Have you used it in any third-party bring-your-own-key applications? Are there other people in your organisation? Have you tried the playground?

These are all potential sources of the GPT-4 Turbo usage. There is no mechanism that would cause GPT-4 Turbo to be used if you simply made GPT-4 API calls. Have you perhaps experimented with assistants?

My guess is that you are miscounting tokens.

If you are interacting with the model in a chat-like interface you need to count the tokens for each message, each time it is sent.

So, I send you message A, you respond with message B.

I send you message C, and I include messages A and B in the context window and you respond with message D.

Then the total tokens will be,

  • Input: A + A + B + C
  • Output: B + D

Most people I’ve seen with what they think are billing irregularities are not counting the context every time they send it, and expect the total for the exchange to be,

  • Input: A + C
  • Output: B + D

That leads to the observed discrepancy.
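
To make the counting concrete, here is a minimal sketch of the two tallies. The token sizes for A, B, C and D are made-up numbers, the function names are hypothetical, and it ignores any system message and the small per-message formatting overhead.

```python
def billed_tokens(turns):
    """turns: list of (user_tokens, assistant_tokens) pairs, in order.

    Each request re-sends the full history plus the new user message,
    so the history is counted as input again on every turn.
    """
    history = 0
    total_input = 0
    total_output = 0
    for user_tokens, assistant_tokens in turns:
        total_input += history + user_tokens
        total_output += assistant_tokens
        history += user_tokens + assistant_tokens
    return total_input, total_output


def naive_tokens(turns):
    """The common miscount: every message counted exactly once."""
    return sum(u for u, _ in turns), sum(a for _, a in turns)


# Hypothetical sizes: A=100, B=200, C=50, D=150
turns = [(100, 200), (50, 150)]

print(billed_tokens(turns))  # (450, 350) -> input is A + A + B + C
print(naive_tokens(turns))   # (150, 350) -> input is A + C
```

The output totals agree, but the input total is three times larger once the re-sent context is counted, and the gap grows with every additional turn.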

My app is local. There are other people in the org, but they were not using OpenAI services. I sent 6 requests via the app, and I see 6 requests in the activity log. There are no mentions of gpt-4 turbo in my code at all. I have never used it. But the dashboard says that it goes along with my gpt-4 requests.
I am using the system role and passing context (previous questions/answers) to the API when I make those calls. Could it be that gpt-4 turbo is used to “understand” the context that I pass, and thus the additional charge? I thought that context is counted as regular input tokens.

I count all the tokens that I send and receive, and my internal token count is equal to what I see in the dashboard. The confusion is that for some reason there are 2 models (you can see them in my screenshot), even though I only use gpt-4.

Then the only explanation is that in addition to the calls you are aware of making, you or someone else used your API key to access the gpt-4-1106-preview endpoint.