Understanding GPT-4 API pricing with respect to roles and request/response

GPT-4 8k model price:
3 cents per 1k tokens for the prompt
6 cents per 1k tokens for the completion

Is “prompt” here another word for the (HTTP) request?
And does “completion” mean the response?

I see another interpretation: it depends on the role:

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "user", "content": "Hi, can you help me find a good restaurant nearby?"},
    {"role": "assistant", "content": "Of course! What type of cuisine are you in the mood for?"},
    {"role": "user", "content": "I'm feeling like having Italian food tonight."},
    {"role": "assistant", "content": "Great! I recommend trying out Trattoria Toscana. It's a cozy little Italian place with great pasta dishes."},
    {"role": "user", "content": "That sounds perfect! What's the address?"},
    {"role": "assistant", "content": "The address is 123 Main Street. Enjoy your meal!"}
  ]
}

Here we would concatenate all the content from the user role; this then gets tokenized and costs 3 cents per 1k tokens. The assistant content, meanwhile, is computationally different and thus costs 6 cents per 1k tokens.

Also: am I right that the JSON skeleton of my request does not count towards the token limit? Only the concatenations of the “content” values?

You’re charged for every token going in and every token coming out. With practice, you can learn how to pare down prompts to the bare minimum to get what you’re after for the best price. Alternatively, in some cases, it might be possible to craft a prompt, give it one or two examples (one-shot or two-shot), and get results that are just as good for a lot less: $0.002 per 1k tokens with gpt-3.5-turbo…


Okay good, so indeed “prompt” means “HTTP request” while “completion” means “HTTP response”. OpenAI just preferred different wording here.
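One way to check this split for yourself is the `usage` object that the chat completions endpoint returns with each response. The following is a minimal sketch, assuming a trimmed response body; the token counts in it are invented for illustration and the helper name `split_usage` is hypothetical.

```python
import json

# Trimmed, hypothetical response body; the token counts are made up for illustration.
sample_response = """
{
  "choices": [
    {"message": {"role": "assistant", "content": "The address is 123 Main Street."}}
  ],
  "usage": {"prompt_tokens": 95, "completion_tokens": 9, "total_tokens": 104}
}
"""

def split_usage(response_text):
    """Return (prompt_tokens, completion_tokens) from a chat completion response body."""
    usage = json.loads(response_text)["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"]

prompt_tokens, completion_tokens = split_usage(sample_response)
print(f"billed at the prompt rate: {prompt_tokens} tokens")
print(f"billed at the completion rate: {completion_tokens} tokens")
```

Every message sent in the request, including earlier assistant turns, is reported under `prompt_tokens`; only the newly generated reply is reported under `completion_tokens`.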

I have already been able to develop some routines to greatly reduce the token count. For example, today I got from 904 down to 161 tokens. Anyway, thanks for the response, I don’t want to go too off-topic here!

Sorry to reopen an old question, but what about the JSON skeleton? I’m still confused about this: am I also charged for the tokens in the words “model”, “messages”, “role” and “content” every time they appear? What about the tokens for “{”, “}”, “[”, “]”?

My understanding is that the JSON skeleton does not count towards the token count. Instead, we take the concatenation of all the “content” values of our “messages”, tokenize that, and do the same with the engine response. Only those tokens count.
But yes, if I send ten messages with 100 tokens each (i.e. the value corresponding to “content”), then I’ll have to pay for 1k tokens. If the response is 500 tokens, then those need to be paid for too.
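To put numbers on that example, here is a rough sketch using only the prices quoted in this thread. The function name `estimate_cost` is hypothetical, and applying the $0.002 figure to both prompt and completion tokens for gpt-3.5-turbo is an assumption; current prices may differ.

```python
# Prices quoted in this thread, in US dollars per 1,000 tokens.
GPT4_8K = {"prompt": 0.03, "completion": 0.06}
# Assumption: the flat $0.002 rate applies in both directions.
GPT35_TURBO = {"prompt": 0.002, "completion": 0.002}

def estimate_cost(prompt_tokens, completion_tokens, rates):
    """Estimate the cost in dollars of one request/response pair."""
    total = prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]
    return total / 1000

# Ten messages of 100 tokens each in the request, 500 tokens in the response.
prompt_tokens = 10 * 100
completion_tokens = 500

print(f"GPT-4 8k:      ${estimate_cost(prompt_tokens, completion_tokens, GPT4_8K):.4f}")
print(f"gpt-3.5-turbo: ${estimate_cost(prompt_tokens, completion_tokens, GPT35_TURBO):.4f}")
```

With these rates the example works out to about 6 cents on GPT-4 8k (3 cents for the prompt plus 3 cents for the completion) and about 0.3 cents on gpt-3.5-turbo.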

You only count the tokens within “content”.

However, each message and the call itself do carry some overhead, which comes from the hidden format in which these messages are passed to the model.

First role: 7 tokens
Additional role: 4 tokens
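Taking those two figures at face value, the prompt size could be estimated as below. This is only a sketch: the overhead values are the ones quoted in this post, not verified numbers, they may vary by model, and `estimate_prompt_tokens` is a hypothetical helper that expects the per-message “content” token counts to be measured separately.

```python
# Overhead figures as quoted in the post above; treat them as unverified.
FIRST_MESSAGE_OVERHEAD = 7
EXTRA_MESSAGE_OVERHEAD = 4

def estimate_prompt_tokens(content_token_counts):
    """Add the quoted per-message overhead to the token counts of each "content" value."""
    n = len(content_token_counts)
    if n == 0:
        return 0
    overhead = FIRST_MESSAGE_OVERHEAD + EXTRA_MESSAGE_OVERHEAD * (n - 1)
    return sum(content_token_counts) + overhead

# Ten messages of 100 content tokens each: 1000 + 7 + 4 * 9 = 1043
print(estimate_prompt_tokens([100] * 10))
```

Under these assumptions the overhead is small next to the content itself, but it grows with the number of messages in the conversation.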
