Prompt tokens usage seems too high

I’m running the simple test below against the Chat Completions API. In the usage information returned, the completion_tokens number perfectly matches what the Tokenizer predicts, but the prompt_tokens number seems way too high (I would expect 1 instead of 9).

Any idea why this is?

[ux-user ~]$ curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4-1106-preview","messages":[{"role":"user","content":"Identificate"}]}'

{
  "id": "XX",
  "object": "chat.completion",
  "created": 1705814358,
  "model": "gpt-4-1106-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "It seems like you might be asking for identification or clarification on a topic, but your message is quite brief and doesn't specify what you need to identify. Could you please provide more context or details so I can assist you accordingly?"
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 46,
    "total_tokens": 55
  },
  "system_fingerprint": "XX"
}

The chat completions messages are wrapped in a container of special tokens along with the name of the role, and in addition, the AI is prompted with a few more tokens marking where it is supposed to write its answer.

That gives a token overhead of 7 tokens for the first message, and 4 for each additional one. In your example, the 9 prompt tokens are that 7-token overhead plus the 2 tokens that the content "Identificate" encodes to.

For more understanding, imagine that the % symbol represents special tokens (which you can’t send yourself) that are injected into the AI’s input. An API call that tells the AI what it is and what the user wants would be received by the AI model internally like this:

%system%You are TokenBot%%user%Say Hello%%assistant%
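
If you want to reproduce the number yourself, here is a minimal sketch of that accounting in Python, using the tiktoken library and the cl100k_base encoding that the gpt-4 models use. The per-message constants (3 per message, 3 to prime the reply, plus the role and content tokens) follow the OpenAI cookbook’s example counter, so treat the result as an estimate rather than an official guarantee:

```python
# Sketch: estimate prompt_tokens for a Chat Completions request.
# Assumes the cookbook-style constants for gpt-4 class models.
import tiktoken

def count_prompt_tokens(messages: list[dict]) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = 0
    for message in messages:
        tokens += 3  # per-message wrapper tokens around each role/content pair
        for key, value in message.items():
            tokens += len(enc.encode(value))
            if key == "name":
                tokens += 1  # an optional "name" field costs one extra token
    tokens += 3  # the reply is primed with the assistant header tokens
    return tokens

messages = [{"role": "user", "content": "Identificate"}]
# Expected to match the API's reported prompt_tokens (9 in the post above).
print(count_prompt_tokens(messages))
```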
