Correct token counting when calling with JSON schema

Hi!
I can’t find any information about the correct method of counting tokens when I send a request that uses a JSON schema and tools.
The documentation doesn’t give a specific algorithm for token counting in this situation. Can anyone help? Thanks!

Tools are translated into an internal language format that is different from the JSON-like function specification you send.

In addition, internal tools on the Responses endpoint carry extensive, excessive prompting written by OpenAI. It can change without notice, is not ideal for every use case, and is unpublished and undocumented, yet you are billed for its placement.

The schema of a structured output is placed into the AI context much as you specify it in JSON, with the name field added before the schema and some additional markdown “container” text around it. That portion can therefore be measured fairly accurately by encoding the text to tokens with the tiktoken library.

The schema string is serialized the same way the Python JSON library would do it: as a single line. Pydantic inputs to the SDK are non-trivial to figure out or replicate, because when converting a Pydantic model to a schema, the SDK adds the “strict” requirements for you, marking all properties as required and disallowing additional properties.
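As a rough illustration, here is that single-line serialization with the standard-library `json` module. The schema contents below are a made-up example, and exactly what wrapper text OpenAI places around the schema is unknown — only the compact one-line form matches what is described above:

```python
import json

# A hypothetical structured-output schema, shaped like what you might pass
# in response_format; the field names and values are illustrative only.
schema_wrapper = {
    "name": "weather_report",
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temp_c": {"type": "number"},
        },
        "required": ["city", "temp_c"],
        "additionalProperties": False,
    },
    "strict": True,
}

# json.dumps with default settings emits a single line -- the same compact
# form the schema is believed to take when placed in model context.
serialized = json.dumps(schema_wrapper["schema"])
assert "\n" not in serialized
print(serialized)
```

A string like this (plus the name field and surrounding container text) is what you would feed to tiktoken to estimate the schema’s share of the bill.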

Send a user image, and another ~250 tokens of messaging about “safety” are injected, demoting your system message and degrading the API product you thought you were getting.

The easiest way to find out what your setup will cost is to enable all the tools and functions of your application and then send the simplest possible request as an API call, such as a developer message with the one-token text “hi” (plus about 8 tokens of overhead for a single message). Then turn off each function or tool, or disable the schema, one at a time, and watch the billed input drop by that item’s particular consumption.
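A minimal sketch of that A/B measurement, assuming the official `openai` Python SDK and the Chat Completions endpoint. The model name and tool definition are placeholders; the idea is simply to diff `usage.prompt_tokens` between a baseline call and a call with one feature enabled:

```python
import json

# Placeholder tool definition -- swap in your application's real tools.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

base_payload = {
    "model": "gpt-4o-mini",  # assumption: whatever chat model you use
    "messages": [{"role": "developer", "content": "hi"}],
}
tool_payload = {**base_payload, "tools": [tool]}

# With the openai SDK you would send both requests and diff the usage:
#   from openai import OpenAI
#   client = OpenAI()
#   a = client.chat.completions.create(**base_payload)
#   b = client.chat.completions.create(**tool_payload)
#   print(b.usage.prompt_tokens - a.usage.prompt_tokens)  # the tool's cost
print(json.dumps(tool_payload, indent=2))
```

Repeating the diff once per tool, or with the schema disabled, isolates each item’s token consumption.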

Using the tiktoken Python library when sending messages to the chat models, you can then just measure the individual texts: about 7 tokens of overhead for the first message, and 4 tokens for each additional message.
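A sketch of that per-message accounting. The overhead constants come straight from the post above; the encoder is passed in as a parameter so you can plug in tiktoken’s `encoding.encode` in practice (the tiktoken call is shown only in a comment, as an assumption about your setup):

```python
from typing import Callable, List

def estimate_prompt_tokens(
    messages: List[dict],
    encode: Callable[[str], list],
    first_message_overhead: int = 7,  # per the post: overhead for one message
    extra_message_overhead: int = 4,  # per the post: each additional message
) -> int:
    """Estimate billed prompt tokens for a list of chat messages."""
    total = 0
    for i, message in enumerate(messages):
        total += first_message_overhead if i == 0 else extra_message_overhead
        total += len(encode(message["content"]))
    return total

# With tiktoken you would pass the real encoder, e.g.:
#   import tiktoken
#   enc = tiktoken.get_encoding("o200k_base")
#   estimate_prompt_tokens(messages, enc.encode)
```

With a toy whitespace encoder, a single `"hi"` developer message comes out to 7 + 1 = 8 tokens, matching the “about 8 tokens of overhead” figure mentioned earlier in the thread.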


This is not true. It was the first thing I tried, and the difference is in the thousands of tokens.
This is not how it is calculated at all. The JSON schema is serialized according to some internal, undocumented principles.
I know about tiktoken, but the question is exactly how to count the tokens of the JSON schema.