Correct token counting when calling with JSON schema

Hi!
I can’t find any information about the correct method of counting tokens when I send a request that uses a JSON schema and tools.
The documentation doesn’t give a specific algorithm for token counting in this situation. Can anyone help? Thanks!

Tools are translated into an internal language format that is different from the JSON-like function specification you send.

In addition, internal tools on the Responses endpoint carry extensive, excessive prompting written by OpenAI. It can change without notice, is not ideal for every use case, and is unpublished and undocumented, yet you are billed for its placement.

The schema of a structured output is placed into the AI context much as you specify it in JSON, with the name field added before the schema and some additional markdown “container” text around it. That portion can therefore be measured fairly accurately by encoding the text to tokens with the tiktoken library.

The schema string is serialized the same way the Python JSON library would do it: as a single line. Pydantic inputs to the SDK are non-trivial to figure out or replicate, because when converting a Pydantic model to a schema, the SDK adds the “strict” requirements for you, marking all properties as required and disallowing additional properties.
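As a rough illustration, here is that single-line serialization with the standard-library `json` module. The schema contents below are a made-up example, and exactly what wrapper text OpenAI places around the schema is unknown — only the compact one-line form matches what is described above:

```python
import json

# A hypothetical structured-output schema, shaped like what you might pass
# in response_format; the field names and values are illustrative only.
schema_wrapper = {
    "name": "weather_report",
    "schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temp_c": {"type": "number"},
        },
        "required": ["city", "temp_c"],
        "additionalProperties": False,
    },
    "strict": True,
}

# json.dumps with default settings emits a single line -- the same compact
# form the schema is believed to take when placed in model context.
serialized = json.dumps(schema_wrapper["schema"])
assert "\n" not in serialized
print(serialized)
```

A string like this (plus the name field and surrounding container text) is what you would feed to tiktoken to estimate the schema’s share of the bill.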

Send a user image, and another ~250 tokens of messaging about “safety” are injected, demoting your system message and degrading the API product you thought you were getting.

The easiest way to find out what your setup will cost is to enable all the tools and functions of your application and then send the simplest possible request as an API call, such as a developer message with the one-token text “hi” (plus about 8 tokens of overhead for a single message). Then turn off each function or tool, or disable the schema, one at a time, and watch the billed input drop by that item’s particular consumption.
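A minimal sketch of that A/B measurement, assuming the official `openai` Python SDK and the Chat Completions endpoint. The model name and tool definition are placeholders; the idea is simply to diff `usage.prompt_tokens` between a baseline call and a call with one feature enabled:

```python
import json

# Placeholder tool definition -- swap in your application's real tools.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

base_payload = {
    "model": "gpt-4o-mini",  # assumption: whatever chat model you use
    "messages": [{"role": "developer", "content": "hi"}],
}
tool_payload = {**base_payload, "tools": [tool]}

# With the openai SDK you would send both requests and diff the usage:
#   from openai import OpenAI
#   client = OpenAI()
#   a = client.chat.completions.create(**base_payload)
#   b = client.chat.completions.create(**tool_payload)
#   print(b.usage.prompt_tokens - a.usage.prompt_tokens)  # the tool's cost
print(json.dumps(tool_payload, indent=2))
```

Repeating the diff once per tool, or with the schema disabled, isolates each item’s token consumption.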

Using the tiktoken Python library when sending messages to the chat models, you can then just measure the individual texts: about 7 tokens of overhead for the first message, and 4 tokens for each additional message.
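A sketch of that per-message accounting. The overhead constants come straight from the post above; the encoder is passed in as a parameter so you can plug in tiktoken’s `encoding.encode` in practice (the tiktoken call is shown only in a comment, as an assumption about your setup):

```python
from typing import Callable, List

def estimate_prompt_tokens(
    messages: List[dict],
    encode: Callable[[str], list],
    first_message_overhead: int = 7,  # per the post: overhead for one message
    extra_message_overhead: int = 4,  # per the post: each additional message
) -> int:
    """Estimate billed prompt tokens for a list of chat messages."""
    total = 0
    for i, message in enumerate(messages):
        total += first_message_overhead if i == 0 else extra_message_overhead
        total += len(encode(message["content"]))
    return total

# With tiktoken you would pass the real encoder, e.g.:
#   import tiktoken
#   enc = tiktoken.get_encoding("o200k_base")
#   estimate_prompt_tokens(messages, enc.encode)
```

With a toy whitespace encoder, a single `"hi"` developer message comes out to 7 + 1 = 8 tokens, matching the “about 8 tokens of overhead” figure mentioned earlier in the thread.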


This is not true. It was the first thing I tried, and the difference is in the thousands of tokens.
This is not how it is calculated at all. The JSON schema is serialized according to some internal, undocumented principles.
I know about tiktoken, but the question is exactly how to count the tokens of the JSON schema.