Are spaces and new lines counted as tokens?

noamrajuan · June 18, 2023, 12:24pm

Hi, I’m currently working on a web app and trying to estimate costs. My output is JSON, and it contains a lot of spaces and new lines. Do I pay for all of these spaces and new lines?
If I do, I need to find a solution for that.

Thanks

CreatiCode · June 18, 2023, 12:32pm

I think they are counted as tokens, and you can verify it here:

EricGT · June 18, 2023, 1:05pm

Many tokens have a corresponding variant that includes a single preceding space.

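A new line converts to a token. New lines written as escape sequences (for example, the literal characters `\n`) are also tokens; in the example below, `\` and `n` are tokenized separately. For other ways of representing new lines, you can check as needed.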

You can check this with the OpenAI tokenizer page, using the GPT-3 option.

Space example.

Text
hello goodbye hello

Token IDs
[31373, 24829, 23748]

Notice that the token for `hello` (ID 31373) is different from the token for `hello` with its preceding space (ID 23748).

New line examples

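Text (the `\n`, `\r`, and `\r\n` are typed as literal characters, and each line also ends with a real line break)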

Line 1
Line 2\n
Line 3\r
Line 4\r\n
Line 5

```
Line a
Line b
```

Token IDs
[13949, 352, 198, 13949, 362, 59, 77, 198, 13949, 513, 59, 81, 198, 13949, 604, 59, 81, 59, 77, 198, 13949, 642, 198, 198, 15506, 63, 198, 13949, 257, 198, 13949, 275, 198, 15506, 63, 198]
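As a quick cross-check, here is a minimal sketch that tallies the new-line tokens in the ID list just shown. It assumes, as in this example under the GPT-3 option, that ID 198 is the new-line token and ID 59 is a literal backslash; other encodings use different IDs.

```python
# Token IDs copied from the tokenizer output above (GPT-3 option).
token_ids = [13949, 352, 198, 13949, 362, 59, 77, 198, 13949, 513, 59, 81, 198,
             13949, 604, 59, 81, 59, 77, 198, 13949, 642, 198, 198, 15506, 63,
             198, 13949, 257, 198, 13949, 275, 198, 15506, 63, 198]

NEWLINE_ID = 198   # real line break in this encoding
BACKSLASH_ID = 59  # a typed "\" character in this encoding

newline_tokens = token_ids.count(NEWLINE_ID)
backslash_tokens = token_ids.count(BACKSLASH_ID)

print(f"{newline_tokens} of {len(token_ids)} tokens are new lines")
print(f"{backslash_tokens} tokens are typed backslashes")
```

For this text it reports 10 new-line tokens out of 36, plus 4 tokens spent on the typed backslashes of the escape sequences.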

Notice that `Line 1`, with its hidden new line, converts to three tokens:

| Text | Token ID |
| --- | --- |
| `Line` | 13949 |
| `1` with its preceding space | 352 |
| New line | 198 |

So although new lines don't show up in the tokenizer's text display, they are still converted to tokens and counted.
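Since whitespace is tokenized, one way to cut it from JSON is to serialize compactly. A minimal sketch, assuming the JSON passes through Python and using a hypothetical payload: it compares a pretty-printed and a compact serialization of the same data. This only saves tokens on JSON you send to the API (for example, in the prompt); output tokens are billed as the model generates them, so compacting a response afterwards does not change its cost. Character and new-line counts are only a rough proxy; exact counts come from the tokenizer.

```python
import json

# Hypothetical payload standing in for the web app's JSON.
payload = {
    "title": "Example",
    "items": [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}],
}

# Pretty-printed: indentation spaces plus a new line between elements.
pretty = json.dumps(payload, indent=2)

# Compact: no new lines and no space after "," or ":".
compact = json.dumps(payload, separators=(",", ":"))

# Both strings decode to the same data.
assert json.loads(pretty) == json.loads(compact) == payload

print("pretty: ", len(pretty), "chars,", pretty.count("\n"), "new lines")
print("compact:", len(compact), "chars,", compact.count("\n"), "new lines")
```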
