Are spaces and new lines counted as tokens?

Hi, I’m currently working on a web app and trying to estimate costs. My output is JSON, and it has a lot of spaces and new lines. My question is: do I pay for all of these spaces and new lines?
Because if I do, I need to find a solution for that.

Thanks


I think they are counted as tokens, and you can verify it here:

Many words have a second, separate token that includes a single space before the word.

A new line converts to a token of its own. New lines written as escaped characters (a literal `\n` or `\r` typed into the text) are tokens as well, in addition to the real new line. For other ways of representing new lines, you can check as needed.

The way to check is to use the OpenAI tokenizer page with the GPT-3 option selected.


Space example.

Text
hello goodbye hello

Tokens
(image of the tokenized text)

Token Ids
[31373, 24829, 23748]

Notice that the token for the first `hello` (token id 31373) is different from the token for the later ` hello` (token id 23748), which includes the preceding space.


New line examples

Text

Line 1
Line 2\n
Line 3\r
Line 4\r\n
Line 5

```
Line a
Line b
```

Tokens
(image of the tokenized text)

Token Ids
[13949, 352, 198, 13949, 362, 59, 77, 198, 13949, 513, 59, 81, 198, 13949, 604, 59, 81, 59, 77, 198, 13949, 642, 198, 198, 15506, 63, 198, 13949, 257, 198, 13949, 275, 198, 15506, 63, 198]

Notice that Line 1, together with its hidden new line, converts to three tokens:

| Text token | Token Id |
|---|---|
| `Line` | 13949 |
| `1` | 352 |
| New line | 198 |

So while new lines do not show up in the textual representation of the tokens, tokens are still being created for them.
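To put a rough number on it, here is a small sketch that only counts the token ids from the new line example in this post. It does not call a tokenizer. It assumes 198 is the only new line token in that list, and the variable names are my own.

```python
from collections import Counter

# Token ids copied from the "New line examples" output in this post.
token_ids = [
    13949, 352, 198, 13949, 362, 59, 77, 198, 13949, 513, 59, 81, 198,
    13949, 604, 59, 81, 59, 77, 198, 13949, 642, 198, 198, 15506, 63,
    198, 13949, 257, 198, 13949, 275, 198, 15506, 63, 198,
]

# Assumption: 198 is the id of the new line token, as in the example above.
NEWLINE_ID = 198

counts = Counter(token_ids)
newline_tokens = counts[NEWLINE_ID]
total_tokens = len(token_ids)
share = newline_tokens / total_tokens

print(f"{newline_tokens} of {total_tokens} tokens are new lines ({share:.0%})")
# prints: 10 of 36 tokens are new lines (28%)
```

For this short sample, a bit over a quarter of the tokens are new lines, so it is probably worth checking your own JSON output the same way before estimating costs.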
