It’s both, and it’s complicated.
The input tokens are estimated and then added to your max_tokens, so you can think of the limit as total token throughput per minute, of sorts. They’re not actually running tiktoken at that level; it’s more of a flood-prevention sort of thing.
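To make that concrete, here is a rough sketch of how a request might count against a per-minute budget. The 4-characters-per-token heuristic, the function names, and the example limit are all my own assumptions for illustration; the actual server-side estimator isn't public, so treat this as a mental model rather than the real accounting.

```python
import math


def estimate_input_tokens(prompt: str, chars_per_token: float = 4.0) -> int:
    """Cheap length-based estimate (assumed heuristic, not a real tokenizer)."""
    return math.ceil(len(prompt) / chars_per_token)


def tokens_charged(prompt: str, max_tokens: int) -> int:
    """Estimated input tokens plus the full max_tokens reserved for output."""
    return estimate_input_tokens(prompt) + max_tokens


def fits_in_budget(prompt: str, max_tokens: int,
                   used_this_minute: int, tpm_limit: int) -> bool:
    """True if the request would stay within the per-minute token limit."""
    return used_this_minute + tokens_charged(prompt, max_tokens) <= tpm_limit


if __name__ == "__main__":
    prompt = "a" * 400  # about 100 tokens under the assumed heuristic
    print(tokens_charged(prompt, max_tokens=500))
    print(fits_in_budget(prompt, 500, used_this_minute=9_500, tpm_limit=10_000))
```

The practical upshot under this model: a large max_tokens eats into your budget even when the actual completion is short, so setting it close to what you really need leaves more headroom.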