Yes, max_tokens is also counted, and a single request is denied if its total comes to more than the limit. You can hit the rate limit without any generation at all just by specifying max_tokens=5000 and n=100 (that counts as 500,000 tokens against the 180,000 limit for gpt-3.5-turbo-16k).
The rate limit calculation at the endpoint is also just an estimate based on character count; it doesn't actually tokenize the input.
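
To make the arithmetic concrete, here is a rough sketch of how a request could be sized up before sending it. The function names are hypothetical, and the 4-characters-per-token ratio is only an assumption for illustration; the exact formula the endpoint uses isn't known here.

```python
import math

def estimate_request_tokens(prompt: str, max_tokens: int, n: int = 1,
                            chars_per_token: float = 4.0) -> int:
    """Rough token count a request might be charged against the rate limit.

    chars_per_token is an assumed ratio, not a documented value.
    """
    # Input side: a character-based guess, no real tokenization.
    input_estimate = math.ceil(len(prompt) / chars_per_token)
    # Output side: max_tokens is reserved for each of the n completions.
    output_reservation = max_tokens * n
    return input_estimate + output_reservation

def would_exceed_limit(prompt: str, max_tokens: int, n: int, limit: int) -> bool:
    """True if the estimate alone is larger than the limit."""
    return estimate_request_tokens(prompt, max_tokens, n) > limit

# The example from above: empty prompt, max_tokens=5000, n=100, limit 180,000.
print(estimate_request_tokens("", 5000, 100))
print(would_exceed_limit("", 5000, 100, 180_000))
```

Lowering max_tokens or n to what you actually need is the simplest way to keep a request under the limit.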