The tooltip says: "The maximum number of tokens to generate shared between the prompt and completion. The exact limit varies by model. (One token is roughly 4 characters for standard English text)".
So my understanding is that the prompt plus the generated response together can be at most 4096 tokens. But for GPT-4o and other models the input alone can be much larger than that, so there's no way this limit is actually shared; otherwise any input over 4096 tokens would always throw an error and we would never be able to generate a response.
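For reference, this is roughly the kind of call I mean (a minimal sketch using the openai Python SDK; the prompt string is just a placeholder). In practice a call like this succeeds even when the input is far longer than 4096 tokens, which is what makes the tooltip confusing:

```python
# Minimal sketch with the openai Python SDK (pip install openai).
# The prompt content is a placeholder; the point is where max_tokens goes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

long_prompt = "some input that is well over 4096 tokens ..."  # placeholder

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": long_prompt}],
    max_tokens=4096,  # the parameter the tooltip is describing
)
print(response.choices[0].message.content)
```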