Hi, this is about the response/generation time for GPT-4. I am mostly concerned with performance over API calls. My question has two parts.
While streaming, is the time to receive the first generated token related to the size of the input prompt?
Say the model generates 100 tokens for a 2k-token input prompt; will it take more time to generate the same 100 tokens for a 10k-token input prompt?
I know that should typically be the case, but I want to hear others' views on this as well.
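One way to answer this empirically is to time the first streamed chunk for prompts of different lengths. Below is a minimal sketch of a timing helper that works with any streaming iterator; the commented-out OpenAI client usage is an assumption about how you might wire it up, not a verified snippet for the current SDK.

```python
import time
from typing import Iterable, List, Optional, Tuple


def time_to_first_token(stream: Iterable) -> Tuple[Optional[float], List]:
    """Consume a streaming iterator, returning (seconds until the first
    chunk arrived, list of all chunks). Works with any iterable, so it can
    wrap an API streaming response or a test generator."""
    start = time.perf_counter()
    ttft = None
    chunks = []
    for chunk in stream:
        if ttft is None:
            # First chunk: record time-to-first-token.
            ttft = time.perf_counter() - start
        chunks.append(chunk)
    return ttft, chunks


# Hypothetical usage with the OpenAI Python client (requires an API key;
# exact client call is an assumption):
#
# from openai import OpenAI
# client = OpenAI()
# stream = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": prompt}],
#     stream=True,
# )
# ttft, chunks = time_to_first_token(stream)
# print(f"time to first token: {ttft:.2f}s for a prompt of len {len(prompt)}")
```

Running this with a 2k-token prompt and a 10k-token prompt (same requested output length) would show directly whether time-to-first-token grows with input size.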