Hi, this is about the response/generation time for gpt-4. I'm mostly concerned with performance over API calls. This has two parts.
Q1:
While streaming, is the time to receive the first generated token related to the size of the input prompt?
Q2:
Let's say the model generates 100 tokens for a 2k-token input. Will it take more time to generate the same 100 tokens for a 10k-token input?
I expect that should typically be the case, but I'd like to hear others' views on this as well.
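In case it helps frame answers, here's a minimal sketch of how I'd measure both numbers myself, assuming the official `openai` Python SDK (v1 style) with `OPENAI_API_KEY` set in the environment. The repeated-word prompts and the 100-token cap are just illustrative stand-ins:

```python
import time

from openai import OpenAI  # assumes the official openai SDK, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def measure_stream(prompt: str, model: str = "gpt-4", max_tokens: int = 100):
    """Return (time_to_first_token, total_time) in seconds for one streamed call."""
    start = time.perf_counter()
    first_token_at = None
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta and first_token_at is None:
            first_token_at = time.perf_counter()  # first generated token arrived
    total = time.perf_counter() - start
    ttft = first_token_at - start if first_token_at is not None else None
    return ttft, total


# Rough stand-ins for ~2k and ~10k token prompts ("word " is roughly one token).
# Note: a ~10k-token prompt exceeds the 8k context of base gpt-4, so the long
# run would need a larger-context variant (e.g. gpt-4-32k) to actually succeed.
for label, n in [("~2k input", 2000), ("~10k input", 10000)]:
    prompt = "word " * n + "\nSummarize the above in about 100 tokens."
    ttft, total = measure_stream(prompt)
    ttft_s = f"{ttft:.2f}s" if ttft is not None else "n/a"
    print(f"{label}: time to first token {ttft_s}, total {total:.2f}s")
```

Averaging over several runs per prompt size would smooth out server-side variance, since single calls can differ a lot depending on load.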