There is a delay related to the number of input tokens, although it is typically only a fraction of the total response time. I do not have access to the 32k model to build any data-based model of the time taken, but from the 4k and 8k context variants of GPT-3.5 and GPT-4 the increase seems small, on the order of a second or so when the context is near its maximum.
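
If you wanted to check this empirically, a rough sketch like the following would time completions against increasingly long prompts. This assumes the official OpenAI Python client; the model name, filler text, and the crude token estimate are just placeholders, not a proper benchmark.

```python
# Minimal sketch: measure how response latency scales with prompt size.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def time_completion(model: str, prompt: str) -> float:
    """Return wall-clock seconds for a single chat completion."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,  # keep the output tiny so input processing dominates
    )
    return time.perf_counter() - start

# Grow the prompt toward the context limit and watch how latency changes.
filler = "lorem ipsum " * 10
for repeats in (1, 50, 200, 600):
    prompt = filler * repeats
    elapsed = time_completion("gpt-3.5-turbo", prompt)
    # len(prompt) // 4 is only a very rough token estimate
    print(f"~{len(prompt) // 4:>6} tokens (approx): {elapsed:.2f}s")
```

Averaging several runs per prompt length would help, since network and server-side variance can easily swamp a one-second difference.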