Reading over the GPT-3 paper and its references, my understanding is that the Encoder completes only one forward pass (over the entire context window), while the Decoder completes a number of forward passes equal to the number of output tokens. A forward pass of the Encoder and a forward pass of the Decoder look approximately equal in computational cost (?).
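To make that mental model concrete, here is a toy sketch (Python, with made-up numbers) of how I'm counting forward passes. Everything here is purely illustrative, and this model may well be exactly where I'm going wrong:

```python
# Toy model of how I'm currently counting forward passes (possibly wrong!).
# Assumes: one Encoder pass over the whole (padded) context window, and one
# Decoder pass per generated token. Names and numbers are made up.

def forward_passes(n_input_tokens: int, n_output_tokens: int) -> dict:
    """Count forward passes under my current mental model."""
    # n_input_tokens is deliberately unused: under this model the Encoder
    # does exactly one pass regardless of how many input tokens there are.
    return {
        "encoder_passes": 1,                # one pass over the full context window
        "decoder_passes": n_output_tokens,  # one pass per generated token
    }

print(forward_passes(n_input_tokens=500, n_output_tokens=50))
# -> {'encoder_passes': 1, 'decoder_passes': 50}
```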
However, OpenAI's pricing charges per token in the same way for both inputs and outputs. This doesn't make sense to me for two reasons: 1) the Encoder performs the same amount of computation no matter how many input tokens you feed it (unless something clever is done so that the non-existent, padded inputs don't result in any wasted computation?), and 2) each output token is very expensive (an entire forward pass of the Decoder, albeit with "masked" attention) compared to each additional input token. Based on my current understanding, it would make more sense for the API to charge only by the number of output tokens.
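Here is a rough, purely illustrative comparison of the flat per-token pricing against the compute I'd expect under the model above. The price and "compute unit" values are made up for illustration, not OpenAI's actual numbers:

```python
# Comparing flat per-token API pricing with my (possibly wrong) compute model.
# PRICE_PER_TOKEN and COST_PER_PASS are made-up illustrative numbers.

PRICE_PER_TOKEN = 0.02 / 1000   # hypothetical $/token, same for input and output
COST_PER_PASS = 1.0             # hypothetical "compute units" per forward pass

def api_price(n_input: int, n_output: int) -> float:
    """What the API actually charges: every token costs the same."""
    return (n_input + n_output) * PRICE_PER_TOKEN

def my_compute_estimate(n_input: int, n_output: int) -> float:
    """What I'd expect cost to track: 1 Encoder pass + 1 Decoder pass per output token."""
    return (1 + n_output) * COST_PER_PASS

# Doubling the input doubles the API price, but under my model the
# compute estimate doesn't change at all:
print(api_price(500, 50), my_compute_estimate(500, 50))    # 0.011 51.0
print(api_price(1000, 50), my_compute_estimate(1000, 50))  # 0.021 51.0
```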
Clearly, I'm missing something. Can anyone help me understand these mechanics better? (Or perhaps OpenAI chose this pricing model for simplicity rather than accuracy.)