Why does pricing vary by input tokens (instead of only output tokens)?

Reading over the GPT-3 paper and its references, my understanding is that the network's Encoder completes only one forward pass (over the entire context window), while the Decoder completes a number of forward passes equal to the number of output tokens. A forward pass of either the encoder or the decoder looks approximately equal (?) in computational cost.

However, OpenAI pricing charges per token the same way for both inputs and outputs. This doesn’t make sense to me, as 1) the Encoder performs the same amount of computation no matter how many input tokens you feed it (unless something clever is done so that the non-existent padded inputs do not result in any wasted computation?), and 2) each output token is super expensive (an entire forward pass of the Decoder, albeit with “masked” attention) compared to an individual extra input token. Based on my current understanding, it would make more sense if the API charged based only on the number of output tokens.

Clearly, I’m missing something - can anyone help me understand these mechanics better? (Or OpenAI could be choosing this pricing model more for simplicity than for accuracy.)


It’s Input + Output.

“Completions requests are billed based on the number of tokens sent in your prompt plus the number of tokens in the completion(s) returned by the API.”
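As a minimal sketch of the arithmetic in that quote: the function names and the price used below are hypothetical placeholders chosen for illustration, not actual OpenAI rates or API calls.

```python
def billed_tokens(prompt_tokens, completion_token_counts):
    """Tokens billed for one request: the prompt plus every completion returned."""
    return prompt_tokens + sum(completion_token_counts)


def request_cost(prompt_tokens, completion_token_counts, price_per_1k_tokens):
    """Cost of one request at a flat per-1k-token price (price is an assumed input)."""
    total = billed_tokens(prompt_tokens, completion_token_counts)
    return total / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Hypothetical request: a 100-token prompt with two 50-token completions,
    # at a made-up price of 0.02 per 1k tokens.
    print(billed_tokens(100, [50, 50]))
    print(request_cost(100, [50, 50], 0.02))
```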

@curt.kennedy indeed it is; my question is why OpenAI chooses to do so, from a technical perspective.


@aqivnoaaqn You’ll notice that requests with more input tokens take a bit longer. The attention mechanism used in these large language models tends to use a lot of memory and compute for longer input sequences. That’s why there’s a maximum number of input tokens: with the current model architecture, training or running inference on longer sequences would cause out-of-memory errors.

Overall you’re right that there are some tricks, like caching the encoding, which mean the input doesn’t have as big an effect as the output.
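To put rough numbers on this, here is a back-of-the-envelope sketch, not a description of how OpenAI actually measures or prices compute. It assumes a decoder-only transformer with a key/value cache, the common approximation of about 2 FLOPs per parameter per token for the weight multiplications, and a simplified attention term that grows with the number of positions attended to. The function names are made up for illustration; the model dimensions are the 175B figures reported in the GPT-3 paper.

```python
def forward_flops_per_token(n_params, n_layers, d_model, context_len):
    """Approximate FLOPs to push ONE token through the model.

    Assumptions (simplified):
      - about 2 FLOPs per parameter for the weight matmuls
      - about 4 * d_model FLOPs per layer per attended position
        (query-key scores plus the weighted sum over values)
    """
    weight_flops = 2 * n_params
    attention_flops = 4 * n_layers * d_model * context_len
    return weight_flops + attention_flops


def request_flops(prompt_tokens, completion_tokens, n_params, n_layers, d_model):
    """Return (prompt_flops, completion_flops) for one request.

    With a key/value cache each token is processed once and attends to
    all positions up to and including itself.
    """
    prompt_flops = sum(
        forward_flops_per_token(n_params, n_layers, d_model, pos)
        for pos in range(1, prompt_tokens + 1)
    )
    completion_flops = sum(
        forward_flops_per_token(n_params, n_layers, d_model, pos)
        for pos in range(prompt_tokens + 1, prompt_tokens + completion_tokens + 1)
    )
    return prompt_flops, completion_flops


if __name__ == "__main__":
    # GPT-3 175B dimensions from the paper; the token counts are arbitrary.
    prompt, completion = request_flops(
        prompt_tokens=1000,
        completion_tokens=1000,
        n_params=175_000_000_000,
        n_layers=96,
        d_model=12288,
    )
    print(f"prompt FLOPs:     {prompt:.3e}")
    print(f"completion FLOPs: {completion:.3e}")
    print(f"completion / prompt: {completion / prompt:.3f}")
```

Under these assumptions every token, prompt or completion, goes through the network once, so the arithmetic per token is similar. The practical difference is that prompt tokens can be processed together in one parallel pass, while output tokens have to be generated one at a time, which is why outputs tend to add more latency per token.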

For cost, also check out https://text-generator.io, which bills per request and has a free tier of 100 requests a month, so the pricing is much friendlier.
