Reading over the GPT-3 paper and its references, my understanding is that the Encoder completes only one forward pass (over the entire context window), while the Decoder completes a number of forward passes equal to the number of output tokens. A forward pass of the Encoder and a forward pass of the Decoder look approximately equal in computational cost (?).
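To make that mental model concrete, here is a toy sketch (Python, with made-up numbers) of how I'm counting forward passes. Everything here is purely illustrative, and this model may well be exactly where I'm going wrong:

```python
# Toy model of how I'm currently counting forward passes (possibly wrong!).
# Assumes: one Encoder pass over the whole (padded) context window, and one
# Decoder pass per generated token. Names and numbers are made up.

def forward_passes(n_input_tokens: int, n_output_tokens: int) -> dict:
    """Count forward passes under my current mental model."""
    # n_input_tokens is deliberately unused: under this model the Encoder
    # does exactly one pass regardless of how many input tokens there are.
    return {
        "encoder_passes": 1,                # one pass over the full context window
        "decoder_passes": n_output_tokens,  # one pass per generated token
    }

print(forward_passes(n_input_tokens=500, n_output_tokens=50))
# -> {'encoder_passes': 1, 'decoder_passes': 50}
```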
However, OpenAI's pricing charges per token in the same way for both inputs and outputs. This doesn't make sense to me for two reasons: 1) the Encoder performs the same amount of computation no matter how many input tokens you feed it (unless something clever is done so that the non-existent, padded inputs don't result in any wasted computation?), and 2) each output token is very expensive (an entire forward pass of the Decoder, albeit with "masked" attention) compared to each additional input token. Based on my current understanding, it would make more sense for the API to charge only by the number of output tokens.
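Here is a rough, purely illustrative comparison of the flat per-token pricing against the compute I'd expect under the model above. The price and "compute unit" values are made up for illustration, not OpenAI's actual numbers:

```python
# Comparing flat per-token API pricing with my (possibly wrong) compute model.
# PRICE_PER_TOKEN and COST_PER_PASS are made-up illustrative numbers.

PRICE_PER_TOKEN = 0.02 / 1000   # hypothetical $/token, same for input and output
COST_PER_PASS = 1.0             # hypothetical "compute units" per forward pass

def api_price(n_input: int, n_output: int) -> float:
    """What the API actually charges: every token costs the same."""
    return (n_input + n_output) * PRICE_PER_TOKEN

def my_compute_estimate(n_input: int, n_output: int) -> float:
    """What I'd expect cost to track: 1 Encoder pass + 1 Decoder pass per output token."""
    return (1 + n_output) * COST_PER_PASS

# Doubling the input doubles the API price, but under my model the
# compute estimate doesn't change at all:
print(api_price(500, 50), my_compute_estimate(500, 50))    # 0.011 51.0
print(api_price(1000, 50), my_compute_estimate(1000, 50))  # 0.021 51.0
```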
Clearly, I'm missing something. Can anyone help me understand these mechanics better? (Or perhaps OpenAI chose this pricing model for simplicity rather than accuracy.)