End-User Pay-As-You-Go Model

There are a lot of conversations about “pay-as-you-go” on this developer forum, but they all relate to paying OpenAI directly. What about those of us who build systems on the OpenAI API and then “re-sell” them (I guess that’s the word) to our end-users?

Using the plethora of “Chat with your PDF” services as an example, most seem to be using gpt-3.5 and charging a set monthly amount, based on anticipated usage.

My service simply isn’t going to work (or certainly won’t be worth paying for) with gpt-3.5-turbo. And I am aghast at the pricing models I would have to put together to accommodate high levels of usage with gpt-4. And, of course, I can’t use the bring-your-own-key model, as it violates OpenAI’s terms of use.

So, to me, the simplest, easiest, least painful way to go is implementing pay-as-you-go for the end-user. My system can do all the necessary calculations monthly: I can hard code it, or use my shiny new Text to SQL implementation.
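To make the "necessary calculations" concrete, here is a minimal sketch of a monthly pay-as-you-go bill. The per-1K-token rates, the markup, and the `monthly_bill` function are all illustrative assumptions, not the author's actual implementation:

```python
# Hypothetical per-1K-token rates (gpt-4-style) and a made-up 25% margin.
RATES_PER_1K = {"prompt": 0.03, "completion": 0.06}
MARKUP = 1.25

def monthly_bill(usage_events):
    """usage_events: iterable of (prompt_tokens, completion_tokens) pairs
    logged for one user over the billing month."""
    raw_cost = sum(
        p / 1000 * RATES_PER_1K["prompt"] + c / 1000 * RATES_PER_1K["completion"]
        for p, c in usage_events
    )
    return round(raw_cost * MARKUP, 2)

# Two example calls in a month: bill is raw API cost plus margin.
print(monthly_bill([(1200, 400), (800, 650)]))
```

In practice you would log `usage` figures returned by each API response rather than counting tokens yourself.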

My question here is sort of open-ended: I’m curious whether anyone out there has been doing this, and what the issues have been. And, of course, general thoughts on this approach as well.

I mean, billing for tokens used makes perfect sense to you and me – but I’ve been working with OpenAI models closely for about 10 months now. What about the average Joe who has never heard of a “token” before?

Anyway, thoughts? Anyone?


As far as I am aware, the Explorer option (prepay) is for new developers to explore the API and get a feel for the options and usage. Once they have an understanding, they should move to the pay-as-you-go model to take advantage of the increased rates, limits, and features.


But I am not talking about developers or the API. I am talking about an end-user system. A repository of legal documentation on a particular subject, in my case, CA Real Estate Law. End-users would use this to basically ask regulatory questions and get some initial feedback, with citations in the regulations. Let’s call it a Chat with CA Real Estate Law PDFs system.

Pre-pay for usage or pay-as-you-go?

Well, the standard way to do that currently is the subscription model, which would be a pre-pay.

User pays Amount_1 for Tier 1 access (100 queries per month), Amount_2 for Tier 2 (500 queries per month), and so on.

Then you can have Tier 1 subscribers roll over to Tier 2 if they use too much, provided they agree to the increased subscription.
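The tier-with-rollover scheme above can be sketched in a few lines. The tier names, quotas, and prices are invented for illustration:

```python
# Hypothetical tiers: (name, queries per month, monthly price).
TIERS = [("tier1", 100, 10.0), ("tier2", 500, 40.0)]

def check_quota(tier_index, queries_used, agreed_to_upgrade):
    """Return (new_tier_index, status) for a subscriber's next query."""
    name, quota, _price = TIERS[tier_index]
    if queries_used < quota:
        return tier_index, "ok"
    # Quota exhausted: roll over to the next tier only with the user's consent.
    if tier_index + 1 < len(TIERS) and agreed_to_upgrade:
        return tier_index + 1, "upgraded"
    return tier_index, "blocked"
```

A real system would also prorate the price difference at upgrade time, which is omitted here.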

Not to belabor the issue, but that’s my question. How likely would it be that a regular end-user would embrace the pay-as-you-go model? “I’m going to just charge you X, which is way cheaper than Y per month – you just pay for the tokens you use.”

In my experience, the general public prefers fixed spend. Those who deal with costs at scale understand that pay-as-you-go is how industry works, but it has become unpopular with consumers. Phone plans now offer a fixed fee for a set package, internet is a fixed fee, and you pay more for upgraded phone plans or internet.

You may find that offering both a pay-as-you-go per-token option and a fixed subscription model is the way to go; see which gets the most customers. From a business standpoint, fixed monthly residuals are the gold standard, as you have a known revenue stream.


API token consumption is many orders of magnitude more expensive than “Internet Data”, so it really needs a different billing model, IMO. If it cost ISPs $1 per MB to transfer data, their billing wouldn’t be so “fixed rate”.

I think @SomebodySysop has a good point with the “Pay for only the tokens you consume” billing for at least the OpenAI access, and that’s how I built my platform to work. (not a commercial system, just a side project).

Many people are likening LLMs to “The New Electricity”. And so I foresee a future, unless things come WAY down in price, where every person in the world is well aware of how many API tokens they’re using, just like they might know how much Electricity they’re using. You don’t pay your Electric company a flat fee. You pay them whatever you consumed.

EDIT: In a perfect world people would perhaps buy their tokens in bulk directly from OpenAI, and then “spend them” in any app they want. Digital Signatures could be used to keep it secure. I know that’s an “out there” idea, but if prices don’t come down it’s what will emerge. Perhaps I just described a “Utility Token” (i.e. crypto currencyish). oh no! :slight_smile:


Historically, the cost of one “unit of computation” decreases by about an order of magnitude every 3–4 years.

That’s ~1,000-fold over the span of a decade.

Even if nothing else changes, this means that in 2033, running gpt-4 would cost about 1/30 of what it costs to run gpt-3.5-turbo today.
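The arithmetic behind that 1/30 figure can be checked with a quick back-of-envelope calculation. Both inputs are assumptions: one order of magnitude of cost reduction every ~3.33 years, and gpt-4 today costing roughly 30× gpt-3.5-turbo per token:

```python
# Back-of-envelope: compound cost reduction over a decade.
years = 10
reduction = 10 ** (years / 3.33)       # ~1,000-fold over ten years
gpt4_vs_35_today = 30                  # rough relative per-token cost today
gpt4_2033_vs_35_today = gpt4_vs_35_today / reduction

print(f"~{reduction:.0f}x cheaper; "
      f"gpt-4 in 2033 at ~1/{1 / gpt4_2033_vs_35_today:.0f} "
      f"of gpt-3.5-turbo today")
```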

That said, things inevitably will change.

First, some time in the next decade some grad student (or AI) somewhere will have a genius “what-if” moment that pays off huge (or more likely a steady series of smaller achievements which culminate in something revolutionary). Think transformers or diffusion models for image generation—an absolutely game-changing idea.

This idea will make models much smaller and cheaper to run, much more powerful at the same size, or—if we’re lucky—smaller and more powerful.

In any event this idea will make things possible which never before would have been. Just as no amount of computing power could make GPT-4 as good as it is without the transformer and all the servers in the world couldn’t do what Midjourney does without diffusion, the next big idea will unleash capabilities we cannot even fathom right now.

But, even if that big idea doesn’t come and even if the cost of compute doesn’t decrease at all, there’s one more prong at our disposal: data. The amount of data in the world is increasing exponentially and the ecosystems around processing that data are growing just as fast.

We’ve already seen smaller models bootstrapped to near-equal performance with their larger kin. We’ve also seen very impressive models built on smaller quantities of high-quality data. As the world produces more and better-quality data, the models trained on that data will outperform models trained on less and lower-quality data.

Now, you take all three of those together and think about what the landscape for generative AI is really going to look like in 2033.

Quibbling over how we pay for access to the models today is going to look pretty silly looking back.

Not that there aren’t real and valid concerns people have here in 2023—there absolutely are. I just think that in ten years, intelligent agents will have permeated every facet of modern life, and will be so advanced, so inexpensive, and so ubiquitous that we won’t think much about them beyond the fact that they’re just there.

Just like no one thinks about the cost of making a domestic long-distance phone call anymore.


I think all of us right now are thankful that OpenAI exists, but the future should really be one where the LLMs are open source.

A couple of years ago, even the word “open” in “OpenAI” meant that. So I’m sure hoping Llama 2 and other open-source endeavors will do to BigAI what the Fediverse did to BigSocial (Twitter, FB) and create a very viable free alternative in the genuinely “Open AI Space” (three words, not two).

All that being said, another challenge is just sheer memory. We may need to have these overlords in the sky all the way until it’s common for a phone or laptop to have 128GB of RAM. Most laptops nowadays are still down in the 8GB to 16GB range, so these for-profit BigAI cloud providers will have a vast marketplace, if for no other reason than that, for at least five years or more.


Things ARE going to change, but we’ve still got to deal with the present today. There has been a plethora of new foundational models introduced in just the past couple of months, each pronounced more powerful than the last.

But we’ll all still be using gpt-4 or claude-2 for the foreseeable future. Why? Because, and this is just my opinion, in the world of business all these new models are turning out to be little more than really super-smart toys. I mean, if you tell me there are a bunch of “Chat with your PDF” services out there running Llama or Vicuna or Alpaca or any of the recent horde of “run on your own hardware” models, then I will concede my error.

As much as we complain about gpt-4, in my experience there is nothing better, at least not via any publicly available API. Yes, we will get smaller, more powerful, less expensive models. But I’m waiting for the better models. Heck, gpt-4 is the gold standard, and look how it has “allegedly” declined over just the past few months.

As for “pay as you go”, the problem with being on the cutting edge is you have to suffer a lot of cuts. “Paying for tokens” in the general marketplace in 2023 is going to be a hard sell. But, somebody’s gonna have to do it.

As I am not really super-excited about billing folks at the end of the month for tokens used, I’m thinking of following the cell-phone provider model of charging X dollars each month for Y tokens. No rollovers, no overages.

I can calculate down to the penny how much is owed by whom at any moment in time.

So, the user pays up front for the tokens he plans on using. Run out in the middle of the month? OK, buy more. Don’t use all your tokens by the end of the month? Still not sure what to do there: either say “too bad,” or extend credit for unused tokens. Still mulling that over.
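The pre-pay flow described above can be sketched as a small account class, assuming the strict "no rollover" variant. The class name and the `RuntimeError` are just for illustration:

```python
# Prepaid token balance: buy a bundle up front, deduct per request,
# zero out any remainder at the end of the month ("no rollover" policy).
class PrepaidAccount:
    def __init__(self):
        self.balance = 0  # tokens remaining this month

    def buy_tokens(self, tokens):
        self.balance += tokens

    def charge(self, tokens_used):
        if tokens_used > self.balance:
            # Ran out mid-month: prompt the user to buy more.
            raise RuntimeError("out of tokens")
        self.balance -= tokens_used

    def end_of_month(self):
        self.balance = 0  # the "too bad" policy; swap for a credit-back if preferred
```

Extending credit for unused tokens would just mean replacing `end_of_month` with a carry-forward or credit-note step.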

Some prepaid phone providers have the following system in place:

  • Users pay a certain fixed activation fee upfront; however, this amount is converted into tokens that go towards their initial credit balance (expressed as the remaining number of tokens).

  • After the initial activation, users can top up their credit balance at any time. Different tiers of prepaid credit are available, each for a fixed fee ($1, $5, $10, whatever, …). Of course, the price of each tier corresponds to an equivalent number of tokens added to their credit balance.

  • If users are low on credits, they can either manually top up with a one-time payment or agree to be automatically charged once their credit falls below a certain threshold.

  • All purchased token credits are only valid for a fixed period (e.g. 6 months) after the purchase date. All unused token credits will expire after this time. Of course, older credits will be consumed before any newer credits.

  • You could differentiate your product by limiting some features to users who have bought more expensive tiers. Users on the prepaid plan, however, will need to top up that minimum amount at least once every month (i.e., within a maximum of 31 days after the previous top-up) to keep these features unlocked.
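The bullet points above amount to a small credit ledger: each top-up becomes a lot of credits valid for six months, and the oldest unexpired lot is spent first. This sketch is illustrative only; the class name, dates, and amounts are made up:

```python
from datetime import date, timedelta

VALIDITY = timedelta(days=182)  # roughly the 6-month validity window

class CreditLedger:
    def __init__(self):
        self.lots = []  # [purchase_date, credits_remaining], oldest first

    def top_up(self, credits, on):
        self.lots.append([on, credits])

    def balance(self, on):
        # Only unexpired lots count toward the spendable balance.
        return sum(c for d, c in self.lots if on - d <= VALIDITY)

    def spend(self, credits, on):
        if credits > self.balance(on):
            return False  # insufficient unexpired credits; charge nothing
        for lot in self.lots:  # oldest lots are consumed first
            if on - lot[0] > VALIDITY:
                continue  # expired lot
            used = min(credits, lot[1])
            lot[1] -= used
            credits -= used
            if credits == 0:
                break
        return True
```

For example, topping up 100 credits in January and 50 in May, then spending 120 in June, drains the January lot first and leaves 30 credits, which themselves lapse once their six months are up.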


I like this. Makes perfect sense as a model for “tokens” when nobody knows what a token is. Credits everybody can understand. Thank you!