My pricing metrics make no sense. I seem to be getting charged double every day?

@anon34024923 It looks like whoever programmed your app is allowing more and more history into the API call, which increases tokens over time. This is normally considered good because the answers are better with the additional context.

You can try getting cheaper hosting, but this growing history is likely the main reason your token consumption is so high. It won’t change simply by switching to Azure, either.

Now, if you want, you can have the past history truncated/reduced to lower your cost. Your answers from the bot might suffer, so beware.

Hi Curt,

The app seems to work great and help people. I like how it’s working. I’d like to make it profitable without reducing the quality. The fact that it remembers the user’s conversation is a good thing; I want it to have memory. I’d just like it to be profitable.

The AI costs can get high. The only cost-saving trick I am aware of within the OpenAI universe is, after 5 or so API calls, to alternate between GPT-4 and GPT-3.5-Turbo. Over time, this should roughly halve your cost, with only slight quality degradation. It works because GPT-3.5-Turbo “learns” from GPT-4.
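A minimal sketch of that alternation, assuming the official openai Python package (>= 1.0); `alternating_reply` and `call_count` are made-up names for illustration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def alternating_reply(messages, call_count):
    # First few calls stay on GPT-4; after that, every other call
    # drops to the cheaper GPT-3.5-Turbo.
    if call_count < 5 or call_count % 2 == 0:
        model = "gpt-4"
    else:
        model = "gpt-3.5-turbo"
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```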

Worth a shot.

I’ve created guru self-help bots myself. I can certainly see the value.

Good luck!

There have been many discussions here about managing chat history more efficiently. For example, here, one technique discussed is to make embeddings of the oldest history and keep only the 10–20 most recent turns in the context.

So basically, you are only sending 10–20 turns in every request, unless the user references something from an older conversation, in which case you retrieve it with the embeddings technique and only then append the result to the request.

From the above, you can easily see how this reduces your cost. For example, if the entire chat history is 5K+ tokens, then without history management you send all of those tokens every time. With history management, you would probably be sending only about 1K tokens.
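A minimal sketch of the truncation half of this, assuming OpenAI-style message dicts; the embeddings lookup for older turns is omitted:

```python
def trim_history(messages, keep_turns=10):
    """Keep the system prompt plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # One turn = one user message plus one assistant reply.
    return system + rest[-keep_turns * 2:]
```

You would call `trim_history(full_history)` right before each API request, and archive the dropped messages in your embeddings store for retrieval later.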

So on Aug 10 the total usage was $22.08, and there were 481 requests.

Just yesterday, on Aug 20th, the total usage was $25.28 and there were 292 requests.

It seems the price per request has almost doubled in ten days ($22.08 / 481 ≈ $0.046 vs. $25.28 / 292 ≈ $0.087).

Does anyone know why this might be?

Or the size of your request is increasing (chat memory maybe?)

Are you not tracking usage on your end?

Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words.

Model        Input               Output
8K context   $0.03 / 1K tokens   $0.06 / 1K tokens

The pricing is not per request. The pricing is based on tokens. If your costs go up, it means you have used more tokens.
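To make that concrete, a small sketch of the per-request arithmetic at the GPT-4 8K rates in the table above:

```python
# GPT-4 8K rates from the table above, in dollars per 1K tokens.
INPUT_RATE, OUTPUT_RATE = 0.03, 0.06

def request_cost(prompt_tokens, completion_tokens):
    return (prompt_tokens / 1000) * INPUT_RATE \
         + (completion_tokens / 1000) * OUTPUT_RATE

# e.g. a 1,500-token prompt with a 300-token answer:
print(request_cost(1500, 300))  # 0.045 + 0.018 = $0.063
```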

Okay, I think I understand. So it seems that the users are just typing more, then?

Okay, so essentially what would be the cause? That the bot is producing more tokens, or the user? Would it be that users are simply typing more per request?

Both are possible, but if you’re using the same model, it’s very unlikely the bot is returning longer answers.

It’s not even users typing more. A user can respond “ok” to an unoptimized conversation of 1000 tokens, and your system could be sending 1001 tokens.

But if you don’t log it, you’ll just be guessing.

You’re not logging this on your end? How are you charging?

Are you appending old messages to new requests for Chat functionality?

I believe I am sending the chat history with each message so it gets longer each time.

But that doesn’t explain the doubling in usage rate for the same number of requests within ten days.

I believe so. The bot is very fluid in conversation, so yes, each time the user sends a message it also includes all the past messages that were sent in that conversation.

But that doesn’t explain why the price has doubled per request. The bot was just as fluid ten days ago as it is today, so I don’t understand how I could be racking up more token usage. It was concatenating message a + b + c + d ten days ago just as it is today, but somehow it’s more now.

You don’t know for sure?

Maybe people this week are running longer conversations? This would be why your token count for each “message” is higher and the cost doubled.

Are you using off-the-shelf code to run your service without understanding the back end?

Without actually logging your usage, there’s no real “honest” way you can say it’s “somehow more now…” imho…
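The API response itself reports exact token counts per request, so logging them is straightforward. A minimal sketch, assuming the openai Python package:

```python
from datetime import datetime, timezone

def log_usage(response, logfile="usage.log"):
    # Every chat completion response carries a usage object with
    # exact prompt and completion token counts.
    u = response.usage
    with open(logfile, "a") as f:
        f.write(f"{datetime.now(timezone.utc).isoformat()} "
                f"prompt={u.prompt_tokens} "
                f"completion={u.completion_tokens} "
                f"total={u.total_tokens}\n")
```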

No, no, you’re correct. I wasn’t logging the usage; now we are. I implemented it a few hours ago.

Now I’m logging words sent, characters sent, what was sent, time spent on the app, and the timestamp, so we can actually see, “okay, maybe more freemium users are just more engaged.”

You could be totally right, yeah, and I may need to be patient and look at the data once I have more to go on.

But actually, even then, I’m only tracking what messages are sent, not the added history on top of each message. How would I track that?
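One way to track the full request size is to count the tokens of the exact messages list you send, not just the newest message. A hedged sketch using the tiktoken package (per-message format overhead is ignored, so this slightly undercounts):

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def payload_tokens(messages):
    # Sum the tokens over every message in the outgoing payload.
    return sum(len(enc.encode(m["content"])) for m in messages)
```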

No, I know for sure. It is concatenating every message the user sends to maintain the semblance of “memory,” but it’s really just resending the entire chat conversation each time.
