@anon34024923 It looks like whoever programmed your app is allowing more and more history into the API call, which increases tokens over time. This is normally considered good because the answers are better with the additional context.
You can try getting cheaper hosting, but this is likely the main reason why your token consumption is so high. This won’t change simply by switching to Azure either.
Now, if you want, you can have the past history truncated/reduced to lower your cost. Your answers from the bot might suffer, so beware.
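Truncation can be as simple as slicing the message list before each API call. A minimal sketch, assuming your history is stored as OpenAI-style `{"role": ..., "content": ...}` dicts (the 10-turn window is just an illustration, not a recommendation):

```python
def truncate_history(messages, max_turns=10):
    """Keep the system prompt (if any) plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]
```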
The app seems to work great and help people. I like how it's working; I'd just like to make it profitable without reducing the quality. The fact that it remembers the user's conversation is a good thing, and I want it to keep that memory. I'd just like it to be profitable.
The AI costs can get high. The only cost-saving trick I am aware of within the OpenAI universe is, after 5 or so API calls, to alternate between GPT-4 and GPT-3.5-Turbo. Over time, this should roughly halve your cost, with only slight quality degradation. It works because GPT-3.5-Turbo "learns" from the GPT-4 responses already sitting in the context.
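A rough sketch of that alternation idea, assuming the pre-v1 `openai` Python SDK; the threshold of 5 and the even/odd rule are just one way to implement it:

```python
import openai

call_count = 0  # per-conversation counter (illustrative)

def pick_model():
    global call_count
    call_count += 1
    if call_count <= 5:
        return "gpt-4"  # let GPT-4 set the tone for the first few turns
    # afterwards, alternate: send every other call to the cheaper model
    return "gpt-3.5-turbo" if call_count % 2 == 0 else "gpt-4"

def chat(messages):
    return openai.ChatCompletion.create(model=pick_model(), messages=messages)
```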
There have been many discussions here about managing chat history more efficiently. For example, one technique discussed here is to make embeddings of the oldest history and keep only the 10-20 most recent messages in the context.
So basically, you only send 10-20 turns in every request, unless the user references something from an older part of the conversation, in which case you retrieve it using the embeddings technique and only then append the result to the request.
From the above, you can easily see how this reduces your cost. Say the entire chat history is 5K+ tokens. Without history management, you send all of it every time. With history management, you'd probably be sending only around 1K tokens.
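Here is a sketch of what that could look like, assuming the pre-v1 `openai` Python SDK and a simple in-memory store for a single conversation. All the names, the 15-message window, and the top-2 retrieval are illustrative choices, not a definitive implementation:

```python
import numpy as np
import openai

RECENT_WINDOW = 15   # keep 10-20 recent messages verbatim
archived = []        # (message, embedding) pairs for older history

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

def archive_old(messages):
    """Move messages beyond the recent window into the embedding store."""
    while len(messages) > RECENT_WINDOW:
        old = messages.pop(0)
        archived.append((old, embed(old["content"])))

def recall(query, top_k=2):
    """Return the archived messages most similar to the new user query."""
    if not archived:
        return []
    q = embed(query)
    scored = sorted(
        archived,
        key=lambda pair: np.dot(q, pair[1])
        / (np.linalg.norm(q) * np.linalg.norm(pair[1])),
        reverse=True,
    )
    return [msg for msg, _ in scored[:top_k]]

def build_request(messages, user_input):
    archive_old(messages)
    context = recall(user_input) + messages + [{"role": "user", "content": user_input}]
    return openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=context)
```

In production you would use a proper vector store instead of a Python list, but the shape of the idea is the same: recent turns verbatim, older turns retrieved only when relevant.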
Okay, so essentially what would be the cause? That the bot is producing more tokens, or the user is? Would it be that users are simply typing more per request?
I believe so. The bot is very fluid in conversation, so yes, I believe each time the user sends a message it's also including all the past messages that were sent in that conversation.
But that doesn't explain why the price has doubled per request. The bot was just as fluid ten days ago as it is today, so I don't understand how I could be racking up more token usage. It was concatenating the a + b + c + d messages ten days ago just as it is today, but somehow it's more now.
No no, you're correct, I wasn't logging the usage. Now we are; I implemented it a few hours ago.
Now I'm logging words sent, characters sent, what was sent, time spent on the app, and the timestamp, so we can actually see now, "okay, maybe the freemium users are just more engaged."
You could be totally right, yeah, and I may need to be patient and look at the data once I have more of it.
No, I know for sure. It is concatenating every message the user sends to maintain the semblance of "memory," but it's really just resending the entire chat conversation each time.
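For what it's worth, that resend-everything pattern is exactly why per-request cost creeps up without any code change: request N carries all N-1 prior messages, so a conversation twice as long costs roughly twice as much per request. A toy illustration, assuming ~100 tokens per message (a made-up average, not measured):

```python
TOKENS_PER_MESSAGE = 100  # illustrative average

for n in (5, 10, 20):
    per_request = n * TOKENS_PER_MESSAGE
    print(f"message #{n}: ~{per_request} tokens sent in that single request")
# e.g. message #10 sends ~1000 tokens: double the per-request cost of
# message #5, even though the code is doing exactly the same concatenation.
```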