My pricing metrics make no sense. I seem to be getting charged double every day?

There have been many discussions here about how to manage chat history more efficiently. For example, one technique discussed here is to create embeddings of the oldest history and keep only the 10-20 most recent turns in the context.

So basically, you only send 10-20 turns in each request. If the user references something from an older part of the conversation, you retrieve it with the embeddings technique and only then append the result to the request.
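Here is a minimal sketch of that idea in Python. It is not the exact method from the linked discussion. `embed` is a hypothetical stand-in that builds a bag-of-words vector so the example runs without any API. In practice you would call a real embedding model and store the vectors for older turns once, instead of recomputing them on every request. `RECENT_TURNS`, `top_k` and `min_score` are illustrative values, not recommendations.

```python
import math
from collections import Counter

RECENT_TURNS = 10  # keep the last N turns verbatim (the post suggests 10-20)


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_context(history: list, user_msg: str, top_k: int = 2, min_score: float = 0.3) -> list:
    # Recent turns are always sent. Older turns are only candidates for retrieval.
    recent = history[-RECENT_TURNS:]
    older = history[:-RECENT_TURNS]
    query = embed(user_msg)
    scored = [(cosine(query, embed(turn)), turn) for turn in older]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Append an old turn only if it looks relevant to the new message.
    retrieved = [turn for score, turn in scored[:top_k] if score >= min_score]
    return retrieved + recent + [user_msg]
```

For example, with a 15-turn history whose first turn is "my dog is named Rex", the message "what is my dog named" pulls that first turn back into the context next to the 10 recent turns. An unrelated message such as "hello there" sends only the recent turns.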

From the above, you can see how this reduces your cost. For example, say the entire chat history is 5K+ tokens. Without history management, you send all of those tokens with every request. With history management, you would probably be sending only 1K+ tokens.
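The savings add up across a conversation, because without a window every request resends everything before it. Here is a rough back-of-the-envelope sketch. It assumes every turn is a fixed, hypothetical 100 tokens, and it ignores the system prompt, retrieved snippets and output tokens. With 50 turns the full history is 5,000 tokens. A 10-turn window is 1,000 tokens, which matches the numbers above.

```python
from typing import Optional


def total_input_tokens(turns: int, tokens_per_turn: int, window: Optional[int] = None) -> int:
    # Sum the input tokens sent over a whole conversation.
    # window=None means the full history is resent on every request.
    total = 0
    for n in range(1, turns + 1):
        kept = n if window is None else min(n, window)
        total += kept * tokens_per_turn
    return total


full = total_input_tokens(50, 100)                # 127,500 input tokens in total
windowed = total_input_tokens(50, 100, window=10)  # 45,500 input tokens in total
```

Under these assumptions, the last request drops from 5,000 to 1,000 input tokens. Over the whole 50-turn conversation, the total drops by roughly 2.8x. The gap keeps growing as the conversation gets longer.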
