Reducing costs from previous context and system instructions when using the Chat Completions API

Hi, I’m building a chatbot with extensive (2,000+ token) system instructions, and I’ve noticed that the full context is re-sent with every request — previous messages + system instructions + the new message — and all of it is billed again on each subsequent call.

Let’s say one call cost me $0.15; the next message might cost $0.16, so I’ve already spent $0.31 in total, and the per-call cost keeps climbing as the context builds up, depleting my credits quickly…

Is it possible to send a new message without incurring additional token charges for the previous context and system instructions, while still maintaining the system prompt’s tone throughout the conversation? Or at least to optimize costs somehow? Basically, I just want to build my own chatbot using the OpenAI API.

If you are using the Chat Completions API, there is no way to do that: you need to send the system prompt + context + new message every time. The only things you can control are the size of your system prompt and of your context. For context, keeping the last 10–20 turns might be enough (see the sketch below). As for the system prompt, some of what’s in it could perhaps be moved into RAG or function calling instead.
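
Here’s a minimal sketch of that trimming approach, assuming the official `openai` Python SDK (v1) with `OPENAI_API_KEY` set in the environment; the model name and the `MAX_TURNS` cutoff are placeholders I chose for illustration, not anything the API prescribes:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a helpful assistant."  # your (ideally slimmed-down) instructions
MAX_TURNS = 10  # hypothetical cutoff: keep only the last 10 user/assistant exchanges

history: list[dict] = []  # full transcript, kept client-side

def send_message(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Re-send the system prompt plus only the most recent turns;
    # everything older is simply dropped from the request.
    trimmed = history[-(MAX_TURNS * 2):]  # 2 messages (user + assistant) per turn
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM_PROMPT}, *trimmed],
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

Note the system prompt still goes out (and is billed) on every call, which is why shrinking it, or moving parts of it into RAG or function calling, is the other lever.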


An LLM is stateless; it cannot store your requests and reuse them.
When using the API you are in control of what you send, and only you can optimize it. Best practice is to trim the chat history at a specific number of replies.
But on the client side anything is possible. For example, the context can be dynamic: you can summarize the chat frequently, or extract relevant memories and send them as context instead of the full chat. You may even generate dynamic system messages; when the user just says “ok” or “thank you”, the system message can probably omit a few things. The summarization idea is sketched below.
Those are just a few ideas…
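
As a sketch of the summarization idea, again assuming the `openai` Python SDK; `SUMMARIZE_AFTER`, the model name, and the “keep the last 3 turns verbatim” choice are all illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

SUMMARIZE_AFTER = 20  # assumption: compact once the history exceeds 20 messages

def compact_history(history: list[dict]) -> list[dict]:
    """Replace older messages with a short summary; keep recent turns verbatim."""
    if len(history) <= SUMMARIZE_AFTER:
        return history
    old, recent = history[:-6], history[-6:]  # keep the last 3 turns (6 messages) as-is
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Summarize this conversation in a few sentences, "
                        "keeping any facts the assistant may need later."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content
    # On future requests the summary stands in for all the older turns.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"},
            *recent]
```

Each compaction costs one extra (small) call, but after that every subsequent message carries a few hundred summary tokens instead of the whole transcript.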
