Reducing costs from previous context and system instructions when using the Chat Completions API

Hi, I’m building a chatbot with extensive (2,000+ token) system instructions, and I’ve noticed that the full context is re-sent with every request — previous messages + system instructions + the new message — and all of it is billed again on each subsequent call.

Let’s say one call cost me $0.15; the next message might cost $0.16, so I’ve already spent $0.31 in total, and the per-call cost keeps climbing as the context builds up, depleting my credits quickly…

Is it possible to send a new message without incurring additional token charges for the previous context and system instructions, while still maintaining the system prompt’s tone throughout the conversation? Or at least to optimize costs somehow? Basically, I just want to build my own chatbot using the OpenAI API.

If you are using the Chat Completions API, there is no way to do that: you need to send the system prompt + context + new message every time. The only things you can control are the size of your system prompt and of your context. For context, keeping the last 10–20 turns might be enough (see the sketch below). As for the system prompt, some of what’s in it could perhaps be moved into RAG or function calling instead.
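
Here’s a minimal sketch of that trimming approach, assuming the official `openai` Python SDK (v1) with `OPENAI_API_KEY` set in the environment; the model name and the `MAX_TURNS` cutoff are placeholders I chose for illustration, not anything the API prescribes:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a helpful assistant."  # your (ideally slimmed-down) instructions
MAX_TURNS = 10  # hypothetical cutoff: keep only the last 10 user/assistant exchanges

history: list[dict] = []  # full transcript, kept client-side

def send_message(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Re-send the system prompt plus only the most recent turns;
    # everything older is simply dropped from the request.
    trimmed = history[-(MAX_TURNS * 2):]  # 2 messages (user + assistant) per turn
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": SYSTEM_PROMPT}, *trimmed],
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

Note the system prompt still goes out (and is billed) on every call, which is why shrinking it, or moving parts of it into RAG or function calling, is the other lever.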


An LLM is stateless; it cannot store your requests and reuse them.
When using the API you are in control of what you send, and only you can optimize it. Best practice is to trim the chat history at a specific number of replies.
But on the client side anything is possible. For example, the context can be dynamic: you can summarize the chat frequently, or extract relevant memories and send them as context instead of the full chat. You may even generate dynamic system messages; when the user just says “ok” or “thank you”, the system message can probably omit a few things. The summarization idea is sketched below.
Those are just a few ideas…
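
As a sketch of the summarization idea, again assuming the `openai` Python SDK; `SUMMARIZE_AFTER`, the model name, and the “keep the last 3 turns verbatim” choice are all illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI()

SUMMARIZE_AFTER = 20  # assumption: compact once the history exceeds 20 messages

def compact_history(history: list[dict]) -> list[dict]:
    """Replace older messages with a short summary; keep recent turns verbatim."""
    if len(history) <= SUMMARIZE_AFTER:
        return history
    old, recent = history[:-6], history[-6:]  # keep the last 3 turns (6 messages) as-is
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Summarize this conversation in a few sentences, "
                        "keeping any facts the assistant may need later."},
            {"role": "user", "content": transcript},
        ],
    ).choices[0].message.content
    # On future requests the summary stands in for all the older turns.
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"},
            *recent]
```

Each compaction costs one extra (small) call, but after that every subsequent message carries a few hundred summary tokens instead of the whole transcript.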
