With respect to API calls, there shouldn’t be any difference in their number or complexity regardless of how the system prompt and chat history are handled.
With respect to data management, I understand there may be some convenience associated with offloading the responsibility of maintaining the system prompt and context window to OpenAI, but there are a number of pitfalls with that approach.
First, that requires OpenAI to maintain chat history for the API, something which many users expressly do not want them to do for safety and security reasons.
Next, you would need a way to uniquely identify each API chat, which brings with it essentially the same issue you are trying to avoid: needing to manage data. It may be simpler to track a chat_id than the entire chat, but in either case this is a one-time problem to solve.
Lastly, if the prompt and history are maintained by OpenAI, you are limited in your ability to modify them on the fly.
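To make the client-side approach concrete, here is a minimal sketch using the openai Python package (the v1 client interface). The model name, system prompt, and example questions are placeholders of my own, not anything from your setup; the point is only that your application owns the message list, so resending it each call is trivial and modifying it on the fly is a one-line change.

```python
# Minimal sketch: keep the system prompt and chat history client-side
# and re-send them with every request. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a helpful assistant."  # maintained by your application
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def ask(user_input: str) -> str:
    # Append the new user turn, send the full conversation, store the reply.
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

print(ask("What is the capital of France?"))
print(ask("And its population?"))  # history carries the earlier context
```

Because the list lives in your code, swapping the system prompt mid-conversation or trimming old turns to stay inside the context window requires no cooperation from OpenAI at all.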
With respect to bandwidth, I completely understand how it might seem absurd to need to send a large system prompt with every invocation of the API, but consider what would be necessary on OpenAI’s side to implement what you are asking.
They would become responsible for storing, maintaining, and securing millions or possibly billions of system prompts, and for what benefit? If we include previous messages in this solution, that number increases drastically. Bandwidth is incredibly cheap; there’s little reason to be so concerned with it.
Let’s use your concrete example to clarify this. Say we have 500kB of text, which is typically around 500,000 characters. This translates to roughly 125,000 tokens, which is about four times the context window of OpenAI’s most powerful model. If we consider the least expensive model, which charges $0.002 per 1,000 tokens, even if the model could process this much context, the cost would be about $0.25 each time just to process the system message.
On the other hand, let’s consider the cost of transmitting the data. If we use AWS as a benchmark, where the bandwidth cost is $0.09 per GB after the first 100GB/month (which is free), the cost to transmit a 500kB system message would be about $0.000045.
So, if you compare these two, you’ll find that the cost of processing a hypothetical 500kB system prompt is more than 5,500 times the cost of transmitting it.
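If you want to sanity-check those figures yourself, here is the same arithmetic as a few lines of Python. The 4-characters-per-token and 1-byte-per-character ratios are rough rule-of-thumb assumptions, as are the prices quoted above.

```python
# Back-of-the-envelope check of the numbers above
# (rough heuristics: ~1 byte per character, ~4 characters per token).
size_bytes = 500_000                      # 500kB system prompt
tokens = size_bytes / 4                   # ~125,000 tokens
token_cost = tokens / 1_000 * 0.002       # $0.002 per 1K tokens -> $0.25
bandwidth_cost = size_bytes / 1e9 * 0.09  # $0.09 per GB -> $0.000045

print(f"processing: ${token_cost:.2f}")
print(f"transfer:   ${bandwidth_cost:.6f}")
print(f"ratio:      {token_cost / bandwidth_cost:,.0f}x")  # ~5,556x
```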
I hope this helps clarify the matter. Let me know if you have any more questions!