I am relatively new to the API, but have been playing around a lot on the playground.
Essentially, I would love to understand whether it is necessary to send the prompt in every API request when using gpt-3.5-turbo (in completions mode, for the new instruct version).
Sending the prompt every time uses up a large number of tokens (I have a detailed prompt with output rules), and the new instruct models are only available on the completions endpoint.
If you have a custom system prompt, then yes, it always needs to be included in each HTTP request. The only thing it really “costs” you unnecessarily is network bandwidth: OpenAI will charge for those tokens every time regardless, because the full prompt ultimately has to be sent to the LLM on each call. LLMs don’t have “memory” yet… at least not per-conversation memory.
You have to treat the model as if it were an educated stranger you have just met, on every call to the API. If you want historical context, you need to send it as part of the prompt… every time.
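To make that concrete, here is a minimal sketch of what "resending everything" looks like for a stateless completions call. The function name and prompt layout are my own illustration, not anything from the API itself: you simply rebuild one big string from the system prompt plus the full conversation history before each request.

```python
def build_prompt(system_prompt: str,
                 history: list[tuple[str, str]],
                 user_message: str) -> str:
    """Rebuild the entire prompt for a stateless completions request.

    The completions endpoint has no per-conversation memory, so the
    system prompt and every prior (role, text) turn must be
    concatenated and resent on each call.
    """
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return f"{system_prompt}\n\n{turns}\nUser: {user_message}\nAssistant:"


# Each new turn rebuilds the whole prompt, so token usage grows with
# the length of the conversation:
full_prompt = build_prompt(
    "You are a helpful assistant. Follow the output rules below.",
    [("User", "Hi"), ("Assistant", "Hello! How can I help?")],
    "Summarize our chat so far.",
)
```

You would then pass `full_prompt` as the `prompt` argument to the completions endpoint (e.g. with the official Python client, `client.completions.create(model="gpt-3.5-turbo-instruct", prompt=full_prompt)`), and append the model's reply to `history` before the next turn.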