I’m looking for clarification on two points related to the API:
Does the system message cost the same as a user message?
Does the API remember a long system message from the first request, or should it be resent each time?
Any insight into potential cost implications would be appreciated.
As far as I know, everything is counted… see the example at:
# How to count tokens with tiktoken

[`tiktoken`](https://github.com/openai/tiktoken/blob/main/README.md) is a fast open-source tokenizer by OpenAI.

Given a text string (e.g., `"tiktoken is great!"`) and an encoding (e.g., `"cl100k_base"`), a tokenizer can split the text string into a list of tokens (e.g., `["t", "ik", "token", " is", " great", "!"]`).

Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token).

Encodings specify how text is converted into tokens. Different models use different encodings.
System and user messages are all part of the prompt and are charged as such. The API is stateless; there is no conversation thread or anything like it. The system prompt (and any conversation history you want as context) must be sent on each request.
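As a sketch of what that statelessness means in practice (`build_messages` is a hypothetical helper, not part of any SDK), the client rebuilds the full message list, system prompt included, on every call:

```python
# The API keeps no state between calls, so every request must carry
# the system prompt plus whatever history the model should see.

SYSTEM_PROMPT = "You are a helpful assistant."

def build_messages(history, user_message):
    """Assemble the full payload sent on every request."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_message}]
    )

history = []

# First request: system prompt is included.
messages = build_messages(history, "Hello!")

# After a reply arrives, append both turns to the local history...
history += [messages[-1], {"role": "assistant", "content": "Hi there!"}]

# ...and the next request resends the system prompt AND the history,
# all of which counts toward the prompt tokens you pay for.
messages = build_messages(history, "What did I just say?")
```

This is also the cost implication: a long system prompt is billed again on every request, since it is part of every prompt you send.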