Will conversation history tokens charged for api usage or only new message considered for charging?
Of course, it will be charged. The more conversation there is, the more expensive each subsequent message will be.
To use the ChatGPT API without losing the continuity of the conversation, you will need to implement a persistent session management system on your end. Here’s how you can do it:
- When the user starts a conversation, send an API request to ChatGPT to initialize a session. Save the session ID that is returned by the API.
- Whenever the user sends a message, include the session ID along with the message text in the API request.
- When you receive a response from ChatGPT, save the session ID in a database or a cache along with the conversation history.
- If the user sends another message, use the saved session ID to resume the conversation by sending it along with the new message to the ChatGPT API.
- Repeat this process for each subsequent message in the conversation, always including the session ID with the message and saving the session ID and conversation history after each API call.
By implementing this persistent session management system, you can maintain the continuity of the conversation and ensure that the user’s context is preserved across multiple API requests.
I have created an alternative API for exactly this. You can use it to add conversation history to your GPT API calls and just use it like the GTP API with your own token.
Find it here: https://gptconverse.online/
This solution doesn’t seem very scalable. In terms of Big O, would this be considered O(n) time and space complexity? I was very surprised to find out this.
Does the actual ChatGPT website function the same way? Was the design decision here around privacy? And, lastly, is there any talk or roadmap about storing the message history on ChatGPT side?
You didn’t tell which “solution” you are referring to. You didn’t press the “reply” button on the relevant post.
This topic is quite old, starting even before the chat API was released, and has several diversions that are not answering “remember previous messages”.
Here’s a better recent thread to move to:
The conversation length cannot increase infinitely, because the AI model has a limited context length area to supply it past conversation. Management and truncation is required. The maximum compute would actually be in producing a very long output, as the input size has a smaller effect on the processing cost.
ChatGPT has very aggressive minimization of past conversation turns, giving the AI only what is needed to understand the present topic, so an otherwise ambiguous question like “what about the other one?” can be answered.
This is a very basic process that has worked extremely well for me. I did a video on it here:
Based upon how it was explained to me in this conversation: Chat Completion Architechture
Is session id look like “chatcmpl-9IxJCCZAkqJVBZyk8FO12429sVwRc”?
Where should I put it in my request at step 4? OpenAI Platform - I can’t find any fields for session ID here.
The text quoted poorly describes what you’d do, the “poorly” starting at calling the API ChatGPT…
The ID returned with a response is just a unique identifier used internally by OpenAI for each API call return. It means nothing to you. If you log it along with your messages, in olden days of actually having support, you might say “my user 35185 made the request ID 385898634, but as I also sent the user name plus that call was first submitted to the moderations endpoint with id 385828531 returned, my account shouldn’t be banned”.
Examine ChatGPT. You have a hierarchy:
- user name
- user chats
- user messages
Just have a database that records all those. Then for each new message the user would send, include the most recent messages of a chat that would fit in the token budget you have for chat history context to send to the API model.
So, is having own database and sending entire context (truncated to token limit) with each new request the only way with chat completion API? Is there no way to keep history management on OpenAI side other then assistant API threads (OpenAI Platform)?
When it is managed, it is not the maximum context length.
Regardless of whether a model is accessed by you or by assistants, you pay for the input tokens.
The difference is the assistants documentation PROMISES to use the full context with whatever they can get, either from an entire conversation stream, or any documentation up to the maximum, and then iterate calling functions to make multiple bills with that context.
I’m mostly concerned with tokens per minute limits, not the number of tokens I’ll have to pay. Will assistant’s usage also spend my TPM limits (OpenAI Platform)?
Yes, each internal call counts against TPM. In fact, it is poorly managed, wasting your max TPM on NOT getting an answer if the assistant hits the limit when iterating.
Lesson: assistants - not ready for use
Thanks for your help. Another reason not to use assistants yet. First one is lack of answer streaming API
One way to address this challenge is to append previous chat messages to the current query. This allows the LLM to access the context of the conversation and generate more relevant and consistent responses, I applied this for my agency and work fine https://seofactor.co.uk/ .