Impact of conversations on the number of tokens

Hello, everyone,
after experimenting with simple calls to the GPT-4 API, I implemented conversations on top of those calls.
The result was very good, but I noticed that keeping the conversation open requires, in addition to an id identifying the user making the request, sending the list of previous messages, i.e. all the requests that have already been made.
Although this produces a better result, I have noticed it has a considerable impact on the cost of the conversation: all previous messages are apparently counted as tokens again, even though they have in fact already been ‘counted’ in the earlier requests.
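To make the effect concrete, here is a minimal sketch of what I mean (the message contents and the whitespace-based token count are illustrative assumptions; the real API counts tokens with its own tokenizer):

```python
# Hypothetical sketch: how resending the full history inflates billed tokens.
# count_tokens is a crude whitespace estimate, NOT the real tokenizer.

def count_tokens(text):
    return len(text.split())

def billed_tokens(history):
    """Prompt tokens billed for one request = tokens of ALL messages sent."""
    return sum(count_tokens(m["content"]) for m in history)

history = []
total_billed = 0
for turn in ["What is the capital of France?",
             "And what is its population?",
             "Which river flows through it?"]:
    history.append({"role": "user", "content": turn})
    total_billed += billed_tokens(history)  # every prior message is counted again
    history.append({"role": "assistant", "content": "..."})

print(total_billed)  # 36: requests bill 6, then 12, then 18 tokens
```

Each request is billed for the whole history sent with it, so the same early messages are paid for again and again.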
I tried passing only the user id, hoping GPT-4 would recognise the active conversation that way, without my having to resend all the messages, but in that case the open conversation is not recognised.
Wouldn’t it be more convenient to request only the id? It also does not seem fair that the tokens of all the previous requests should be added to the cost of each new one.
Or am I doing something wrong?
Thank you

The user identification should serve to maintain state, as it does on the web. Otherwise each request weighs (in tokens) as much as the sum of all previous requests and replies, completely distorting the pricing of the service: the cost should be per token of the new request, not of the sum of everything sent up to that moment.

Forgive me, but I asked a different question: I did not ask how the model works, because how it works I have already seen and described.
I asked whether it could not work like other stateless systems, such as the web, where each click does not resend the entire page or all the pages navigated; it is enough to pass the active session to get an adequate response.
I haven’t compared ChatGPT and the GPT API, but I don’t think ChatGPT charges for all the tokens of the whole conversation, only those of the last request and the last response.
If that is not the case, and all the tokens of the entire conversation are charged there too, then I got the wrong impression.
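The web-session pattern I have in mind could be sketched like this (names and the echo reply are hypothetical; the point is what the server would still have to send to the model):

```python
# Sketch of the "session id" pattern: the client sends only a session id
# plus the new message, and a thin server layer keeps the history.
# Note that a real model call would still receive (and be billed for) the
# full `history` on every turn; this pattern saves bandwidth, not tokens.

sessions = {}  # session_id -> list of messages kept server-side

def handle_request(session_id, user_message):
    history = sessions.setdefault(session_id, [])
    history.append({"role": "user", "content": user_message})
    # A real implementation would call the chat API here with `history`;
    # every message in `history` counts toward prompt tokens on each call.
    reply = {"role": "assistant", "content": f"echo: {user_message}"}
    history.append(reply)
    return reply["content"]

handle_request("abc", "Hello")
handle_request("abc", "Follow-up")
print(len(sessions["abc"]))  # 4 messages stored: two user turns, two replies
```

So even if the client only passes a session id, something has to hold the history and forward it to the model, which is where the token cost comes from.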

I understand that you have taken a dislike to me, but my question is different: I didn’t understand why, when the conversation takes place on ChatGPT, all the past tokens are not charged, only those of the last request, whereas with the GPT API they are all charged.
I had already understood that you don’t have an answer to that, but since you had stated that ChatGPT and the GPT API work in the same way, the doubt of a misunderstanding arose.
The real answer is: they work the same way technically but not in terms of cost, as the GPT API costs more, since it counts all the tokens of the entire conversation.
In any case, thank you for your contribution.

You really don’t seem to be understanding.

They are different products, of course they are charged differently.

I answered your question in the only way it can be answered.

Your original question demonstrated you don’t understand how the models work.

So, I explained that’s how the models work. There is no free lunch; they have no memory.

There’s zero cost associated with sending messages back and forth; the cost is in processing the tokens.

I have no opinion about you one way or another.


From reading this conversation, I’d say that @elmstedt made it very clear that there is no open conversation when using the API beyond the messages being sent back and forth.

That only exists when a chatbot has been designed and developed that way. The API is only the foundational building block for achieving this behavior.

You can either mark this topic as solved or try to rephrase, and I am sure you will get an answer.