Hi @nir.01,
as @paul.armstrong mentioned this is the case.
There are some strategies you could deploy to help you on this, for example: OpenAI API: chat completion pruning methods this is a great way to reduce tokens. Or to limit the resubmitted messages to the last 5 ones in your request.