How to call the API for multi-round conversations without being charged for history messages?

Let’s say I have the following messages:

[
    {
        "role": "system",
        "content": [
            {"type": "text", "text": SYSTEM_PROMPT},
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Please summarize and redraw the attached chart.",
            },
            {"type": "image_url", "image_url": {"url": user_image_data}},
        ],
    },
]

And I keep appending new user prompts and GPT responses to the end of this list. Will I be charged incrementally, i.e. charged for all history messages as the conversation gets longer (1 sys + 1 user for the first call, 1 sys + 2 user for the second, 1 sys + 3 user for the third, etc.)? Is there another practice that avoids this?

It may sound like a tautology, but the only way NOT to be charged for the history messages is to not send them to Chat Completion, because Chat Completion is stateless: each request is billed for all the input tokens it contains.
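To make the incremental charge concrete, here is a rough sketch of the arithmetic when the full history is resent on every call. The function name and all token counts are made up for illustration; real counts depend on the model's tokenizer and on any attached images.

```python
def billed_input_tokens(system_tokens, user_tokens, assistant_tokens):
    """Return the input tokens billed on each turn when the whole
    history (system + all earlier user/assistant messages) is resent."""
    billed = []
    history = system_tokens
    for turn, user in enumerate(user_tokens):
        history += user          # the new user prompt joins the history
        billed.append(history)   # everything sent so far is billed as input
        if turn < len(assistant_tokens):
            history += assistant_tokens[turn]  # the reply is resent next turn
    return billed


# Hypothetical sizes: 200-token system prompt, three user prompts, three replies.
per_turn = billed_input_tokens(200, [50, 30, 40], [100, 120, 90])
print(per_turn)       # [250, 380, 540]
print(sum(per_turn))  # 1170
```

So the input cost of each call grows with the length of the conversation, which is exactly the "1 sys + 1 user, 1 sys + 2 user, ..." pattern described in the question (plus the earlier replies).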

Here’s one way in which I have dealt with messages in threads not being sent to Chat Completion.
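The approach referred to above is not reproduced here. As a generic illustration of the idea of not sending every message, the sketch below keeps the system prompt plus only the most recent messages before each call. The helper name trim_history and the keep_last parameter are hypothetical, and string content is used instead of the list-of-parts form for brevity.

```python
def trim_history(messages, keep_last=4):
    """Keep all system messages plus the most recent `keep_last`
    non-system messages; older turns are dropped and never sent."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "second question"},
    {"role": "assistant", "content": "second answer"},
    {"role": "user", "content": "third question"},
]

to_send = trim_history(history, keep_last=3)
print([m["content"] for m in to_send])
```

The trade-off is that the model cannot see whatever was dropped, so a reply can only be based on the messages that are actually in the request.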

Thank you so much for your reply! Would using the Assistant API, or something other than ChatCompletion, solve this issue? Or would that still be impossible?

Ideally, I am trying to be charged for 1 sys + 1 user for the first time, 1 new user for the second time, 1 new user for the third time, etc.

You can just create another chat for each user; unless the aim is to somehow store the results somewhere.

At any rate, this post explains the logic and some implementation: "Seeking the Best API Choice: Should I Use OpenAI's Assistant API or Chat Completion API?" (#12 by icdev2dev)

My aim is to generate output for the 2nd user prompt without being charged for the sys prompt and the 1st user prompt, while the 2nd reply is still based on the sys prompt and the 1st user prompt. Based on your reply, it seems that this is not possible?

Right. This is NOT possible.