Let’s say I’m developing a chatbot. If I have this structure:
system role: 100 tokens
user role: “…ask gpt something…” - 100 tokens max

So here, when a user asks something (100 tokens), we get charged for 200 tokens in, right? Say the assistant responds with 100 tokens, so we get charged for 100 tokens out. Then the user asks a follow-up question (100 tokens). Do we get charged for the whole conversation again, or only for the new question?
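To make the billing concrete, here is a small sketch of how the input tokens accumulate if the full history is resent every turn. The token counts are the illustrative figures from the question above, not real measurements:

```python
# Sketch: cumulative billing when the full history is resent each turn.
SYSTEM = 100      # system prompt, sent with every request
QUESTION = 100    # each user question
ANSWER = 100      # each assistant reply

def billed_tokens(turns: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) billed across `turns` turns."""
    total_in = total_out = 0
    history = 0  # prior Q&A tokens carried into the next request
    for _ in range(turns):
        total_in += SYSTEM + history + QUESTION  # everything sent is input
        total_out += ANSWER
        history += QUESTION + ANSWER  # this turn joins the history
    return total_in, total_out

print(billed_tokens(1))  # (200, 100): first turn, as in the question
print(billed_tokens(2))  # (600, 200): turn 2 alone costs 400 in
```

Note how the per-turn input cost grows as the conversation gets longer, which is the heart of the question.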

You don’t HAVE to send the prior conversation…but if you don’t, there’s no more “chat”: each request is just a single input producing a single output.

==>How many bases are in the game of baseball?

There are three bases in the game of baseball. Each team has four players on the field at a time, and each player must run between the bases in a clockwise direction to score a point. The bases are commonly referred to as first base, second base, and third base. Home plate is also considered a base, but it is not commonly referred to in this way.

==>What is the distance between them?

To calculate the distance between two points, we can use the distance formula. The formula is √((x₂−x₁)² + (y₂−y₁)²), where (x₁, y₁) and (x₂, y₂) are the coordinates of the two points. You will need to know the coordinates of both points in order to calculate the distance. Once you have the coordinates, you can plug them into the formula and solve for the distance. If the two points are in different units, you will need to convert them to the same unit before plugging them into the formula. Once you have the calculated distance, you will have your answer.

The answer without understanding the prior topic? Not so relevant.

So yes, sending back the prior user questions plus the AI answers gives the required understanding of the topic. That is input data that you must pay for, as it increases the computation the AI must do.
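In practice, “sending back the prior conversation” just means the whole message list goes into each request. A minimal sketch in the OpenAI-style chat message format (the demo answers here are hard-coded stand-ins for a real API call):

```python
# Sketch: every prior user/assistant message is resent in the `messages`
# payload, and all of it is billed as input tokens on each request.
messages = [{"role": "system", "content": "You are a baseball expert."}]

def ask(question: str, answer_for_demo: str) -> list[dict]:
    """Append the question, record the reply, and keep both in history.
    (`answer_for_demo` stands in for the real API response in this sketch.)"""
    messages.append({"role": "user", "content": question})
    # In real code you would send the full `messages` list to the chat
    # completions endpoint here; its entire length counts as input tokens.
    messages.append({"role": "assistant", "content": answer_for_demo})
    return messages

ask("How many bases are in baseball?", "Four, counting home plate.")
ask("What is the distance between them?", "90 feet between bases.")
print(len(messages))  # 5: the second request carried all prior messages
```

Because the follow-up request includes the first Q&A, the model can tell that “them” means the bases, unlike the transcript above.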

You can decide how much chat history to send back, trimming older turns to save money…like ChatGPT does.