If my first message includes a large chunk of text, and I proceed to ask multiple questions about it (each question as a new message in the thread), will I be charged for all the tokens in the first message for every new message I send?
Secondly, would I eventually hit the token limit because all my messages would add up to eventually hit that limit?
Welcome to the forum.
There’s no “state” to the API, so you’ll need to send the “chunk” with every request if you want it to be considered for the output.
Yes, you’ll eventually hit the limit, so you’ll want a way to trim previous messages (or summarize them)… If you look on github, you should find a lot of examples of how this is done.