I was under the impression that each Message in thread is given its own Context Length. Am I wrong? So all messages in a thread will consume tokens from a single Context?
You are correct.
Every question to an AI model that acts as a chatbot with memory must be accompanied by some past chat so the AI can understand what you were talking about.
“Thread” sounds like you are using “assistants” (which I would recommend against), which has no limit of the conversation length, and will fill the AI context window length with as much past conversation as will fit, and no way to budget it or even start again with a shorter conversation.
Why would you recommend against assistants API? Are there any pros to not using it?
The assistants feature is new, with the main feature that you cannot control or even see how much you are being billed for use. There are too many cons to using it.
Instead, one would just use the traditional chat completion API, which is actually straightforward, responsive, configurable, accountable.