Threads don’t have a size limit. You can pass as many Messages as you want to a Thread. The API will ensure that requests to the model fit within the maximum context window, using relevant optimization techniques such as truncation.
I’d really love it if we could customize this ‘maximum context window’ that it considers and still have the API handle the truncation for us. i.e. I want to just use a 10k context window or a 30k context window rather than always using the full 100k. Always consuming the maximum window isn’t always desired due to costs. But I know this isn’t super urgent as it’s possible to do this truncation manually.