I want to limit the input tokens of the Assistant, because the new gpt-4-1106-preview model has a 128k-token context window, which means that if my message history grows to, say, 120k tokens I would pay $1.20 per message…
As of now, you cannot set the token limit for an Assistant beyond choosing a model with a lower context window. We have received a lot of feedback that this would be really useful so the team is looking into it.
Today, the Assistant will try to keep as many messages in context as it can and naively drop old messages as it runs out of context.
That’s fine… I can indeed limit the number of messages I send to the model myself, but it would be nice if you could share best practices for such a solution. For example, does it matter whether the oldest message in the history is an assistant message or a user message? Any other recommendations would be much appreciated!
Thanks in advance!
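In the meantime, one client-side workaround is to trim the history yourself before each call. Below is a minimal sketch of that idea; `count_tokens` uses a rough ~4-characters-per-token estimate (for exact counts you would use a tokenizer such as tiktoken for the target model), and the names and budget are hypothetical, not part of any official API:

```python
def count_tokens(messages):
    """Very rough token estimate (~4 characters per token, plus per-message overhead)."""
    return sum(len(m["content"]) // 4 + 4 for m in messages)

def trim_history(messages, max_tokens=4000):
    """Drop the oldest non-system messages until the history fits the budget.

    Keeps the system prompt, and trims from the front so the remaining
    history always starts with a user message: a dangling assistant reply
    with no preceding user turn adds little useful context.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Drop the oldest message until the total fits the budget.
    while rest and count_tokens(system + rest) > max_tokens:
        rest.pop(0)

    # Make sure the trimmed history opens with a user turn.
    while rest and rest[0]["role"] == "assistant":
        rest.pop(0)

    return system + rest
```

You would call `trim_history(messages)` right before sending `messages` to the API. Trimming in user/assistant units like this also answers the ordering question above: starting the window on a user message keeps each assistant reply paired with the question it answered.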
Thanks for the clarification. Speaking of today’s date, is there still no way to limit the tokens?