Because of GPT-4 Turbo's large context window combined with the new Assistants API, a single message can end up costing a surprising amount once a thread fills up, since the entire message history is sent to the model on every run. For most uses I don't need the full context, just the last 3 or 4 messages. Unless I missed it, there doesn't seem to be a way to truncate the thread size through the API. Has anyone found a way to work around this?
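One possible workaround, sketched below with the OpenAI Python SDK: pull only the most recent N messages from the existing thread and replay them into a fresh thread, then run the assistant against that instead. This is just a sketch, not an official mechanism; it assumes thread creation only accepts user-role messages (as the beta did at the time), so prior assistant turns are folded into the text as labeled context. The helper name `truncated_thread` is made up for illustration.

```python
from openai import OpenAI

client = OpenAI()

def truncated_thread(thread_id: str, keep: int = 4) -> str:
    """Copy the most recent `keep` messages into a fresh thread."""
    # messages.list supports `limit` and `order`, so fetch newest-first.
    recent = client.beta.threads.messages.list(
        thread_id=thread_id, order="desc", limit=keep
    )
    # Rebuild the messages oldest-first for the new thread.
    replay = []
    for msg in reversed(list(recent.data)):
        # Concatenate only the text content parts of each message.
        text = "".join(
            part.text.value for part in msg.content if part.type == "text"
        )
        # Assumption: only user-role messages can seed a new thread,
        # so assistant replies are prefixed into the text as context.
        if msg.role == "assistant":
            text = f"(previous assistant reply) {text}"
        replay.append({"role": "user", "content": text})
    new_thread = client.beta.threads.create(messages=replay)
    return new_thread.id
```

You would then create runs against the returned thread ID, keeping the original thread only as a full archive of the conversation. The obvious trade-off is that the assistant loses everything older than the last `keep` messages.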