I'm confused about how the Assistants API sets token limits. Sometimes a response is split across two different message IDs, which breaks my flow because I always retrieve the latest message from the thread. Is there a way to limit GPT to responding in a single message? (I could use the run_id, since it stays the same for both messages, but that would add a lot of complexity.) Also, to access the latest message I have to retrieve the whole thread every time. How does billing work for this? Is there a way to access only the latest message, or do I get billed for the whole thread every time?
nikunj
Sorry for the trouble here – we’ve resolved this issue and things should be in a single message now.
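For anyone who still sees a reply split across several messages, here is a minimal sketch of the run_id workaround mentioned in the question. It assumes each message is a plain dict with `role`, `run_id`, `created_at`, and a `content` list of text blocks, which mirrors the shape of Assistants API message objects as I understand it; `latest_reply_text` is a hypothetical helper name, not part of any SDK. The messages themselves would come from something like `client.beta.threads.messages.list(thread_id=..., order="desc", limit=...)` in the OpenAI Python SDK, so only the newest few need to be fetched rather than the whole thread. As far as I understand, listing messages is a plain read and token charges come from runs, but check the current pricing docs to be sure.

```python
from typing import Any


def latest_reply_text(messages: list[dict[str, Any]]) -> str:
    """Join the text of every assistant message produced by the most recent run.

    `messages` is assumed to be a list of dicts shaped like Assistants API
    message objects (an assumption for this sketch, not a guaranteed schema).
    """
    assistant = [m for m in messages if m.get("role") == "assistant"]
    if not assistant:
        return ""

    # The newest assistant message tells us which run produced the reply.
    newest = max(assistant, key=lambda m: m["created_at"])
    run_id = newest.get("run_id")

    # Collect every assistant message from that run, oldest first.
    same_run = sorted(
        (m for m in assistant if m.get("run_id") == run_id),
        key=lambda m: m["created_at"],
    )

    # Keep only text blocks; other block types (e.g. images) are skipped.
    parts = []
    for message in same_run:
        for block in message.get("content", []):
            if block.get("type") == "text":
                parts.append(block["text"]["value"])
    return "\n".join(parts)
```

With this, the calling code can keep treating the reply as one string even when the run produced two message IDs.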