I'm confused about how the Assistants API handles token limits. Sometimes a single response is split across two messages with different message IDs, which breaks my flow because I always retrieve the latest message from the thread. Is there a way to make GPT respond in only one message? (I could group messages by run_id, since it stays the same for both, but that would add a lot of complexity.)

Also, to access the latest message I have to retrieve the whole thread every time. How does billing work for this? Is there a way to fetch only the latest message, or do I get billed for the whole thread every time?
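For reference, here is a minimal sketch of the run_id workaround mentioned above. It does not call the API. It assumes the thread's messages have already been fetched and flattened into plain dicts with the hypothetical keys `role`, `run_id`, `created_at`, and `text`, which is a simplified shape and not the SDK's own message object. The function name `latest_reply_text` is also made up for illustration.

```python
from typing import Dict, List


def latest_reply_text(messages: List[Dict]) -> str:
    """Join every assistant message produced by the most recent run.

    Assumes each message is a dict with hypothetical keys:
    role, run_id, created_at (a sortable timestamp), and text.
    """
    # Only assistant messages that belong to a run are candidates.
    assistant = [
        m for m in messages
        if m.get("role") == "assistant" and m.get("run_id")
    ]
    if not assistant:
        return ""

    # The newest assistant message tells us which run is the latest.
    newest = max(assistant, key=lambda m: m["created_at"])

    # Collect all parts of that run, oldest first, so split replies
    # are stitched back together in the order they were written.
    parts = sorted(
        (m for m in assistant if m["run_id"] == newest["run_id"]),
        key=lambda m: m["created_at"],
    )
    return "\n".join(m["text"] for m in parts)
```

With a helper like this, the rest of the flow can keep treating "the latest reply" as one string, whether the run produced one message or two.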