Assistants API splitting the response, set max_token limits

psaxena_be20 · April 4, 2024, 11:11am

I'm confused about how the Assistants API sets token limits. Sometimes a response gets split across two messages with different message IDs, which breaks my flow because I always retrieve the latest message from the thread. Is there a way to limit GPT to responding in a single message? (I could use the run_id, since it stays the same for both messages, but that would add a lot of complexity.)

Also, to access the latest message I have to retrieve the whole thread every time. How does billing work for this? Is there a way to access only the latest message, and do I get billed for the whole thread every time?
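For reference, the run_id workaround mentioned above may be less complex than it sounds. Below is a minimal sketch, not a definitive implementation: it assumes the messages have already been fetched and converted to plain dicts with the `role`, `run_id`, `created_at`, and `content` fields that Assistants API message objects carry. The helper names `text_of` and `latest_run_reply` are hypothetical and made up for illustration.

```python
from typing import Any


def text_of(message: dict[str, Any]) -> str:
    """Concatenate the text blocks of one message dict (non-text blocks are skipped)."""
    return "".join(
        block["text"]["value"]
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


def latest_run_reply(messages: list[dict[str, Any]]) -> str:
    """Join every assistant message that shares the newest run_id, oldest part first.

    Assumption: each dict mirrors an Assistants API message object, and
    messages created by the same run carry the same run_id.
    """
    assistant = [
        m for m in messages
        if m.get("role") == "assistant" and m.get("run_id")
    ]
    if not assistant:
        return ""
    # The most recently created assistant message identifies the latest run.
    newest = max(assistant, key=lambda m: m["created_at"])
    parts = sorted(
        (m for m in assistant if m["run_id"] == newest["run_id"]),
        key=lambda m: m["created_at"],
    )
    return "\n".join(text_of(m) for m in parts)
```

With something like this, a reply that was split in two is stitched back together before it is used, so the rest of the flow can keep treating it as a single message. As far as I can tell, the list-messages endpoint also accepts `limit` and `order` parameters, so only the newest few messages need to be fetched rather than the whole thread.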
