Assistants API splitting the response, set max_token limits

psaxena_be20 · April 4, 2024, 11:11am

I'm confused about how the Assistants API sets token limits. Sometimes a response is split across two different message IDs, which breaks my flow because I always retrieve the latest message from the thread. Is there a way to make GPT respond in a single message? (I could group by run_id, since it stays the same for both messages, but that would add a lot of complexity.) Also, to access the latest message I have to retrieve the whole thread every time. How does billing work here: is there a way to fetch only the latest message, and do I get billed for the whole thread every time?

nikunj · April 4, 2024, 8:13pm

Sorry for the trouble here – we’ve resolved this issue and things should be in a single message now.
