Assistants API splitting the response, and setting max_tokens limits

I'm confused about how the Assistants API handles token limits. Sometimes I get a response split across two different message IDs, which breaks my flow because I always retrieve the latest message from the thread. Is there a way to make the model respond in only one message? (I could group the messages by run_id, since it stays the same for both, but that would add a lot of complexity.) Also, to access the latest message I currently have to retrieve the whole thread every time. How does billing work here: is there a way to fetch only the latest message, or do I get billed for the whole thread every time?
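On the "retrieve the whole thread" part: the Messages list endpoint accepts `limit` and `order` query parameters, so you can fetch just the newest message instead of the full history. (Listing messages is a plain API read; token billing applies when a run actually processes the thread with the model.) Here's a minimal stdlib-only sketch against the REST API, assuming the Assistants v2 endpoints; the thread id is a placeholder:

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.openai.com/v1"


def latest_message_url(thread_id: str) -> str:
    # One page of size 1 in descending order is exactly the newest message,
    # so we never have to page through the whole thread.
    query = urlencode({"limit": 1, "order": "desc"})
    return f"{API_BASE}/threads/{thread_id}/messages?{query}"


def get_latest_message(thread_id: str):
    """Return the newest message object in the thread, or None if empty."""
    req = urllib.request.Request(
        latest_message_url(thread_id),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "OpenAI-Beta": "assistants=v2",  # required header for Assistants endpoints
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)["data"]
    return data[0] if data else None


if __name__ == "__main__":
    msg = get_latest_message("thread_abc123")  # placeholder thread id
    if msg:
        print(msg["id"], msg["role"])
```

The same thing in the official Python SDK would be `client.beta.threads.messages.list(thread_id=..., order="desc", limit=1)`. Note this doesn't fix the split-response issue by itself: if the run emits two messages, `limit=1` returns only the later one, so grouping by run_id is still the safe way to collect a full reply.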


Sorry for the trouble here – we’ve resolved this issue and things should be in a single message now.
