Retrieve only Assistant's response from the last run

kpebedko_1 · December 9, 2023, 10:58pm

I’m new to programming and to APIs so don’t judge too hard if this is obvious.

If I understand correctly, the only way to retrieve what the Assistant responded to the last user message is to call openai.beta.threads.messages.list(). This returns ALL messages in the thread, and filtering the API response for the latest message then gives the last message generated by the Assistant.

Does that mean that, as the conversation with the assistant progresses, the response from openai.beta.threads.messages.list() will get larger and larger, and the cost per API call will grow at a pretty high rate?

Is there a way to retrieve just the message generated by the assistant during the latest run, instead of getting the full thread history and filtering out all but latest response?

Also, how will I know if the Assistant generated more than one message in the last run? Filtering for only the latest message would lose the earlier ones. I guess I could keep track of the timestamp at which the previous run completed and keep only the messages generated after that, but that seems like a very roundabout way of doing things.

Thanks in advance
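
A minimal sketch of the client-side filtering described in this question. Message objects returned by the list call carry `role`, `run_id`, and `created_at` fields, so an already-fetched list can be narrowed without any extra API parameter. The helper names `messagesFromRun` and `assistantMessagesSince` and the sample data are hypothetical, and this is only one possible approach.

```javascript
// Hypothetical helper: keep only assistant messages produced by one run,
// returned oldest first so multi-message replies read in order.
function messagesFromRun(messages, runId) {
  return messages
    .filter((m) => m.role === "assistant" && m.run_id === runId)
    .sort((a, b) => a.created_at - b.created_at);
}

// Hypothetical fallback for the timestamp idea: keep assistant messages
// created after the previous run completed (Unix seconds, strict comparison).
function assistantMessagesSince(messages, previousRunCompletedAt) {
  return messages
    .filter((m) => m.role === "assistant" && m.created_at > previousRunCompletedAt)
    .sort((a, b) => a.created_at - b.created_at);
}

// Made-up sample data shaped like a newest-first list response.
const sampleThread = [
  { id: "msg_4", role: "assistant", run_id: "run_b", created_at: 40 },
  { id: "msg_3", role: "assistant", run_id: "run_b", created_at: 30 },
  { id: "msg_2", role: "user", run_id: null, created_at: 20 },
  { id: "msg_1", role: "assistant", run_id: "run_a", created_at: 10 },
];

console.log(messagesFromRun(sampleThread, "run_b").map((m) => m.id));
console.log(assistantMessagesSince(sampleThread, 10).map((m) => m.id));
```

Filtering on `run_id` is the more direct of the two, since it does not depend on clock granularity; the timestamp version can miss a message created in the same second the previous run completed.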

anon10827405 · December 9, 2023, 11:07pm

Yes, and it’s a pretty big issue right now. The good news is that the developers are aware of this and are looking into solutions.

Gather messages from the newest backwards until you hit the user message, or just assume how many the Assistant produced and take that many off the end of the list. You could also look at the run steps.
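
The "gather until the user message" idea could look roughly like this. It is a sketch that assumes the messages were listed newest first and that every message has a `role` of either "user" or "assistant"; the function name `collectUntilUserMessage` and the sample data are hypothetical.

```javascript
// Hypothetical helper: walk a newest-first message list and collect
// everything that comes before the most recent user message.
function collectUntilUserMessage(messagesNewestFirst) {
  const collected = [];
  for (const message of messagesNewestFirst) {
    if (message.role === "user") break; // reached the prompt for this run
    collected.push(message);
  }
  // Reverse so the result reads in the order the Assistant wrote it.
  return collected.reverse();
}

// Made-up sample data: two assistant replies after the latest user message.
const newestFirst = [
  { id: "msg_5", role: "assistant" },
  { id: "msg_4", role: "assistant" },
  { id: "msg_3", role: "user" },
  { id: "msg_2", role: "assistant" },
  { id: "msg_1", role: "user" },
];

console.log(collectUntilUserMessage(newestFirst).map((m) => m.id));
```

This avoids guessing how many messages the run produced, at the price of still fetching at least one page of the thread.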

kpebedko_1 · December 9, 2023, 11:14pm

Thanks!
It seems like the most obvious solution would be to allow a run ID parameter in the .list() call so that it only returns what was generated during a specific run.

Do you know if the Completions API has a similar issue?

anon10827405 · December 9, 2023, 11:16pm

No. Mainly because the Completions API has one job. It returns the tokens that the LLM has generated.

You need to perform all the typical maintenance, like context management, yourself. If you use a model with a 4k context window and send it ~3990 tokens, there is room for only a very short completion before the window is full. If you try to send more than 4k tokens, you will get only an error back.
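
The arithmetic behind that can be sketched as below, assuming the prompt and the completion share a single context window and that you already know the prompt's token count (counting tokens is out of scope here). The function name `remainingCompletionTokens` is hypothetical, and 4096 is used as the size of a "4k" window.

```javascript
// Hypothetical helper: how many tokens are left for the completion once
// the prompt has been placed in the context window.
function remainingCompletionTokens(contextWindow, promptTokens) {
  const remaining = contextWindow - promptTokens;
  if (remaining <= 0) {
    // Mirrors the situation where the request is rejected outright.
    throw new RangeError(
      `Prompt of ${promptTokens} tokens does not fit in a ${contextWindow}-token window`
    );
  }
  return remaining;
}

console.log(remainingCompletionTokens(4096, 3990)); // 106
```

In practice this is the number to keep an eye on when trimming old conversation turns before each request.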

TalhaKhan · January 23, 2024, 5:34am

The easiest way I’ve found is to list the messages in the thread before you create the new run (and, with it, the new messages), record the ID of the last message, and then pass that message ID as the before parameter in the next list call.

Something like this:

// Retrieve the last message before the run
const lastMessageBeforeRun = await client.beta.threads.messages.list(threadid, {
  order: 'desc',
  limit: 1
});
const lastMessageIdBeforeRun = lastMessageBeforeRun.data[0].id;
console.log("Last message before run:", lastMessageIdBeforeRun);
console.log("Last message before run:", lastMessageBeforeRun.data[0].content[0].text.value);

// … (rest of your code, including run completion)

// Retrieve messages after run completion
const messages = await client.beta.threads.messages.list(threadid, {
  order: 'desc',
  before: lastMessageIdBeforeRun
});
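
Once a list call like the one above has returned, the reply text still has to be pulled out of each message. A minimal sketch, assuming each message has a `role` and a `content` array whose text parts look like `{ type: "text", text: { value } }`, as in the `content[0].text.value` access above. The function name `assistantTextParts` and the sample data are hypothetical.

```javascript
// Hypothetical helper: collect the text of every assistant message,
// skipping user messages and non-text content parts such as images.
function assistantTextParts(messages) {
  return messages
    .filter((m) => m.role === "assistant")
    .flatMap((m) => m.content)
    .filter((part) => part.type === "text")
    .map((part) => part.text.value);
}

// Made-up sample data shaped like the `data` array of a list response.
const listedMessages = [
  {
    role: "assistant",
    content: [
      { type: "text", text: { value: "Here is the chart you asked for." } },
      { type: "image_file", image_file: { file_id: "file_example" } },
    ],
  },
  { role: "user", content: [{ type: "text", text: { value: "Thanks" } }] },
];

console.log(assistantTextParts(listedMessages));
```

The result keeps the order of the input list, so with order set to 'desc' the newest text comes first.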
