Retrieve only Assistant's response from the last run

I’m new to programming and to APIs so don’t judge too hard if this is obvious.

If I understand correctly, the only way to retrieve what the Assistant responded to the last user message is to call openai.beta.threads.messages.list(). This returns ALL messages in the thread, and filtering the API response for the latest message then gives the last message generated by the Assistant.

Does that mean that, as the conversation with the assistant progresses, the response from openai.beta.threads.messages.list() will get larger and larger, and the cost per API call will grow at a pretty high rate?

Is there a way to retrieve just the message generated by the assistant during the latest run, instead of getting the full thread history and filtering out all but latest response?

Also, how will I know if the Assistant generated more than one message in the last run? Filtering for only the latest message would lose the second-to-last one. I guess I could keep track of the timestamp of when the previous run completed and keep only the messages generated after that, but that seems like a very roundabout way of doing things.

Thanks in advance

Yes, and it’s a pretty big issue right now. The good news is that the developers are aware of this and are looking into solutions.

Gather the messages until you hit the user message, or just assume how many there are and pop them off by count. You could also look at the run steps.
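A minimal sketch of the "gather until the user message" idea, in case it helps. It assumes you already have the messages newest-first (as with order: 'desc') and that each one has a role field. The function name and the sample data are made up for illustration.

```javascript
// Collect the assistant messages that come after the most recent user message.
// Assumes `messages` is sorted newest-first and each item has a `role` field.
function latestAssistantMessages(messages) {
  const collected = [];
  for (const message of messages) {
    if (message.role === "user") break; // reached the prompt that triggered the run
    collected.push(message);
  }
  return collected.reverse(); // oldest first, so the reply reads in order
}

// Hypothetical sample shaped like the `data` array of a list response.
const sampleThread = [
  { id: "msg_4", role: "assistant" },
  { id: "msg_3", role: "assistant" },
  { id: "msg_2", role: "user" },
  { id: "msg_1", role: "assistant" },
];

const latestReply = latestAssistantMessages(sampleThread);
console.log(latestReply.map((m) => m.id));
```

This also covers the case where the Assistant wrote more than one message in the run, since everything newer than the user message is kept.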

Thanks!
It seems like the most obvious solution would be to allow a run ID parameter in the .list() call so that it returns only what was generated during a specific run.
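In the meantime, the same filtering can be sketched on the client side. This assumes each message object in the list response carries a run_id field naming the run that produced it (user messages would have it empty). The helper name and the sample IDs are hypothetical. Whether the list endpoint itself accepts such a filter depends on the API version, so check the current reference.

```javascript
// Keep only the assistant messages produced by one specific run.
// Assumes each message has `role` and `run_id` fields.
function messagesFromRun(messages, runId) {
  return messages.filter(
    (message) => message.role === "assistant" && message.run_id === runId
  );
}

// Hypothetical sample shaped like the `data` array of a list response.
const sampleMessages = [
  { id: "msg_5", role: "assistant", run_id: "run_b" },
  { id: "msg_4", role: "assistant", run_id: "run_b" },
  { id: "msg_3", role: "user", run_id: null },
  { id: "msg_2", role: "assistant", run_id: "run_a" },
  { id: "msg_1", role: "user", run_id: null },
];

const fromLastRun = messagesFromRun(sampleMessages, "run_b");
console.log(fromLastRun.map((m) => m.id));
```

Note that this still downloads the listed messages first, so it fixes the "which messages belong to the run" problem rather than the size of the response.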

Do you know if the Completions API has a similar issue?

No, mainly because the Completions API has one job: it returns the tokens that the LLM has generated.

You need to perform all the typical maintenance, like context management, yourself. If you use a model with a 4k context window and send it ~3990 tokens, there is room for only a handful of tokens in the reply. If you try to send over 4k tokens, you will get only an error back.
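To make the arithmetic concrete, here is a rough sketch of the kind of bookkeeping that context management involves. The 4,000-token window is a round number for illustration, the per-message tokens field is a hypothetical precomputed count, and both function names are made up. Real token counts would come from a tokenizer.

```javascript
// Tokens left for the reply once the prompt is counted against the context window.
function remainingReplyTokens(contextWindow, promptTokens) {
  if (promptTokens > contextWindow) {
    throw new RangeError("Prompt exceeds the context window");
  }
  return contextWindow - promptTokens;
}

// Drop the oldest messages until the history fits within `budget` tokens.
// Assumes `history` is oldest-first and each item has a numeric `tokens` field.
function trimHistory(history, budget) {
  const kept = [...history];
  let total = kept.reduce((sum, message) => sum + message.tokens, 0);
  while (kept.length > 0 && total > budget) {
    total -= kept.shift().tokens;
  }
  return kept;
}

console.log(remainingReplyTokens(4000, 3990)); // 10
```

Trimming from the oldest end is only one possible policy. Summarizing older turns is another, at the cost of an extra model call.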

The easiest way I’ve found is to list the messages in the thread before you start the new run (and so before any new messages exist), record the ID of the last message, and then pass that message ID as the before parameter in the next list call.

Something like this:

// Retrieve the last message before the run
const lastMessageBeforeRun = await client.beta.threads.messages.list(threadid, {
  order: 'desc',
  limit: 1
});
const lastMessageIdBeforeRun = lastMessageBeforeRun.data[0].id;
console.log("Last message ID before run:", lastMessageIdBeforeRun);
console.log("Last message text before run:", lastMessageBeforeRun.data[0].content[0].text.value);

// … (rest of your code, including run completion)

// Retrieve only the messages created after that one, i.e. during the run
const messages = await client.beta.threads.messages.list(threadid, {
  order: 'desc',
  before: lastMessageIdBeforeRun
});
