Message Retrieval in Assistant Runs

Hello OpenAI Community,

I’m currently using the OpenAI Assistants API, specifically the /threads and /threads/{thread_id}/messages endpoints. My process involves creating messages, starting runs with assistants, and then fetching the latest messages from the thread. This works reasonably well with /threads/runs, but becomes cumbersome when adding new messages to existing threads, because it requires a multi-step process: creating a message, running an assistant, and then retrieving the message list.
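For readers following along, the multi-step process described above might look roughly like this in Python. This is only a sketch: it assumes `client` is an `openai.OpenAI()` instance exposing the `client.beta.threads` namespace, and the helper name `ask_assistant`, the `STOP_STATUSES` set, and the default interval and timeout are made up for illustration.

```python
import time

# Run statuses at which polling should stop. This set is an assumption based on
# the documented run lifecycle; "requires_action" is included because the run
# will not progress until the caller submits tool outputs.
STOP_STATUSES = {"completed", "failed", "cancelled", "expired",
                 "incomplete", "requires_action"}


def ask_assistant(client, thread_id, assistant_id, text,
                  poll_interval=0.5, timeout=60.0):
    """Add a user message to a thread, run the assistant, return the reply text.

    `client` is assumed to behave like an `openai.OpenAI()` instance.
    """
    # Step 1: add the user's message to the existing thread.
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text)

    # Step 2: start a run of the assistant on that thread.
    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id)

    # Step 3: poll the run until it reaches a stopping status or we time out.
    deadline = time.monotonic() + timeout
    while run.status not in STOP_STATUSES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"run {run.id} still {run.status!r} after {timeout}s")
        time.sleep(poll_interval)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run.id)

    if run.status != "completed":
        raise RuntimeError(f"run ended with status {run.status!r}")

    # Step 4: fetch only the newest message instead of the whole list.
    messages = client.beta.threads.messages.list(
        thread_id=thread_id, limit=1, order="desc")
    return messages.data[0].content[0].text.value
```

Asking for a single message in descending order keeps the final call small, though it is still a separate request after the run finishes. The last line also assumes the newest message starts with a text content part.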

It would significantly streamline the workflow if the assistant’s response could be included directly in the run endpoint’s response. This feature would eliminate the need for additional calls to fetch the latest message, thereby enhancing efficiency and user experience.

I’m curious if others in the community have similar experiences or suggestions. Any insights on this would be greatly appreciated.


This is unlikely to happen. Inference takes time, and keeping connections open for any longer than absolutely necessary is bad practice and exposes the server to attacks. Polling for a completed status is standard practice for lengthy operations.

What would you say is the most efficient way to retrieve the generated message? Previously I had my application check every second, but that felt too slow, so I’ve reduced the polling interval to 0.5 seconds. I don’t want this to dramatically increase my API usage. Is there a rate limit or associated cost for calling the message list endpoint?

Polling wouldn’t increase your API cost.
You can see how I do it in this Langroid code:

I have a “wait_for_run” and an async version of it.
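As a rough illustration of the polling idea (this is not the Langroid implementation, just a minimal sketch), a wait helper with a capped backoff and an async twin could look like the following. The names `wait_for_status`, `wait_for_status_async`, `backoff_delays`, and `STOP_STATUSES` are hypothetical, and `fetch_status` is assumed to be a callable you supply that returns the run's current status string, for example by wrapping the run retrieval call.

```python
import asyncio
import time

# Statuses at which polling stops (assumed from the run lifecycle).
STOP_STATUSES = {"completed", "failed", "cancelled", "expired",
                 "incomplete", "requires_action"}


def backoff_delays(initial=0.5, factor=1.5, max_delay=5.0):
    """Yield sleep intervals that grow geometrically, capped at max_delay."""
    delay = initial
    while True:
        yield min(delay, max_delay)
        delay = min(delay * factor, max_delay)


def wait_for_status(fetch_status, timeout=60.0, sleep=time.sleep, **backoff):
    """Call fetch_status() until it returns a stopping status, then return it."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays(**backoff):
        status = fetch_status()
        if status in STOP_STATUSES:
            return status
        # Give up if the next sleep would take us past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("run did not finish in time")
        sleep(delay)


async def wait_for_status_async(fetch_status, timeout=60.0, **backoff):
    """Async variant: fetch_status is an async callable returning a status."""
    deadline = time.monotonic() + timeout
    for delay in backoff_delays(**backoff):
        status = await fetch_status()
        if status in STOP_STATUSES:
            return status
        if time.monotonic() + delay > deadline:
            raise TimeoutError("run did not finish in time")
        # Yield to the event loop instead of blocking the thread.
        await asyncio.sleep(delay)
```

Starting at 0.5 seconds keeps short runs responsive, while the growing interval cuts the number of requests made for long runs, which helps with rate limits even though the status checks themselves do not consume tokens.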