Stateful Responses API Much Slower Than Chat Completions

Hey all! Steve here from the OpenAI API Eng team. We hear you on the latency issues when using previous_response_id. We are working to optimize our database to make these lookups as fast as possible. In the meantime, for the lowest possible latency we recommend setting store: false. In that mode you round-trip all conversation items yourself (as with Chat Completions), and we skip the database entirely, so there is no latency hit from looking up a previous response.
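To make the stateless pattern concrete, here is a minimal sketch of the round-trip approach using the Python SDK's Responses API. The `build_request` helper and the exact item shapes are illustrative assumptions, not part of the SDK; the key points are passing the full item history in `input` and setting `store=False` instead of using `previous_response_id`.

```python
def build_request(history, user_text, model="gpt-4o"):
    """Build kwargs for a stateless client.responses.create(...) call.

    `history` is the client-kept list of prior conversation items
    (user inputs plus the output items from earlier responses).
    The model name here is just a placeholder.
    """
    items = list(history) + [{"role": "user", "content": user_text}]
    return {
        "model": model,
        "input": items,   # full conversation, resent every turn
        "store": False,   # skip server-side persistence and the lookup
    }

# Sketch of a turn loop (API call commented out; requires a key):
# client = openai.OpenAI()
# history = []
# kwargs = build_request(history, "Hello!")
# resp = client.responses.create(**kwargs)
# history = kwargs["input"] + resp.output  # carry items into next turn
```

Since nothing is stored server-side, your application is responsible for persisting the item history between turns, and request size grows with conversation length, just as it does with Chat Completions.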
