Hey all! Steve here from the OpenAI API Eng team. We hear you on the latency issues when using `previous_response_id`. We are working to optimize our database to make this as fast as possible. For the lowest possible latency, we'd recommend using `store: false`. In this mode, you roundtrip all items yourself (as with Chat Completions), and we skip the database entirely, so there's no latency hit from looking up a previous response.
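For anyone wondering what that roundtrip looks like in practice, here's a minimal sketch using the Python SDK. The helper name (`next_input`) and the model string are my own illustrative choices, not an official pattern from this post; the idea is just that with `store=False` you keep the item list client-side and send the whole thing back on each turn instead of passing `previous_response_id`:

```python
def next_input(history, output_items, user_text):
    """Build the next request's input: the running item list, plus the
    model's output items from the last turn, plus the new user message."""
    return history + list(output_items) + [{"role": "user", "content": user_text}]

def chat_turn_example():
    # Hypothetical usage; requires the `openai` package and an API key.
    from openai import OpenAI
    client = OpenAI()

    history = [{"role": "user", "content": "Hello!"}]
    # store=False skips the server-side database, so there is no lookup
    # latency -- but every item must be roundtripped by the client.
    resp = client.responses.create(model="gpt-4.1", input=history, store=False)

    # Fold the model's output back into the conversation for the next turn.
    history = next_input(history, resp.output, "Tell me more.")
    return client.responses.create(model="gpt-4.1", input=history, store=False)
```

The tradeoff is request size for latency: payloads grow with conversation length, but no server-side state has to be fetched.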