Introducing the Responses API

_j · March 13, 2025, 12:18am

The responses API doesn’t have threads, a single list of messages and internal calls that you can add to and run.

Instead, for server-side chat history state, it stores the request ID and its completion contents with every API call you make.

There is also an API parameter for “instructions” if you want the first system message to be separately changeable instead of being a part of the past messages sent.

To use this, (in the manner that many novices would expect API calls to somehow “remember” exactly who they are), you take the most recent response’s ID and send it back as the parameter for previous_response_id. When you do that, the past chat and responses you specify is reused, and you only need the newest user role question as a message in input.

There is still no cost management: the chat length will grow to the model maximum, where it will return an error, or you can set truncation:auto to start deleting old messages at the model’s maximum input.

Topic		Replies	Views
GPT-5 not showing up in Assistants API assistants , assistants-api , gpt-5	48	10515	January 12, 2026
Is there a future for the Assistants API? API assistants-api	14	4014	January 12, 2026
Transition from Assistants API to Responses API API assistants-api	13	3893	July 9, 2025
Assistants API is too slow! API assistants-api	27	6892	January 12, 2026
Responses API... not highly responsive (& what about assistants)? API gpt-4 , responses , responses-api	3	400	January 12, 2026

Introducing the Responses API

Related topics