I found a reproducible issue with the Responses API when using the hosted shell tool together with previous_response_id.
A first response can complete successfully with a hosted shell call, but a direct continuation using only previous_response_id fails with:
```
Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
```
The surprising part is that the first response itself is already marked completed, and the assistant message includes the shell result, but the response payload often does not include a `shell_call_output` item.
## Why This Looks Like a Bug
The API appears to require a `shell_call_output` item for continuation, while also sometimes not returning that item in the first response payload.
This creates an inconsistent contract:
- The hosted `shell` tool executes server-side.
- The first response is `completed`.
- The assistant can describe the shell output in natural language.
- But continuing from that response via `previous_response_id` can fail because the server says the shell output is missing.
## Environment

- Model: `gpt-5.2`
- Responses API
- Hosted tool: `shell`
- Tested with Python SDK versions: `openai 2.24.0`, `openai 2.26.0`
- Result was the same on both versions for the main repro.
## Main Reproduction

### Request 1
Create a response with hosted shell:
```python
from openai import AsyncOpenAI

client = AsyncOpenAI()

resp1 = await client.responses.create(
    model="gpt-5.2",
    input="Use the shell tool once to run: printf first_turn. Then briefly report the output.",
    tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
    reasoning={"effort": "medium", "summary": "detailed"},
    include=["reasoning.encrypted_content"],
    background=True,
)
```
Poll until terminal with `client.responses.retrieve(resp1.id)`.
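For reference, the polling I used is roughly the following sketch. The terminal status set and the 2-second interval are my own choices, not something the API documents for this repro:

```python
import asyncio

# Statuses I treat as terminal for a background response (my assumption).
TERMINAL = {"completed", "failed", "cancelled", "incomplete"}

async def poll_until_terminal(fetch, response_id, interval=2.0):
    """Call `fetch(response_id)` until the returned object reports a
    terminal status, then return that object.

    `fetch` is any coroutine function returning an object with a `status`
    attribute, e.g. `client.responses.retrieve`.
    """
    while True:
        resp = await fetch(response_id)
        if resp.status in TERMINAL:
            return resp
        await asyncio.sleep(interval)
```

With the SDK this would be called as `await poll_until_terminal(client.responses.retrieve, resp1.id)`.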
### Observed Response 1 Shape
In multiple runs, the first completed response looked like this structurally:
```json
{
  "status": "completed",
  "output": [
    {"type": "reasoning"},
    {"type": "shell_call", "status": "completed", "call_id": "call_..."},
    {"type": "reasoning"},
    {"type": "message"}
  ]
}
```
Notably absent: there is no `shell_call_output` item, even though the assistant message already described the shell result.
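To spot this condition programmatically, I used a small helper (my own sketch, operating on the response `output` list as plain dicts) that lists shell calls with no matching `shell_call_output`:

```python
def unmatched_shell_calls(output_items):
    """Return the call_ids of shell_call items that have no
    corresponding shell_call_output item in the same output list."""
    calls = {i["call_id"] for i in output_items if i.get("type") == "shell_call"}
    outputs = {i["call_id"] for i in output_items if i.get("type") == "shell_call_output"}
    return sorted(calls - outputs)
```

For the Response 1 shape above this returns the dangling `call_...` id; for responses that do carry the output item it returns an empty list.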
### Request 2
Now continue directly from the first response:
```python
resp2 = await client.responses.create(
    model="gpt-5.2",
    previous_response_id=resp1.id,
    input="Now answer with exactly: second turn worked",
    tools=[{"type": "shell", "environment": {"type": "container_auto"}}],
    reasoning={"effort": "medium", "summary": "detailed"},
    include=["reasoning.encrypted_content"],
    background=True,
)
```
### Actual Result
This fails immediately with:
```
Error code: 400 - {'error': {'message': 'No tool output found for shell call call_...', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
```
### Expected Result
If the hosted shell call completed server-side and the first response is already terminal, then either:

- `previous_response_id` continuation should work without any extra client-side tool-output replay, or
- the first response should always include the required `shell_call_output` item so the client can replay it deterministically.
## Verified Workaround

The continuation works if I manually inject a `shell_call_output` input item in the next request:
```python
resp2 = await client.responses.create(
    model="gpt-5.2",
    previous_response_id=resp1.id,
    input=[
        {
            "type": "shell_call_output",
            "call_id": "call_from_resp1",
            "status": "completed",
            "output": [
                {
                    "stdout": "first_turn",
                    "stderr": "",
                    "outcome": {"type": "exit", "exit_code": 0},
                }
            ],
        },
        {
            "role": "user",
            "content": "Now answer with exactly: second turn worked.",
        },
    ],
    reasoning={"effort": "medium", "summary": "detailed"},
    background=True,
)
```
This succeeds.
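Since the `call_id` and stdout have to be reconstructed by hand, I factored the replay item into a small helper. This is my own sketch; the field layout simply mirrors the request above, which is the only shape I have confirmed working:

```python
def build_shell_call_output(call_id, stdout, stderr="", exit_code=0):
    """Build a shell_call_output input item in the shape accepted
    by the workaround request above."""
    return {
        "type": "shell_call_output",
        "call_id": call_id,
        "status": "completed",
        "output": [
            {
                "stdout": stdout,
                "stderr": stderr,
                "outcome": {"type": "exit", "exit_code": exit_code},
            }
        ],
    }
```

The workaround request then becomes `input=[build_shell_call_output(call_id, "first_turn"), {"role": "user", "content": "..."}]`.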
## Additional Observation: Inconsistent `shell_call_output` Presence

After introducing manual `shell_call_output` replay in a chain, later hosted shell responses sometimes started including `shell_call_output` items automatically in their returned output.
So there seem to be two inconsistent behaviors:
- Some completed hosted-shell responses return only `shell_call` + `message`.
- Other completed hosted-shell responses return both `shell_call` and `shell_call_output`.
That inconsistency makes it difficult to know whether the client is expected to replay tool output or whether the server should already be carrying it forward.
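Given that inconsistency, the defensive pattern I settled on client-side (my own approach, not documented behavior) is to inspect the prior response and inject replay items only for shell calls whose output the server omitted:

```python
def replay_items_if_needed(prev_output, captured_stdout):
    """Return shell_call_output input items for any shell_call in
    `prev_output` that lacks a matching shell_call_output item.

    `prev_output` is the prior response's output list as plain dicts;
    `captured_stdout` maps call_id -> stdout reconstructed client-side
    (which is itself awkward, since the hosted tool ran server-side).
    """
    have = {i["call_id"] for i in prev_output if i.get("type") == "shell_call_output"}
    items = []
    for i in prev_output:
        if i.get("type") == "shell_call" and i["call_id"] not in have:
            items.append({
                "type": "shell_call_output",
                "call_id": i["call_id"],
                "status": "completed",
                "output": [{
                    "stdout": captured_stdout.get(i["call_id"], ""),
                    "stderr": "",
                    "outcome": {"type": "exit", "exit_code": 0},
                }],
            })
    return items
```

This is safe in both observed regimes: when the server already carries the output forward it returns an empty list, and when it does not, it produces exactly the replay items the workaround needs.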
## Includes Tested

I also tested all documented `include` values that are compatible with reasoning models:
```json
[
  "file_search_call.results",
  "web_search_call.results",
  "web_search_call.action.sources",
  "message.input_image.image_url",
  "computer_call_output.output.image_url",
  "code_interpreter_call.outputs",
  "reasoning.encrypted_content"
]
```
This did not fix the issue.
## Related But Separate Issue
I am also investigating a separate 400 error in a larger workflow that mentions a missing reasoning item.
At the moment, I have not minimized that second issue to a standalone hosted-shell repro. In my local tests, once I manually replay `shell_call_output`, multi-turn hosted-shell chains can continue successfully and retain memory of earlier shell outputs.
So this report is specifically about the reproducible hosted shell continuation problem where:
- the first response completes,
- but continuation via `previous_response_id` fails unless the client manually reconstructs and submits `shell_call_output`.
## Minimal Expected Contract

For hosted shell plus `previous_response_id`, one of these should be true consistently:
- hosted shell execution state is fully preserved server-side, so direct continuation works, or
- the API always returns the exact `shell_call_output` item needed for replay in the next request.
Right now, neither appears reliable enough.
## Local Artifacts Collected
I collected raw response payloads during testing, including:
- initial first-response payloads without `shell_call_output`
- all-compatible-includes payloads
- successful manual `shell_call_output` workaround payloads
- longer manual replay chains
If useful, I can also provide raw JSON examples.