GPT-5 Responses API: hundreds of calls return empty completion content (while Chat Completions works)

Hi everyone — I’m struggling to get any successful text back from the Responses API with the gpt-5 model. After hundreds of attempts, every call either returns empty completion content or errors. Oddly, switching to Chat Completions works (with some adjustments). I’d really appreciate guidance on whether I’m using the Responses API correctly for gpt-5, or if this is a known issue.

Environment

  • OS: macOS (Intel)
  • Python: 3.13 (virtualenv)
  • openai Python SDK: latest 1.x (from PyPI)
  • Network: stable, no proxy
  • Status page showed normal during tests

What I’m seeing (Responses API with gpt-5)

  • Repeated warnings in my logs:

    • Empty completion content (even though usage shows non-zero output/reasoning tokens)
    • Occasionally: finish_reason=None
  • When I try to constrain output:

    • Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead.
  • When I tried Chat Completions with the same model:

    • Unsupported value: 'temperature' ... Only the default (1) value is supported.

Sample log lines (summarized):

[try N/48] Calling gpt-5 (Responses)...
[debug] finish_reason=None usage=... output_tokens_details(reasoning_tokens=768) ...
[warn] Empty completion content (attempt 1/2/3)
[fail] Responses error: Failed after retries: Empty completion content.

Minimal Repro (Responses API)

from openai import OpenAI
client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Please write a short friendly greeting.",
    # I’ve tried with and without the following, and also renaming:
    # max_tokens=800,  # causes 400; API asks for max_completion_tokens
    max_completion_tokens=800,
    # temperature omitted (to avoid unsupported_value for gpt-5)
)

# I consistently get either empty text or structures with no usable content.
print(resp)

What works (Chat Completions)

If I switch to Chat Completions, omit temperature, and pass max_completion_tokens (not max_tokens), I can get output:

from openai import OpenAI
client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please write a short friendly greeting."}
    ],
    max_completion_tokens=800,  # works
    # temperature not sent (gpt-5 only supports default=1)
)

print(resp.choices[0].message.content)  # Usually non-empty

Things I’ve tried

  • Swapped max_tokens → max_completion_tokens
  • Removed temperature (saw it’s unsupported for gpt-5)
  • Different prompts (very simple → structured)
  • Multiple retries with backoff (dozens of attempts)
  • Verified API key, project, and rate limits (no rate-limit errors)
  • Status page reported all green

Questions

  1. Is the Responses API currently supported for text-only generation with gpt-5?
    If yes, what is the exact parameter set (and schema) that should be used to reliably get non-empty text back?

  2. Are there content shapes in Responses (e.g., parts/segments) that require a different parsing approach to extract text for gpt-5? If so, could you share a minimal code snippet to robustly extract the text field?

  3. Are there model-specific constraints for gpt-5 under Responses (beyond max_completion_tokens and default temperature=1) that would explain the empty content/finish_reason=None patterns?

  4. Is there any official guidance on when to prefer Responses vs Chat Completions for gpt-5 if the goal is plain text output?

If needed, I can provide timestamps and sample request IDs via DM. Thanks a ton for any pointers!


Welcome to the community.

Your Responses example is missing the correct parameter (max_output_tokens), but other than that it works fine for me:

resp = client.responses.create(
    model="gpt-5",
    input="Please write a short friendly greeting.",
    max_output_tokens=800,
)
print(resp.output_text) # 'Hi there! Great to see you—how can I help today?'

The latest version is not 1.x. It shouldn’t matter unless you are on a really old version, but to be sure, run pip install --upgrade openai and then the code below to confirm that the correct interpreter is running it (you might have more than one environment, so it doesn’t hurt to check):

import openai
print(openai.__version__) #2.6.1 is the latest version

Thank you so much for the warm welcome~~~

I ran your code and it worked well:

(venv-growth) yajin@MacBook-Air-3 generatemessage % python test.py

OpenAI SDK version: 2.7.1

Hi there! Hope you’re having a great day!

(venv-growth) yajin@MacBook-Air-3 generatemessage % python test.py

OpenAI SDK version: 2.7.1

Hi there! Hope you’re having a great day—how can I help you today?

But in my code, it still returned tons of this:

[fail 1/48] Responses error: Incomplete output (discard and retry)

[sleep] Wait for 1.3 sec and retry…

Or this:

[fail 3/48] Responses error: Empty completion content (responses)

I worked on this error with ChatGPT for weeks…

I tried more…

When I enlarged the prompt string in your code to 1200+ characters, it returned empty…

How can I make this work with a large prompt text?

The main fault is that you have set the max_output_tokens value far too low. It needs to be more like 10000, out of a possible 128000.

“gpt-5” (along with o4-mini, o3, etc.) is a reasoning AI model. Reasoning models produce internal tokens of thought that you never receive but that are still billed as output. The maximum-token setting is a budget for the total you will spend, seen or unseen, and it terminates the AI text generation when hit.
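When that budget is exhausted by hidden reasoning, the response comes back with an incomplete status and no visible text. Here is a minimal diagnostic sketch (field names as documented for the Responses API; the 10000 budget is only an illustration):

from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5",
    input="Please write a short friendly greeting.",
    max_output_tokens=10000,  # budget covers hidden reasoning AND visible text
)

# usage shows where the budget went, seen or unseen
details = resp.usage.output_tokens_details
print(f"output_tokens={resp.usage.output_tokens}, reasoning_tokens={details.reasoning_tokens}")

if resp.status == "incomplete" and resp.incomplete_details.reason == "max_output_tokens":
    # the whole budget went to internal reasoning; raise the cap and retry
    print("Terminated by max_output_tokens before any visible text was produced.")
else:
    print(resp.output_text)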

Perhaps you need some parameter guidance: what you can send to each of the two “chat” endpoints, which parameters must be dropped based on the model and its capabilities (along with your “ID verified” status), and especially where similarly named parameters differ between the endpoints.

'''Reference-quality API migration guide: Chat Completions <-> Responses
Extensively demonstrates parameters accepted or denied per endpoint or model
(comments are intentional for API documentation and alternate code use)'''

import os, json, httpx

# Desired developer parameters
common_body = {
    "model": "gpt-5-mini",
    "temperature": 1.0,  # reasoning: no
    "top_p": 0.5,  # reasoning: no
    "service_tier": "priority",  # "flex": only gpt-5, o3, or o4-mini
    "store": False,
    "prompt_cache_key": None,
    "safety_identifier": None,
    "parallel_tool_calls": False,
    "tools": [],    # functions: different shape between endpoints
    "tool_choice": "auto",
    "stream": False,
    "stream_options": {
        "include_usage": True,  # responses: no
        "include_obfuscation": False,
    },
    "stop": [],                 # responses: no, reasoning: no
    "frequency_penalty": 0.01,  # responses: no, reasoning: no
    "presence_penalty": 0.01,   # responses: no, reasoning: no
    "n": 1,                     # responses: no
    "logit_bias": {99999:-2},   # responses: no, reasoning: no
    "prediction": None,         # responses: no, reasoning: no (gpt-4o only parameter)
    "modalities": ["text"],     # responses: no, reasoning: no (+"audio")
    "audio": {"format": "mp3", "voice": "cedar"}, # responses: no, reasoning: no
}

# Developer variables where parameter placement depends on endpoint
max_completion_tokens = 4000
verbosity = "medium"  # other than "medium": gpt-5 reasoning only
top_logprobs = 0
reasoning_effort = "low"
response_format = {"type": "text"}
instructions = "A concise assistant provides brief answers."
user_message = """

Ping!

""".strip()

# Only a Responses API feature
reasoning_summary = "auto"  # "auto" | "detailed" | None
include_encrypted_content = True

# model gates
is_gpt5 = (
    common_body["model"].startswith("gpt-5")
    and not common_body["model"].startswith("gpt-5-chat")
)
is_reasoning = is_gpt5 or common_body["model"].startswith(("o3", "o4"))

if is_reasoning:
    common_body.pop("temperature", None)
    common_body.pop("top_p", None)
    common_body.pop("frequency_penalty", None)
    common_body.pop("presence_penalty", None)
    common_body.pop("logit_bias", None)
    common_body.pop("stop", None)
    common_body.pop("modalities", None)
    common_body.pop("logprobs", None)
    common_body.pop("top_logprobs", None)
    common_body.pop("audio", None)
    common_body.pop("prediction", None)

if not common_body.get("stream", False):
    common_body.pop("stream_options")

chatcompletions_body = {
    **common_body,
    "max_completion_tokens": max_completion_tokens,
    **({"reasoning_effort": reasoning_effort} if is_reasoning else {}),
    "response_format": response_format,
    **({"verbosity": verbosity} if is_gpt5 else {}),
    "logprobs": bool(top_logprobs),  # 0 = False
    **({"top_logprobs": top_logprobs} if top_logprobs else {}),  # 0 = don't send
    "messages": [
        {"role": "system", "content": instructions},
        {"role": "user", "content": user_message},
    ],
}

responses_body = {
    **common_body,
    "max_output_tokens": max_completion_tokens,
    **(
        {"reasoning": {"effort": reasoning_effort, "summary": reasoning_summary}}
        if is_reasoning
        else {}
    ),
    "include": [
        *(
            ["reasoning.encrypted_content"]
            if is_reasoning and include_encrypted_content
            else []
        ),
        # .. other include types not demonstrated
    ],
    "text": {
        "format": response_format,
        **({"verbosity": verbosity} if is_gpt5 else {}),
    },
    "top_logprobs": top_logprobs,  # 0 = disabled, or 1-20
    "instructions": instructions,
    "input": [{"type": "message", "role": "user", "content": user_message}],
}

# Only a Chat Completions API feature
# all not implemented on Responses
responses_body.pop("frequency_penalty", None)
responses_body.pop("presence_penalty", None)
responses_body.pop("logit_bias", None)
responses_body.pop("modalities", None)
responses_body.pop("audio", None)
responses_body.pop("n", None)
responses_body.pop("stop", None)
responses_body.pop("prediction", None)

# moved elsewhere or unnecessary on Responses, thus untolerated
responses_body.get("stream_options", {}).pop("include_usage", None)
responses_body.pop("max_tokens", None)
responses_body.pop("max_completion_tokens", None)
responses_body.pop("response_format", None)
responses_body.pop("messages", None)
responses_body.pop("verbosity", None)
responses_body.pop("reasoning_effort", None)
responses_body.pop("web_search_options", None)

# - API call formation

headers: dict[str, str] = {
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
}

# --- Responses API call
resp: httpx.Response | None = None
rheaders: dict[str, str] | None = None
try:
    with httpx.Client(timeout=900) as client:
        resp = client.post(
            "https://api.openai.com/v1/responses",
            headers=headers,
            json=responses_body,
        )
        resp.raise_for_status()
        rheaders = dict(resp.headers)
except httpx.HTTPStatusError as e:
    print(f"Request failed: {e}")
    if e.response is not None:
        try:
            # print body error messages from OpenAI
            print("Error response body:\n", e.response.text)
            rheaders = dict(e.response.headers)
        except Exception:
            pass
    raise
except httpx.RequestError as e:
    print(f"Request error: {e}")
    raise

response = resp.json()
response["output_text"] = "".join(
    content_block["text"]
    for message in response.get("output", [])
    if message.get("type") == "message" and message.get("role") == "assistant"
    for content_block in message.get("content", [])
    if content_block.get("type") == "output_text" and "text" in content_block
)
# print(json.dumps(response.get("output"), indent=2))
print(response.get("output_text"))

# --- Chat Completions API call
resp = None
rheaders = None
try:
    with httpx.Client(timeout=600) as client:
        resp = client.post(
            "https://api.openai.com/v1/chat/completions",
            headers=headers,
            json=chatcompletions_body,
        )
        resp.raise_for_status()
        rheaders = dict(resp.headers)
except httpx.HTTPStatusError as e:
    print(f"Request failed: {e}")
    if e.response is not None:
        try:
            # print body error messages from OpenAI
            print("Error response body:\n", e.response.text)
            rheaders = dict(e.response.headers)
        except Exception:
            pass
    raise
except httpx.RequestError as e:
    print(f"Request error: {e}")
    raise
ccresponse = resp.json()
ccresponse["output_text"] = ccresponse["choices"][0]["message"]["content"]
#print(json.dumps(ccresponse.get("choices"), indent=2))
print(ccresponse.get("output_text"))

# reminder of SDK module usage
#import openai
#openai_client = openai.Client()
#r = openai_client.responses.create(**responses_body)
#r = openai_client.chat.completions.create(**chatcompletions_body)

This will run, and both the Responses and the Chat Completions call should succeed and print your response, across a wide variety of parameter options.

The API Reference will let you get descriptions for each of the possible parameters.

Not shown: structured outputs, tools, streaming, or async. Logprobs also behave differently between models and endpoints.

Thank you for the support~ As suggested, I set max_output_tokens = 30000 with a 500+ character prompt, but it still failed…

Code:

def call_responses_text_only(client: OpenAI, model: str, user_text: str, max_output_tokens: int = 30000) -> str:
    print("call_responses_text_only:max_output_tokens =", max_output_tokens)
    print("call_responses_text_only:user_text =", user_text)
    resp = client.responses.create(
        model=model,
        input=user_text,
        max_output_tokens=max_output_tokens,
        reasoning={"effort": "minimal"},
    )

Output:

(venv-growth) trainer@MacBook-Air-3 ~ % /Users/trainer/Projects/generatemessage/generate_message.command ; exit;

[try 1/48] Calling gpt-5 (Responses)…

call_responses_text_only:max_output_tokens = 30000

call_responses_text_only:user_text = 你是非暴力沟通社区的主理人,是个资深的非暴力沟通培训师。

words.txt中的文本是社区某位伙伴所记录的成长足迹帖子内容。为了增进社区互动,你将生成一段文本,邀请另外一位伙伴对这个帖子内容做评论。这个文本不是直接对帖子的评论。

文本内容描述:

1.这是一个对对方的实践邀请,在评论的同时练习实践非暴力沟通;

2.非常简单描述帖子内容(让对方连接当下的互动,增加连接感);

3.文本紧贴帖子内容,围绕对方,帖子作者,帖子的内容展开;

4.邀请中请考虑context.txt中描述的上下文信息(请具体关联帖子内容),其中Must部分是必须考虑的上下文,Optional部分则是可选参考;

5.语言尽可能自然、口语化;

6.在文本中如果提到作者,请用“Ta”,方便我做后续处理;

7.无须“嗨”、“你好”之类的问候语。

[context]

Must:

当前主题月主题为:倾听

Optional:

[content]

[FJz6h6atLtnFgjBgRXLo7Z0r5Y6bftfleXGfhqgm.jpeg]当情绪像暴风雨来临时,这两天我试着意象它们。昨天突然意象到它们就像一把把的匕首,我吓了一跳,这样的能量往外发泄攻击性会多大呀,后来我就把这些意象画在纸上,好像消散了一些。刚刚感觉到这股情绪变成一团火,刚好看到今天的果汁就把它记下来了。

我的感受是烦,需要是边界。

[style]

请写给社区其他一位成员的邀请文本;非常简要提及帖子核心内容以建立连接;清晰点出本月主题;邀请对方在成长足迹中评论互动;语气温柔简洁;一段话即可。

[format]

请将最终可直接发布的一段邀请文本输出,务必以<<>>开头、以<<>>结尾,不要输出任何额外说明、标题或引号。

[fail 1/48] Responses error: Incomplete output (discard and retry)

[sleep] Waiting 1.1 sec before retrying...

The function you’ve shown doesn’t return the string that its signature promises. Maybe there is more to the function?

Another thing that is important to understand is the output shape of the Responses object. The “output” field is an array (a Python list). With reasoning AI models, the first item you receive will most often not be output content for display; instead, it will be a place for a reasoning summary or for encrypted reasoning used in self-management. There is even an object type for “refusal”, where the AI won’t do the job asked of it.

There is a helper in the OpenAI library, response.output_text, that gathers the user-facing text produced by the AI model. You will quickly find it inadequate when you actually want to see more of what the model is thinking about and doing: more processing of the original “output” item list is needed to handle functions, to handle refusals, and to present the reasoning summaries that can be shown in a user interface.

My long code sample above (which does not use OpenAI’s SDK) has its own version of this Responses API response collector, which also adds “output_text” to the response dictionary.
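As a minimal sketch of such a collector against the SDK’s typed objects (assuming the documented item types “reasoning” and “message”, and the “output_text” and “refusal” content parts):

def collect_output(response) -> dict:
    """Walk the Responses `output` list, separating visible text,
    reasoning summaries, and refusals (a minimal sketch)."""
    texts, summaries, refusals = [], [], []
    for item in response.output:
        if item.type == "reasoning":
            # summary parts appear only when a reasoning summary was requested
            summaries.extend(part.text for part in (item.summary or []))
        elif item.type == "message":
            for part in item.content:
                if part.type == "output_text":
                    texts.append(part.text)
                elif part.type == "refusal":
                    refusals.append(part.refusal)
    return {"text": "".join(texts), "summaries": summaries, "refusals": refusals}

# usage, given resp from client.responses.create(...):
# parts = collect_output(resp)
# print(parts["text"] or parts["refusals"] or "(no visible output)")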

I see in your first code that you print the whole resp object, which should be enough for you to determine that the AI made no text for you to read. You can also read the “usage” there, where you have “output_tokens” and, under details, the portion of that which is internal “reasoning_tokens”. Understanding where the tokens were generated will help you understand why a transition was never made to output you can see.

Then: the prompt. I can’t read Chinese, but gpt-5-pro can. I had it break down the correct format for an automated task like the one you are giving, so there are “instructions” (a system message), and then the task is clearly separated from the text being processed. This is an API call that will succeed.

from openai import OpenAI
client = OpenAI()

instructions = r"""
你是一名自动化处理器,本次工作没有实时对话的用户。你的长期身份:非暴力沟通(NVC)社区的主理人,且是资深的 NVC 培训师。你的唯一职责是:接收来自 user 角色的任务说明,读取其中提供的输入文本与上下文,并严格按任务要求生成可直接发布的最终产出。

通用规则:
1. 输出语言:简体中文。
2. 不进行寒暄、解释、模板占位或元评论;不复述任务;不添加标题、注释或额外标记。
3. 当任务要求生成“邀请文本”时,你必须:
   - 仅写一段话;
   - 以 <<>> 开头,并以 <<>> 结尾;
   - 语言自然、口语化、温柔且简洁;
   - 紧贴帖文内容,围绕受邀者、帖子作者(若提及作者请称为“Ta”)与帖文展开;
   - 这不是对帖子的直接评论,而是邀请另一位成员在「成长足迹」中发表评论并练习 NVC;
   - 非常简要提及帖子的核心内容以建立连接;
   - 必须结合 [context] 中的 Must 信息(如当月主题),Optional 信息可酌情参考;
   - 不使用“嗨”“你好”等问候语。
4. 仅输出目标文本,不输出任何额外内容。
""".strip()

user_task = r"""
任务:依据下方提供的 [context] 与 [content](原文照录),生成一段面向社区另一位成员的“邀请文本”。请严格遵循以下要求:
- 该文本是对对方的实践邀请,邀请对方在评论时练习非暴力沟通;
- 非常简要提及帖子的核心内容以帮助建立连接;
- 紧贴帖子内容,围绕受邀者、帖子作者(称“Ta”)与帖文展开;
- 必须结合 [context] 里的 Must 信息(尤其是当月主题),Optional 可参考但非必需;
- 语言尽可能自然、口语化,语气温柔、简洁;
- 不要包含任何问候语(如“嗨”“你好”等);
- 只输出一段话,且以 <<>> 开头、以 <<>> 结尾,不要输出任何额外说明、标题或引号。
""".strip()

user_content = r"""
[context]

Must:

当前主题月主题为:倾听

Optional:

[content]

[FJz6h6atLtnFgjBgRXLo7Z0r5Y6bftfleXGfhqgm.jpeg]当情绪像暴风雨来临时,这两天我试着意象它们。昨天突然意象到它们就像一把把的匕首,我吓了一跳,这样的能量往外发泄攻击性会多大呀,后来我就把这些意象画在纸上,好像消散了一些。刚刚感觉到这股情绪变成一团火,刚好看到今天的果汁就把它记下来了。

我的感受是烦,需要是边界。
""".strip()

user_message_string = f'{user_task}\n\n---\n\n"""\n{user_content}\n"""'

response = client.responses.create(
    model="gpt-5",
    max_output_tokens=8000,  # must budget for both internal reasoning and visible output
    store=False,
    reasoning={"effort": "low"},
    instructions=instructions,
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {
                  "type": "input_text",
                  "text": user_message_string,
                }
            ]
        }
    ],
)
assistant = response.output_text
print(assistant)
print(response.usage.model_dump())

You will see how there is clear separation between the instruction, the task, and the content to be processed. In English:

instructions = r"""
You are an automated processor, with no user to chat with.
Perform the task and produce the processed output without additional "chat" discussion.
""".strip()

user_task = r"""

Task: Please translate this text to Spanish language.

""".strip()

user_content = r"""

Hello friends, it's nice to see you today!

""".strip()

user_message_string = f'{user_task}\n\n---\n\n"""\n{user_content}\n"""'

The model most likely to still fail and produce no output text is gpt-5-mini; this symptom has often been reported. If you receive no output but also no errors, you can retry the whole job on another AI model such as gpt-4.1 or o4-mini.
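A sketch of that fallback idea (the model order and the emptiness check here are assumptions, not official guidance):

def generate_with_fallback(client, base_body, models=("gpt-5-mini", "gpt-5", "gpt-4.1")):
    """Retry the whole job on backup models when a call returns
    no visible text but also no error (a sketch)."""
    for model in models:
        body = dict(base_body, model=model)
        if not model.startswith(("gpt-5", "o3", "o4")):
            body.pop("reasoning", None)  # non-reasoning models reject it
        resp = client.responses.create(**body)
        if resp.status == "completed" and resp.output_text.strip():
            return resp.output_text
    raise RuntimeError("all models returned empty output")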

If you are processing unknown text, such as text pulled from a forum, it is important to run the same input through the “moderations” API first, to ensure it is not flagged. Using language AI models alone as a moderator for bad content can result in an organization ban.
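A minimal pre-flight check with the Moderations API (model name per the current API reference):

from openai import OpenAI

client = OpenAI()

def is_flagged(text: str) -> bool:
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return result.results[0].flagged

post = "..."  # untrusted text pulled from the forum
if is_flagged(post):
    raise ValueError("input flagged by moderation; do not send it to generation")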


Thanks a lot~~ You are correct! The code that checks for incompleteness had a problem! The problem is now resolved.
