Conversation items not added when using streamed Response

pawelgc · October 15, 2025, 11:03am

Hi everyone,

I’m currently migrating from the deprecated Assistants API to the new Responses/Conversations API.

I noticed that when running a streamed Response, the assistant’s replies are not automatically appended to the Conversation. I came across another post mentioning that this is a known issue, so I’ve implemented a temporary workaround on my side, but it’s not an ideal solution.

Is this issue currently being worked on, or is there an official fix/ETA planned?

Any insight or confirmation would be greatly appreciated.

Thanks in advance!

Edit:
I’ve noticed that the assistant’s response is never added to the conversation automatically on its own. But when I manually try to add the response as a conversation item, the behavior is inconsistent:

Sometimes I get a lock error, and checking the logs shows that the API added the assistant’s message right before my insert.
Other times there’s no lock error, and the API doesn’t add the Response, it only adds my item.

So the auto-append only happens sometimes, and only when I try to insert the response myself, which makes it unpredictable and tricky to handle cleanly.

Please can I get an update on whether this is a known issue, being worked on, or if there’s any ETA for a fix?

Last edit:
I found a solution to a problem i caused myself… due to doing a early return the response didnt add itself to the conversation, removing the early return fixed it!

_j · October 15, 2025, 12:16pm

You will still need to use store:true with a request.

I know this makes no sense.

See if then the conversation is updated with the API call.

pawelgc · October 15, 2025, 2:37pm

I tried it with store:true, but the conversation still doesn’t update after the API call. The response is returned correctly, but no new items are added.

pawelgc · October 16, 2025, 10:52am

I notice that sometimes the response is added when running it, but not always, not sure why tho.

_j · October 16, 2025, 5:18pm

I have spent several hours writing code, revising code, having GPT-5 analyze the code vs yaml specification and documentation, specifically starting from the point of ‘how to use conversations’, to then ‘how to diagnose conversations not being updated’, ‘how to diagnose response.id never being created with store:true’, ‘how to send multiple methods of input and also multiple ways the API reference has of passing conversation ID’, then background threads to delay and persist in trying to retrieve a conversation ID contents and response ID, I can fully conclude:

The responses API is extremely broken and not suitable for use.

Let’s just have a nice chat with the AI:

Conversation created conv_68f118865dc481959fae7ef8049de6ae01a95a333d512910

[assistant:] Hi — I’m ChatGPT, an AI assistant built by OpenAI. I can help with writing, editing, coding, brainstorming, research summaries, math, language translation, explanations, planning, troubleshooting, and more. I work best with clear instructions and examples.

A few quick notes:

I don’t have real-time web access or personal memory across sessions unless you provide context.

I can generate code, drafts, and suggestions, but always review for accuracy, safety, and legal/compliance needs.

Tell me the goal, constraints, and any examples, and I’ll get to work.

What would you like help with right now?
–Usage-- in/cached: 24/0; out/reasoning:136/0

Prompt (or ‘exit’): tell me more about quick note item 1 - what would be needed for that?
Response ID deleted: resp_01a95a333d5129100068f11886fc288195866bbf3504362e55
[assistant:]
I don’t have the quick note you’re referring to. Can you paste item 1 (or describe it) so I can give targeted details?

If helpful, when you share it I can outline:

Goals and success criteria

Required people/roles and skills

Tools, software and materials

Step-by-step tasks and timeline

Estimated cost and effort

Risks and dependencies

Quick checklist to get started

Tell me the actual item and how detailed you want the plan (high-level vs. task-level).

[warn] Conversation items unchanged after streaming turn; item count remains 0.
–Usage-- in/cached: 39/0; out/reasoning:178/64

Prompt (or ‘exit’):

The AI doesn’t understand what I am talking about from the prior turn. The conversation ID is not updated with any new items after storing its prior state and checking again.

I switch to the ‘object’ type of conversation ID, and place the input as a full array of messages with ‘type:message’. Same thing, no conversation history, and also intermittent failures to even create a response ID (where I then try to delete this completely unwanted artifact in the background)

Conversation created conv_68f11eddfa5481978ea68321fe15eaea0de2f8b4132ea99b

[assistant:] Hi — I’m ChatGPT, an AI assistant built by OpenAI. I can:

Answer questions and explain things clearly

Help draft emails, essays, code, summaries, plans, and creative writing

Analyze images you upload and provide feedback

Solve technical problems, debug code, and generate examples

Translate and work in multiple languages

Quick notes and limits:

My knowledge goes up to June 2024; I can’t browse the web or access real-time info.

I don’t retain personal data between conversations unless you provide context within the chat.

I can be helpful with many tasks, but verify critical facts (medical, legal, financial) with a qualified professional.

How can I help you today?
–Usage-- in/cached: 24/0; out/reasoning:155/0

Prompt (or ‘exit’): [warn] delete_response resp_0de2f8b4132ea99b0068f11ede85048197bf6ca449d85cd7ee not deleted after 2 attempt(s): HTTP 404: Response with id 'resp_0de2f8b4132ea99b0068f11ede85048197bf6ca449d85cd7ee' not found.tell me how you would do item number three you list.

[assistant:] I don’t have the list you’re referring to. Could you either paste the list here or tell me what item three is?

If you want a quick template for how I’d explain doing “item three,” here’s the format I’ll use once I know it:

Goal: one-sentence description of the outcome.

Inputs needed: data, tools, permissions, or constraints.

Step-by-step actions: numbered, practical steps to complete it.

Time estimate: rough effort/time required.

Risks or pitfalls: what can go wrong and how to avoid it.

Deliverable: what I’d produce and how I’d present it.

Share item three (or the list) and I’ll fill that in.
[warn] Conversation items unchanged after streaming turn; item count remains 0.
–Usage-- in/cached: 34/0; out/reasoning:217/64

Prompt (or ‘exit’): just repeat what you said before.
[warn] delete_response resp_0de2f8b4132ea99b0068f11f0e717881978288d14ef728e844 not deleted after 2 attempt(s): HTTP 404: Response with id 'resp_0de2f8b4132ea99b0068f11f0e717881978288d14ef728e844' not found.
[assistant:]
A helpful AI that keeps its answers brief yet insightful.
[warn] Conversation items unchanged after streaming turn; item count remains 0.
–Usage-- in/cached: 29/0; out/reasoning:17/0

Besides in the last turn the AI repeating back the developer message as “what it said before”, showing nothing calling the AI model “ChatGPT” yet it calling itself ChatGPT:

more failures: conversation ID never created to be deleted, you can see from the input token report that the conversation length is not growing, and asserting against the previous call to GET https://api.openai.com/v1/responses/{response_id}/input_items is identical, with 0 items stored.

Even if this endpoint did rely on persisting a response ID to later get a conversation, the response ID storage itself is broken. That is the same as reported days ago, with someone trying to reuse a response ID as the chat history mechanism:

Despite any “failure to delete” in this code, in all this experimentation, not a single response ID was persisted in the logs or leaking past my cleanup.

And if you think that any errors are because of deleting a response id before it can be consumed: this works just fine without streaming, and setting my deleter simply to return True and the AI still doesn’t know jack about prior turns and the input can become smaller:

Tell me which specific wording you meant and I’ll explain more.
[warn] Conversation items unchanged after streaming turn; item count remains 0.
–Usage-- in/cached: 36/0; out/reasoning:218/64

Prompt (or ‘exit’): hi!

[assistant:] Hi — how can I help you today?
[warn] Conversation items unchanged after streaming turn; item count remains 0.
–Usage-- in/cached: 24/0; out/reasoning:15/0

I got a few “seemed to persist” early in testing, but then I can’t even replicate that.

And yes, “store”: true is hard-coded right in the payload construction function, alongside alternate “input” and “conversation”.

All tests were made against gpt-5-mini

I can only conclude that the low number of reports for this issue on either ‘conversations’ endpoint or ‘response id’ storage is that storing conversation server-side is for noobs and no experienced developer would do such a thing with their user data and rely on such stateful information being persisted by a party with an “off” button for an organization they think misbehaves or doesn’t prepay.
Then, secondly, that programming Responses streaming event handling with literally dozens of events to handle, and then documentation so poor that even trying to parse the “error” event is completely wrong, that only foolhardy experienced developers would undertake ever using responses.
Finally, that OpenAI gated streaming itself as something denied until you go through “ID Verification” that is a complete failure of an implementation besides a massive intrusion - obviously there is no “trust” needed for receiving streaming, nor to make an ‘agent’ or request a ‘reasoning summary’. ID verification is forced on these methods purely to force it on developers and reap whatever profit motivates sending users to “withpersona”.

That the API for this was launched in a broken state is here:

That it was temporarily working is here, with my assurance:

So please, Andrew @wilkes - explain what’s gone wrong.

650 lines of a 'basic' API chatbot for streaming Responses with Conversation

"""OpenAI Conversations + Responses: server-side chat memory demo
One Conversation per run; server keeps history across turns.
Default: streaming via SSE; prints deltas; usage on completion.
Toggle non-streaming with USE_STREAMING.
instructions=SYSTEM sent per request (Conversation minimal).
Payload built by build_responses_payload; caller sets stream.
Persisted response (store:true) deleted after completion.
Lockfile aids cleanup; Conversation deleted on exit.
----------------------------------------------------"""
import os
import sys
import httpx

MODEL: str = "gpt-5-mini"
MAX_OUTPUT_TOKENS: int = 10000
SYSTEM = f"""
A helpful AI that keeps its answers brief yet insightful.
"""
HEADERS: dict[str, str] = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}",
}

# Developer toggle: choose streaming vs non-streaming path
USE_STREAMING: bool = True

data:dict = {}  # global for examination of non-stream response in REPL environment
conversation_items_state: dict | None = None  # last fetched conversation items list for diagnostics
print("__doc__: "+__doc__ or "")

def create_conversation() -> str:
    """
    Create a conversation containing one developer message that sets the tone.
    If a 'conversation.lock' exists, delete that conversation first (best-effort),
    then proceed. The new conversation id is written to 'conversation.lock'.
    Returns the server-generated conversation ID.
    """
    from pathlib import Path

    lock_path = Path("conversation.lock")

    # Best-effort cleanup of an orphan from a previous run
    if lock_path.exists():
        try:
            prev_id = lock_path.read_text(encoding="utf-8").strip()
        except Exception:
            prev_id = ""
        if prev_id:
            delete_conversation(prev_id)
        else:
            try:
                lock_path.unlink(missing_ok=True)
            except Exception:
                pass

    payload = {"metadata": {"topic": "demo"}}

    with httpx.Client(timeout=20) as client:
        response = client.post(
            "https://api.openai.com/v1/conversations",
            headers=HEADERS,
            json=payload,
        )
        response.raise_for_status()
        conversation_id: str = response.json()["id"]

    # Record the active conversation id for crash/restart cleanup
    try:
        lock_path.write_text(conversation_id, encoding="utf-8")
    except Exception:
        pass

    print(f"Conversation created {conversation_id}")
    return conversation_id

def delete_conversation(conversation_id: str) -> None:
    """
    Delete the conversation so the demo doesn’t leave stray server objects.
    On 2xx or 404, it's treated as success. Errors are logged to stderr.
    Always removes 'conversation.lock'. If a module-level `conversation_id`
    matches the deleted id, it is cleared to None.
    """
    import sys
    from pathlib import Path

    lock_path = Path("conversation.lock")
    cleared = False

    try:
        with httpx.Client(timeout=20) as client:
            response = client.delete(
                f"https://api.openai.com/v1/conversations/{conversation_id}",
                headers=HEADERS,
            )

        try:
            response.raise_for_status()
        except httpx.HTTPStatusError as exc:
            if exc.response is not None and exc.response.status_code == 404:
                print(f"Conversation already deleted {conversation_id}")
                cleared = True
            else:
                raise
        else:
            print(f"Conversation deleted {conversation_id}")
            cleared = True

    except Exception as exc:
        print(f"[warn] Couldn’t delete {conversation_id}: {exc}", file=sys.stderr)

    finally:
        # Always remove the lock file
        try:
            lock_path.unlink(missing_ok=True)
        except Exception:
            pass

def get_conversation_items(conversation_id: str, limit: int = 100) -> dict | None:
    """
    Retrieve up to `limit` most recent items from a conversation for diagnostics.
    Returns the JSON object (dict) on success, or None on error.
    """
    import sys

    try:
        with httpx.Client(timeout=20) as client:
            resp = client.get(
                f"https://api.openai.com/v1/conversations/{conversation_id}/items",
                headers=HEADERS,
                params={"limit": limit},
            )
            resp.raise_for_status()
            return resp.json()

    except httpx.HTTPStatusError as exc:
        r = exc.response
        request_id = r.headers.get("x-request-id") if r is not None else None
        status = r.status_code if r is not None else "unknown"
        print(
            f"[warn] get_conversation_items HTTP {status} (x-request-id={request_id})",
            file=sys.stderr,
        )
        try:
            import json
            msg = (json.loads(r.text).get("error", {}).get("message")) if r is not None else str(exc)
        except Exception:
            msg = r.text if r is not None else str(exc)
        while "****" in msg:
            msg = msg.replace("****", "***")
        print(f"[warn] message: {msg}", file=sys.stderr)
        return None

    except httpx.RequestError as exc:
        print(f"[warn] get_conversation_items request error: {exc}", file=sys.stderr)
        return None

def delete_response(
    response_id: str,
    *,
    retries: int = 1,
    delay_first: float = 0.0,
    retry_delay: float = 2.0,
    timeout: float = 20.0,
) -> bool:
    """
    Attempt to delete a persisted Responses API object.
    Returns True on 2xx; warns once if no attempt succeeded.

    Retries help with eventual consistency when store:true objects lag before deletion.
    On non-2xx, extracts JSON error.message if present, otherwise prints status and raw text.
    """
    import sys
    import time
    #return True  # shut it off temporarily
    if not response_id:
        return False

    if delay_first > 0:
        try:
            time.sleep(delay_first)
        except Exception:
            pass

    url = f"https://api.openai.com/v1/responses/{response_id}"
    last_code: int | None = None
    last_msg: str | None = None

    try:
        with httpx.Client(timeout=timeout) as client:
            attempts = retries + 1
            for i in range(attempts):
                try:
                    resp = client.delete(url, headers=HEADERS)
                    status = resp.status_code
                    if 200 <= status < 300:
                        print(f"Response ID deleted: {response_id}")
                        return True
                    else:
                        try:
                            msg = resp.json().get("error", {}).get("message") or resp.text
                        except Exception:
                            msg = resp.text
                        last_code = status
                        last_msg = msg
                except httpx.RequestError as exc:
                    last_code = None
                    last_msg = str(exc)

                if i < attempts - 1:
                    try:
                        time.sleep(retry_delay)
                    except Exception:
                        pass

    except Exception as exc:
        print(f"[warn] delete_response {response_id} unexpected error: {exc}", file=sys.stderr)
        return False

    # No attempt succeeded; issue one concise warning.
    detail = (
        f"HTTP {last_code}: {last_msg}" if last_code is not None else (last_msg or "request error")
    )
    print(
        f"[warn] delete_response {response_id} not deleted after {retries + 1} attempt(s): {detail}",
        file=sys.stderr,
    )
    return False

def schedule_delete_response(
    response_id: str,
    *,
    delay: float = 3.0,
    retries: int = 1,
    retry_delay: float = 2.0,
) -> None:
    """
    Fire-and-forget deletion scheduled in the background.
    Sleeps 'delay' seconds, then calls delete_response() with retry behavior.
    Emits a single warning later if deletion never succeeded.
    """
    import threading

    if not response_id:
        return

    def _worker() -> None:
        delete_response(
            response_id,
            delay_first=delay,
            retries=retries,
            retry_delay=retry_delay,
        )

    t = threading.Thread(target=_worker, name=f"delete_response:{response_id}", daemon=True)
    t.start()

def build_responses_payload(
    conversation_id: str,
    user_input: str | list[dict] | dict,
    model: str,
    max_out: int | None,
    stream: bool,
    *,
    instructions: str = SYSTEM,        # system prompt per call
    temperature: float = 0.5,          # only for non-reasoning
    top_p: float = 0.9,                # only for non-reasoning
    reasoning_effort: str = "low",     # only for reasoning
    reasoning_summary: str | None = "auto",  # only for reasoning
    verbosity: str = "medium",         # only for gpt-5 family
    **kwargs,                          # passthrough for other valid fields
) -> dict[str, object]:
    """
    Build a minimal Responses API request body for a chat with Conversations,
    applying model-appropriate gating of sampling vs reasoning vs verbosity.

    Model gates:
      - is_gpt5     = model.startswith("gpt-5") and not model.startswith("gpt-5-chat")
      - is_reasoning = is_gpt5 or model.startswith(("o3", "o4"))

    * reasoning models receive a 'reasoning' block (no temperature/top_p).
    * non-reasoning models receive temperature & top_p (no 'reasoning').
    * gpt-5 family models receive text.verbosity; others do not.
    * any extra kwargs are merged in at the end (developer-controlled).
    """
    is_gpt5 = model.startswith("gpt-5") and not model.startswith("gpt-5-chat")
    is_reasoning = is_gpt5 or model.startswith(("o3", "o4"))

    body: dict[str, object] = {
        "model": model,
        #"conversation": conversation_id,
        "conversation": {"id": conversation_id},  # alternate format also documented
        "instructions": instructions,
        #"input": user_input,
        "input": [
            {
                "role": "user",
                "content": [
                  {"type": "input_text", "text": user_input},
                ]
            }
        ], 
        "max_output_tokens": max_out,
        "store": True,
        "stream": stream,
        "text": {"format": {"type": "text"}},
    }

    if is_gpt5:
        body["text"]["verbosity"] = verbosity

    if is_reasoning:
        reasoning: dict[str, object] = {"effort": reasoning_effort}
        if reasoning_summary is not None:
            reasoning["summary"] = reasoning_summary
        body["reasoning"] = reasoning
    else:
        # sampling knobs only for non-reasoning
        body["temperature"] = temperature
        body["top_p"] = top_p

    # merge in any other valid Responses parameters (developer's responsibility)
    body.update(kwargs)
    return body

def non_stream_response(
        conversation_id: str,
        user_input: str,
        model: str,
        max_out: [int|None] = None,  # Look at app's global at script top
) -> str:
    """
    Send user_input as the next turn and return the assistant’s reply text.
    The same conversation ID is reused, so the server retains memory.
    """
    global data

    payload = build_responses_payload(
        conversation_id=conversation_id,
        user_input=user_input,
        model=model,
        max_out=max_out,
        stream=False,
    )

    try:
        with httpx.Client(timeout=600) as client:
            response = client.post(
                "https://api.openai.com/v1/responses",
                headers=HEADERS,
                json=payload,
            )
            response.raise_for_status()
            data = response.json()
            delete_response(data.get("id"))
            print(
              f"--Usage--  in/cached: {data['usage']['input_tokens']}/"
              f"{data['usage']['input_tokens_details']['cached_tokens']};  "
              f"out/reasoning:{data['usage']['output_tokens']}/"
              f"{data['usage']['output_tokens_details']['reasoning_tokens']}"
            )

    except httpx.HTTPStatusError as exc:
        resp = exc.response  # httpx attaches the response to the exception
        request_id = resp.headers.get("x-request-id") if resp is not None else None
        status = resp.status_code if resp is not None else "unknown"
        print(f"header x-request-id: {request_id}\nHTTP status {status} error", file=sys.stderr)
        try:
            import json
            err_text = (
                (json.loads(resp.text).get("error", {}).get("message"))
                if resp is not None else str(exc)
            )
        except Exception:
            err_text = resp.text if resp is not None else str(exc)
        while "****" in err_text:
            err_text = err_text.replace("****", "***")
        print(f"message: {err_text}", file=sys.stderr)
        raise  # propagate; execution cannot safely continue

    except httpx.RequestError as exc:
        print(f"Request error: {exc}", file=sys.stderr)
        raise

    reply_fragments: list[str] = [
        chunk.get("text", "")
        for event in data.get("output", [])
        for chunk in event.get("content", [])
        if chunk.get("type") == "output_text"
    ]

    return "".join(reply_fragments).strip()

def iter_sse_events(resp) -> tuple[str, dict]:
    """
    Minimal SSE iterator for httpx streamed responses.
    Yields (event_type, payload_dict). Unknown or malformed data lines are skipped.
    """
    import json

    current_event: str | None = None
    data_lines: list[str] = []

    for raw_line in resp.iter_lines():
        line = raw_line.strip()

        if not line:
            if current_event and data_lines:
                blob = "\n".join(data_lines)
                try:
                    payload = json.loads(blob)
                except json.JSONDecodeError:
                    payload = None
                if isinstance(payload, dict):
                    etype = payload.get("type") or current_event
                    if isinstance(etype, str):
                        yield etype, payload
            current_event = None
            data_lines.clear()
            continue

        if line.startswith("event:"):
            current_event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        else:
            # Ignore id:, retry:, comments, etc.
            pass

def _poll_conversation_items_until_changed(
    conversation_id: str,
    prev_snapshot: dict | None,
    *,
    limit: int = 100,
    tries: int = 3,
    sleep_s: float = 0.75,
) -> dict | None:
    """
    Poll conversation items briefly after completion to avoid false 'unchanged' warnings.
    Returns the latest items (or None if fetch failed). Stops early if a change is observed.
    """
    import time

    latest: dict | None = None
    for attempt in range(tries):
        latest = get_conversation_items(conversation_id, limit=limit)
        if latest is None:
            # Error fetching; do not keep hammering.
            break
        if prev_snapshot is None:
            break
        if latest != prev_snapshot:
            break
        if attempt < tries - 1:
            time.sleep(sleep_s)
    return latest

def handle_response_event(event_type: str, evt: dict, state: dict) -> None:
    """
    Handle one parsed Responses API event. Only the basic text streaming path
    is demonstrated; the structure is easy to extend for tool calls and more.
    """
    import sys

    if event_type == "response.created":
        resp_obj = evt.get("response") or {}
        rid = resp_obj.get("id")
        if rid:
            state["response_id"] = rid
        return

    if event_type == "response.output_text.delta":
        delta = evt.get("delta", "")
        if isinstance(delta, str) and delta:
            print(delta, end="", flush=True)
            state["assembled_text"].append(delta)
            state["delta_chunk_count"] += 1
            state["printed_any"] = True
        return

    if event_type == "response.completed":
        resp_obj = evt.get("response") or {}
        state["final_response"] = resp_obj
        state["usage"] = resp_obj.get("usage") or {}
        rid = resp_obj.get("id")
        if rid:
            state["response_id"] = rid
        state["completed"] = True
        return

    # Anticipated extensions (not implemented here):
    # - response.queued / response.in_progress
    # - response.reasoning_summary_part.added / ...text.delta / ...part.done
    # - response.tool_call.* and response.tool_result.*
    # - response.file_search_call.*
    # - response.error and *.error
    if event_type == "response.error" or event_type.endswith(".error"):
        msg = evt.get("error", {}).get("message") or evt.get("message") or "unknown error"
        print(f"\n[stream-error] {event_type}: {msg}", file=sys.stderr)
        state["completed"] = True
        return

    # Drop other events silently per demo scope.
    return

def stream_response(
        conversation_id: str,
        user_input: str,
        model: str,
        max_out: [int|None] = None,
) -> None:
    """
    Stream the next assistant turn. Prints text as response.output_text.delta arrives.
    After streaming, briefly polls for conversation item updates to reduce false warnings.
    Schedules persisted response deletion after a short delay, and reports final usage.
    """
    global data, conversation_items_state

    payload = build_responses_payload(
        conversation_id=conversation_id,
        user_input=user_input,
        model=model,
        max_out=max_out,
        stream=True,
    )

    state: dict[str, object] = {
        "assembled_text": [],
        "delta_chunk_count": 0,
        "printed_any": False,
        "response_id": None,
        "usage": None,
        "final_response": None,
        "completed": False,
    }

    try:
        with httpx.Client(timeout=600) as client:
            # Be explicit about SSE
            sse_headers = {**HEADERS, "Accept": "text/event-stream"}
            with client.stream(
                "POST",
                "https://api.openai.com/v1/responses",
                headers=sse_headers,
                json=payload,
            ) as resp:
                resp.raise_for_status()

                for event_type, evt in iter_sse_events(resp):
                    handle_response_event(event_type, evt, state)
                    if state["completed"]:
                        break

        # Ensure the next prompt starts on a new line
        joined = "".join(state["assembled_text"])
        if state["printed_any"] and not joined.endswith("\n"):
            print()

        # Briefly poll for conversation updates to avoid false "unchanged" warnings.
        latest_items = _poll_conversation_items_until_changed(
            conversation_id,
            conversation_items_state,
            limit=100,
            tries=3,
            sleep_s=0.75,
        )
        if latest_items is not None:
            if conversation_items_state is not None and latest_items == conversation_items_state:
                import sys as _sys
                prev_count = len(conversation_items_state.get("data", [])) if isinstance(conversation_items_state, dict) else 0
                print(
                    f"[warn] Conversation items unchanged after streaming turn; item count remains {prev_count}.",
                    file=_sys.stderr,
                )
            conversation_items_state = latest_items

        # Schedule deletion to happen shortly after completion.
        rid = state.get("response_id")
        if isinstance(rid, str) and rid:
            schedule_delete_response(rid, delay=5.0, retries=1, retry_delay=2.0)

        # Report usage
        usage = state["usage"] or {}
        in_tokens = usage.get("input_tokens", 0)
        in_cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
        out_tokens = usage.get("output_tokens", 0)
        reasoning_tokens = (usage.get("output_tokens_details") or {}).get("reasoning_tokens", 0)
        print(
            f"--Usage--  in/cached: {in_tokens}/{in_cached};  "
            f"out/reasoning:{out_tokens}/{reasoning_tokens}"
        )

        data = state["final_response"] or {}

    except httpx.HTTPStatusError as exc:
        resp = exc.response
        request_id = resp.headers.get("x-request-id") if resp is not None else None
        status = resp.status_code if resp is not None else "unknown"
        print(f"header x-request-id: {request_id}\nHTTP status {status} error", file=sys.stderr)
        try:
            import json as _json
            err_text = (
                (_json.loads(resp.text).get("error", {}).get("message"))
                if resp is not None
                else str(exc)
            )
        except Exception:
            err_text = resp.text if resp is not None else str(exc)
        while "****" in err_text:
            err_text = err_text.replace("****", "***")
        print(f"message: {err_text}", file=sys.stderr)
        raise

    except httpx.RequestError as exc:
        print(f"Request error: {exc}", file=sys.stderr)
        raise

def main() -> None:
    conversation_id: str | None = create_conversation()
    prompt: str = "Introduce yourself"

    try:
        for _ in range(20):
            if USE_STREAMING:
                # Streamed path: prints as tokens arrive and reports usage internally.
                print("\n[assistant:] ", end="", flush=True)
                stream_response(
                    conversation_id,
                    prompt,
                    model=MODEL,
                    max_out=MAX_OUTPUT_TOKENS,
                )
            else:
                # Non-streamed path: returns the full assistant reply string.
                assistant_reply: str = non_stream_response(
                    conversation_id,
                    prompt,
                    model=MODEL,
                    max_out=MAX_OUTPUT_TOKENS,
                )
                print(f"\n-[assistant:] {assistant_reply}")

            prompt = input("\nPrompt (or 'exit'): ").strip()
            if prompt.lower() == "exit":
                break

    except KeyboardInterrupt:
        print("\n[ctrl-c] Exiting…")

    finally:
        delete_conversation(conversation_id)

if __name__ == "__main__":
    main()

Topic		Replies	Views
How can I delete or entirely overwrite the 16 key metadata on a conversation? API	5	125	September 25, 2025
How to enable chatbot ask first API gpt-4 , gpt-35-turbo , chatgpt , fine-tuning , api	15	3030	January 1, 2025
Responses API: streaming bugs? Bugs	5	572	June 9, 2025
Streaming is now available in the Assistants API! API api , assistants , assistants-api , assistants-streaming	54	27524	September 20, 2024
Auto tool call streaming differentiation is unintuitive Feedback api	3	196	May 19, 2025

Conversation items not added when using streamed Response

The responses API is extremely broken and not suitable for use.

Related topics