I’m currently migrating from the deprecated Assistants API to the new Responses/Conversations API.
I noticed that when running a streamed Response, the assistant’s replies are not automatically appended to the Conversation. I came across another post mentioning that this is a known issue, so I’ve implemented a temporary workaround on my side, but it’s not an ideal solution.
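For context, the workaround appends the assembled reply to the conversation myself after each streamed turn, via the conversation items endpoint. A minimal sketch with httpx (the "output_text" content-part type for the assistant message is my assumption, taken from how assistant output is shaped in Responses):

import os

import httpx

HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}",
}

def append_assistant_reply(conversation_id: str, text: str) -> None:
    """Workaround: manually append the streamed assistant reply as a conversation item."""
    payload = {
        "items": [
            {
                "type": "message",
                "role": "assistant",
                # content-part type assumed from the shape of Responses output
                "content": [{"type": "output_text", "text": text}],
            }
        ]
    }
    with httpx.Client(timeout=20) as client:
        r = client.post(
            f"https://api.openai.com/v1/conversations/{conversation_id}/items",
            headers=HEADERS,
            json=payload,
        )
        r.raise_for_status()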
Is this issue currently being worked on, or is there an official fix/ETA planned?
Any insight or confirmation would be greatly appreciated.
Thanks in advance!
Edit:
I’ve noticed that the assistant’s response is never added to the conversation automatically. But when I manually try to add the response as a conversation item, the behavior is inconsistent:
Sometimes I get a lock error, and the logs show that the API appended the assistant’s message right before my insert.
Other times there’s no lock error; the API doesn’t add the response at all, and only my item appears.
So the auto-append only happens sometimes, and seemingly only when I try to insert the response myself, which makes it unpredictable and tricky to handle cleanly.
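To make that manageable for now, I guard the manual insert behind a re-check and a backoff (a sketch reusing append_assistant_reply from above and the get_conversation_items helper from the full script below; treating the lock error as an HTTP 409 conflict, and assuming the items list comes back newest-first, are my assumptions from the logs):

import time

import httpx

def insert_reply_if_missing(conversation_id: str, text: str, attempts: int = 3) -> None:
    """Append the reply only if the server didn't already auto-append it."""
    for i in range(attempts):
        items = get_conversation_items(conversation_id) or {}
        newest = (items.get("data") or [{}])[0]  # assumes newest-first ordering
        if newest.get("role") == "assistant":
            return  # the API's auto-append won the race; nothing to do
        try:
            append_assistant_reply(conversation_id, text)
            return
        except httpx.HTTPStatusError as exc:
            status = exc.response.status_code if exc.response is not None else None
            if status == 409 and i < attempts - 1:
                time.sleep(0.5 * (i + 1))  # back off, then re-check for the auto-append
                continue
            raise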
Could I please get an update on whether this is a known issue, whether it’s being worked on, and whether there’s any ETA for a fix?
Last edit:
I found a solution to a problem I caused myself… due to an early return, the response didn’t add itself to the conversation; removing the early return fixed it!
I tried it with store:true, but the conversation still doesn’t update after the API call. The response is returned correctly, but no new items are added.
I have spent several hours writing code, revising code, and having GPT-5 analyze the code against the YAML specification and documentation, working from ‘how to use conversations’, to ‘how to diagnose conversations not being updated’, to ‘how to diagnose response.id never being created with store:true’, to ‘how to send input in multiple forms and pass the conversation ID in the multiple ways the API reference documents’, and finally to background threads that delay and keep retrying to retrieve a conversation’s contents and a response ID. I can fully conclude:
The responses API is extremely broken and not suitable for use.
Let’s just have a nice chat with the AI:
Conversation created conv_68f118865dc481959fae7ef8049de6ae01a95a333d512910
[assistant:] Hi — I’m ChatGPT, an AI assistant built by OpenAI. I can help with writing, editing, coding, brainstorming, research summaries, math, language translation, explanations, planning, troubleshooting, and more. I work best with clear instructions and examples.
A few quick notes:
I don’t have real-time web access or personal memory across sessions unless you provide context.
I can generate code, drafts, and suggestions, but always review for accuracy, safety, and legal/compliance needs.
Tell me the goal, constraints, and any examples, and I’ll get to work.
What would you like help with right now?
--Usage-- in/cached: 24/0; out/reasoning:136/0
Prompt (or ‘exit’): tell me more about quick note item 1 - what would be needed for that? Response ID deleted: resp_01a95a333d5129100068f11886fc288195866bbf3504362e55
[assistant:]
I don’t have the quick note you’re referring to. Can you paste item 1 (or describe it) so I can give targeted details?
If helpful, when you share it I can outline:
Goals and success criteria
Required people/roles and skills
Tools, software and materials
Step-by-step tasks and timeline
Estimated cost and effort
Risks and dependencies
Quick checklist to get started
Tell me the actual item and how detailed you want the plan (high-level vs. task-level).
The AI doesn’t understand what I am talking about from the prior turn. The conversation is never updated with any new items, which I verify by storing its prior state and checking again after the turn.
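The check behind “storing its prior state and checking again” is just a snapshot comparison (sketch, using the get_conversation_items helper from the script below):

def conversation_grew(conversation_id: str, before: dict | None) -> bool:
    """True if the conversation item list gained entries since the 'before' snapshot."""
    after = get_conversation_items(conversation_id)
    if after is None or before is None:
        return False
    return len(after.get("data", [])) > len(before.get("data", []))

It never returns True here; the item count stays at 0.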
I switch to the ‘object’ form of the conversation ID and place the input as a full array of messages with ‘type’: ‘message’. Same thing: no conversation history, plus intermittent failures to even create a response ID (where I then try to delete this completely unwanted artifact in the background).
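Concretely, these are the two request shapes I alternate between; both appear in the API reference (trimmed to the relevant fields):

# Variant A: conversation as a bare ID string, input as a plain string
body_a = {
    "model": MODEL,
    "conversation": conversation_id,
    "input": "hi!",
    "store": True,
}

# Variant B: conversation as an object, input as a typed message array
body_b = {
    "model": MODEL,
    "conversation": {"id": conversation_id},
    "input": [
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "hi!"}],
        }
    ],
    "store": True,
}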
Conversation created conv_68f11eddfa5481978ea68321fe15eaea0de2f8b4132ea99b
[assistant:] Hi — I’m ChatGPT, an AI assistant built by OpenAI. I can:
Answer questions and explain things clearly
Help draft emails, essays, code, summaries, plans, and creative writing
Analyze images you upload and provide feedback
Solve technical problems, debug code, and generate examples
Translate and work in multiple languages
Quick notes and limits:
My knowledge goes up to June 2024; I can’t browse the web or access real-time info.
I don’t retain personal data between conversations unless you provide context within the chat.
I can be helpful with many tasks, but verify critical facts (medical, legal, financial) with a qualified professional.
How can I help you today?
--Usage-- in/cached: 24/0; out/reasoning:155/0
Prompt (or ‘exit’): [warn] delete_response resp_0de2f8b4132ea99b0068f11ede85048197bf6ca449d85cd7ee not deleted after 2 attempt(s): HTTP 404: Response with id 'resp_0de2f8b4132ea99b0068f11ede85048197bf6ca449d85cd7ee' not found.tell me how you would do item number three you list.
[assistant:] I don’t have the list you’re referring to. Could you either paste the list here or tell me what item three is?
If you want a quick template for how I’d explain doing “item three,” here’s the format I’ll use once I know it:
Goal: one-sentence description of the outcome.
Inputs needed: data, tools, permissions, or constraints.
Step-by-step actions: numbered, practical steps to complete it.
Time estimate: rough effort/time required.
Risks or pitfalls: what can go wrong and how to avoid it.
Deliverable: what I’d produce and how I’d present it.
Share item three (or the list) and I’ll fill that in. [warn] Conversation items unchanged after streaming turn; item count remains 0.
--Usage-- in/cached: 34/0; out/reasoning:217/64
Prompt (or ‘exit’): just repeat what you said before. [warn] delete_response resp_0de2f8b4132ea99b0068f11f0e717881978288d14ef728e844 not deleted after 2 attempt(s): HTTP 404: Response with id 'resp_0de2f8b4132ea99b0068f11f0e717881978288d14ef728e844' not found.
[assistant:]
A helpful AI that keeps its answers brief yet insightful. [warn] Conversation items unchanged after streaming turn; item count remains 0.
--Usage-- in/cached: 29/0; out/reasoning:17/0
Besides the AI, in the last turn, repeating the developer message back as “what it said before” (and note that nothing in my code calls the model “ChatGPT”, yet it calls itself ChatGPT), there are more failures: the stored response is apparently never created, so there is nothing to delete; you can see from the input token report that the conversation length is not growing; and asserting against the previous call to GET https://api.openai.com/v1/responses/{response_id}/input_items shows the result is identical, with 0 items stored.
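The assertion runs against a fetch like this (sketch):

def response_input_items(response_id: str) -> list[dict]:
    """List the input items the API recorded for a stored response."""
    with httpx.Client(timeout=20) as client:
        r = client.get(
            f"https://api.openai.com/v1/responses/{response_id}/input_items",
            headers=HEADERS,
        )
        r.raise_for_status()
        return r.json().get("data", [])

The returned list is identical call after call: zero items.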
Even if this endpoint did rely on persisting a response ID to later reconstruct a conversation, the response ID storage itself is broken. That is the same as what was reported days ago, when someone tried to reuse a response ID as the chat-history mechanism:
Despite any “failure to delete” warnings in this code, across all of this experimentation not a single response ID was persisted in the logs or leaked past my cleanup.
And if you think the errors come from deleting a response ID before it can be consumed: this works just fine without streaming, and with my deleter changed to simply return True, the AI still doesn’t know jack about prior turns and the input can even become smaller:
Tell me which specific wording you meant and I’ll explain more. [warn] Conversation items unchanged after streaming turn; item count remains 0.
--Usage-- in/cached: 36/0; out/reasoning:218/64
Prompt (or ‘exit’): hi!
[assistant:] Hi — how can I help you today? [warn] Conversation items unchanged after streaming turn; item count remains 0.
--Usage-- in/cached: 24/0; out/reasoning:15/0
I got a few runs early in testing that seemed to persist, but then I couldn’t even replicate that.
And yes, “store”: true is hard-coded right in the payload construction function, alongside the alternate “input” and “conversation” forms.
I can only conclude that the low number of reports of this issue, against either the ‘conversations’ endpoint or ‘response id’ storage, is because storing conversations server-side is for noobs: no experienced developer would do such a thing with their user data, or rely on such stateful information being persisted by a party with an “off” button for any organization it thinks misbehaves or doesn’t prepay.
Secondly, programming Responses streaming means handling literally dozens of event types, against documentation so poor that even its treatment of parsing the “error” event is completely wrong; only the most foolhardy of experienced developers would ever undertake using Responses.
Finally, OpenAI gated streaming itself behind “ID Verification”, a complete failure of an implementation besides being a massive intrusion. Obviously no “trust” is needed to receive streaming, nor to make an ‘agent’ or request a ‘reasoning summary’; ID verification is forced onto these methods purely to force it onto developers and reap whatever profit motivates sending users to “withpersona”.
That the API for this was launched in a broken state is here:
That it was temporarily working is here, with my assurance:
So please, Andrew @wilkes - explain what’s gone wrong.
650 lines of a 'basic' API chatbot for streaming Responses with Conversation
"""OpenAI Conversations + Responses: server-side chat memory demo
One Conversation per run; server keeps history across turns.
Default: streaming via SSE; prints deltas; usage on completion.
Toggle non-streaming with USE_STREAMING.
instructions=SYSTEM sent per request (Conversation minimal).
Payload built by build_responses_payload; caller sets stream.
Persisted response (store:true) deleted after completion.
Lockfile aids cleanup; Conversation deleted on exit.
----------------------------------------------------"""
import os
import sys
from collections.abc import Iterator

import httpx
MODEL: str = "gpt-5-mini"
MAX_OUTPUT_TOKENS: int = 10000
SYSTEM = f"""
A helpful AI that keeps its answers brief yet insightful.
"""
HEADERS: dict[str, str] = {
"Content-Type": "application/json",
"Authorization": f"Bearer {os.getenv('OPENAI_API_KEY', '')}",
}
# Developer toggle: choose streaming vs non-streaming path
USE_STREAMING: bool = True
data: dict = {}  # global, for examining the non-stream response in a REPL environment
conversation_items_state: dict | None = None  # last fetched conversation items list, for diagnostics
print("__doc__: " + (__doc__ or ""))
def create_conversation() -> str:
"""
Create a conversation containing one developer message that sets the tone.
If a 'conversation.lock' exists, delete that conversation first (best-effort),
then proceed. The new conversation id is written to 'conversation.lock'.
Returns the server-generated conversation ID.
"""
from pathlib import Path
lock_path = Path("conversation.lock")
# Best-effort cleanup of an orphan from a previous run
if lock_path.exists():
try:
prev_id = lock_path.read_text(encoding="utf-8").strip()
except Exception:
prev_id = ""
if prev_id:
delete_conversation(prev_id)
else:
try:
lock_path.unlink(missing_ok=True)
except Exception:
pass
payload = {"metadata": {"topic": "demo"}}
with httpx.Client(timeout=20) as client:
response = client.post(
"https://api.openai.com/v1/conversations",
headers=HEADERS,
json=payload,
)
response.raise_for_status()
conversation_id: str = response.json()["id"]
# Record the active conversation id for crash/restart cleanup
try:
lock_path.write_text(conversation_id, encoding="utf-8")
except Exception:
pass
print(f"Conversation created {conversation_id}")
return conversation_id
def delete_conversation(conversation_id: str) -> None:
"""
Delete the conversation so the demo doesn’t leave stray server objects.
On 2xx or 404, it's treated as success. Errors are logged to stderr.
Always removes 'conversation.lock'. If a module-level `conversation_id`
matches the deleted id, it is cleared to None.
"""
import sys
from pathlib import Path
lock_path = Path("conversation.lock")
try:
with httpx.Client(timeout=20) as client:
response = client.delete(
f"https://api.openai.com/v1/conversations/{conversation_id}",
headers=HEADERS,
)
try:
response.raise_for_status()
except httpx.HTTPStatusError as exc:
if exc.response is not None and exc.response.status_code == 404:
print(f"Conversation already deleted {conversation_id}")
cleared = True
else:
raise
else:
print(f"Conversation deleted {conversation_id}")
cleared = True
except Exception as exc:
print(f"[warn] Couldn’t delete {conversation_id}: {exc}", file=sys.stderr)
finally:
# Always remove the lock file
try:
lock_path.unlink(missing_ok=True)
except Exception:
pass
def get_conversation_items(conversation_id: str, limit: int = 100) -> dict | None:
"""
Retrieve up to `limit` most recent items from a conversation for diagnostics.
Returns the JSON object (dict) on success, or None on error.
"""
import sys
try:
with httpx.Client(timeout=20) as client:
resp = client.get(
f"https://api.openai.com/v1/conversations/{conversation_id}/items",
headers=HEADERS,
params={"limit": limit},
)
resp.raise_for_status()
return resp.json()
except httpx.HTTPStatusError as exc:
r = exc.response
request_id = r.headers.get("x-request-id") if r is not None else None
status = r.status_code if r is not None else "unknown"
print(
f"[warn] get_conversation_items HTTP {status} (x-request-id={request_id})",
file=sys.stderr,
)
try:
import json
msg = (json.loads(r.text).get("error", {}).get("message")) if r is not None else str(exc)
except Exception:
msg = r.text if r is not None else str(exc)
while "****" in msg:
msg = msg.replace("****", "***")
print(f"[warn] message: {msg}", file=sys.stderr)
return None
except httpx.RequestError as exc:
print(f"[warn] get_conversation_items request error: {exc}", file=sys.stderr)
return None
def delete_response(
response_id: str,
*,
retries: int = 1,
delay_first: float = 0.0,
retry_delay: float = 2.0,
timeout: float = 20.0,
) -> bool:
"""
Attempt to delete a persisted Responses API object.
Returns True on 2xx; warns once if no attempt succeeded.
Retries help with eventual consistency when store:true objects lag before deletion.
On non-2xx, extracts JSON error.message if present, otherwise prints status and raw text.
"""
import sys
import time
#return True # shut it off temporarily
if not response_id:
return False
if delay_first > 0:
try:
time.sleep(delay_first)
except Exception:
pass
url = f"https://api.openai.com/v1/responses/{response_id}"
last_code: int | None = None
last_msg: str | None = None
try:
with httpx.Client(timeout=timeout) as client:
attempts = retries + 1
for i in range(attempts):
try:
resp = client.delete(url, headers=HEADERS)
status = resp.status_code
if 200 <= status < 300:
print(f"Response ID deleted: {response_id}")
return True
else:
try:
msg = resp.json().get("error", {}).get("message") or resp.text
except Exception:
msg = resp.text
last_code = status
last_msg = msg
except httpx.RequestError as exc:
last_code = None
last_msg = str(exc)
if i < attempts - 1:
try:
time.sleep(retry_delay)
except Exception:
pass
except Exception as exc:
print(f"[warn] delete_response {response_id} unexpected error: {exc}", file=sys.stderr)
return False
# No attempt succeeded; issue one concise warning.
detail = (
f"HTTP {last_code}: {last_msg}" if last_code is not None else (last_msg or "request error")
)
print(
f"[warn] delete_response {response_id} not deleted after {retries + 1} attempt(s): {detail}",
file=sys.stderr,
)
return False
def schedule_delete_response(
response_id: str,
*,
delay: float = 3.0,
retries: int = 1,
retry_delay: float = 2.0,
) -> None:
"""
Fire-and-forget deletion scheduled in the background.
Sleeps 'delay' seconds, then calls delete_response() with retry behavior.
Emits a single warning later if deletion never succeeded.
"""
import threading
if not response_id:
return
def _worker() -> None:
delete_response(
response_id,
delay_first=delay,
retries=retries,
retry_delay=retry_delay,
)
t = threading.Thread(target=_worker, name=f"delete_response:{response_id}", daemon=True)
t.start()
def build_responses_payload(
conversation_id: str,
user_input: str | list[dict] | dict,
model: str,
max_out: int | None,
stream: bool,
*,
instructions: str = SYSTEM, # system prompt per call
temperature: float = 0.5, # only for non-reasoning
top_p: float = 0.9, # only for non-reasoning
reasoning_effort: str = "low", # only for reasoning
reasoning_summary: str | None = "auto", # only for reasoning
verbosity: str = "medium", # only for gpt-5 family
**kwargs, # passthrough for other valid fields
) -> dict[str, object]:
"""
Build a minimal Responses API request body for a chat with Conversations,
applying model-appropriate gating of sampling vs reasoning vs verbosity.
Model gates:
- is_gpt5 = model.startswith("gpt-5") and not model.startswith("gpt-5-chat")
- is_reasoning = is_gpt5 or model.startswith(("o3", "o4"))
* reasoning models receive a 'reasoning' block (no temperature/top_p).
* non-reasoning models receive temperature & top_p (no 'reasoning').
* gpt-5 family models receive text.verbosity; others do not.
* any extra kwargs are merged in at the end (developer-controlled).
"""
is_gpt5 = model.startswith("gpt-5") and not model.startswith("gpt-5-chat")
is_reasoning = is_gpt5 or model.startswith(("o3", "o4"))
body: dict[str, object] = {
"model": model,
#"conversation": conversation_id,
"conversation": {"id": conversation_id}, # alternate format also documented
"instructions": instructions,
#"input": user_input,
"input": [
{
"role": "user",
"content": [
{"type": "input_text", "text": user_input},
]
}
],
"max_output_tokens": max_out,
"store": True,
"stream": stream,
"text": {"format": {"type": "text"}},
}
if is_gpt5:
body["text"]["verbosity"] = verbosity
if is_reasoning:
reasoning: dict[str, object] = {"effort": reasoning_effort}
if reasoning_summary is not None:
reasoning["summary"] = reasoning_summary
body["reasoning"] = reasoning
else:
# sampling knobs only for non-reasoning
body["temperature"] = temperature
body["top_p"] = top_p
# merge in any other valid Responses parameters (developer's responsibility)
body.update(kwargs)
return body
def non_stream_response(
conversation_id: str,
user_input: str,
model: str,
    max_out: int | None = None,  # see MAX_OUTPUT_TOKENS at the top of the script
) -> str:
"""
Send user_input as the next turn and return the assistant’s reply text.
The same conversation ID is reused, so the server retains memory.
"""
global data
payload = build_responses_payload(
conversation_id=conversation_id,
user_input=user_input,
model=model,
max_out=max_out,
stream=False,
)
try:
with httpx.Client(timeout=600) as client:
response = client.post(
"https://api.openai.com/v1/responses",
headers=HEADERS,
json=payload,
)
response.raise_for_status()
data = response.json()
delete_response(data.get("id"))
print(
f"--Usage-- in/cached: {data['usage']['input_tokens']}/"
f"{data['usage']['input_tokens_details']['cached_tokens']}; "
f"out/reasoning:{data['usage']['output_tokens']}/"
f"{data['usage']['output_tokens_details']['reasoning_tokens']}"
)
except httpx.HTTPStatusError as exc:
resp = exc.response # httpx attaches the response to the exception
request_id = resp.headers.get("x-request-id") if resp is not None else None
status = resp.status_code if resp is not None else "unknown"
print(f"header x-request-id: {request_id}\nHTTP status {status} error", file=sys.stderr)
try:
import json
err_text = (
(json.loads(resp.text).get("error", {}).get("message"))
if resp is not None else str(exc)
)
except Exception:
err_text = resp.text if resp is not None else str(exc)
while "****" in err_text:
err_text = err_text.replace("****", "***")
print(f"message: {err_text}", file=sys.stderr)
raise # propagate; execution cannot safely continue
except httpx.RequestError as exc:
print(f"Request error: {exc}", file=sys.stderr)
raise
reply_fragments: list[str] = [
chunk.get("text", "")
for event in data.get("output", [])
for chunk in event.get("content", [])
if chunk.get("type") == "output_text"
]
return "".join(reply_fragments).strip()
def iter_sse_events(resp) -> Iterator[tuple[str, dict]]:
"""
Minimal SSE iterator for httpx streamed responses.
Yields (event_type, payload_dict). Unknown or malformed data lines are skipped.
"""
import json
current_event: str | None = None
data_lines: list[str] = []
for raw_line in resp.iter_lines():
line = raw_line.strip()
if not line:
if current_event and data_lines:
blob = "\n".join(data_lines)
try:
payload = json.loads(blob)
except json.JSONDecodeError:
payload = None
if isinstance(payload, dict):
etype = payload.get("type") or current_event
if isinstance(etype, str):
yield etype, payload
current_event = None
data_lines.clear()
continue
if line.startswith("event:"):
current_event = line[len("event:"):].strip()
elif line.startswith("data:"):
data_lines.append(line[len("data:"):].strip())
else:
# Ignore id:, retry:, comments, etc.
pass
def _poll_conversation_items_until_changed(
conversation_id: str,
prev_snapshot: dict | None,
*,
limit: int = 100,
tries: int = 3,
sleep_s: float = 0.75,
) -> dict | None:
"""
Poll conversation items briefly after completion to avoid false 'unchanged' warnings.
Returns the latest items (or None if fetch failed). Stops early if a change is observed.
"""
import time
latest: dict | None = None
for attempt in range(tries):
latest = get_conversation_items(conversation_id, limit=limit)
if latest is None:
# Error fetching; do not keep hammering.
break
if prev_snapshot is None:
break
if latest != prev_snapshot:
break
if attempt < tries - 1:
time.sleep(sleep_s)
return latest
def handle_response_event(event_type: str, evt: dict, state: dict) -> None:
"""
Handle one parsed Responses API event. Only the basic text streaming path
is demonstrated; the structure is easy to extend for tool calls and more.
"""
import sys
if event_type == "response.created":
resp_obj = evt.get("response") or {}
rid = resp_obj.get("id")
if rid:
state["response_id"] = rid
return
if event_type == "response.output_text.delta":
delta = evt.get("delta", "")
if isinstance(delta, str) and delta:
print(delta, end="", flush=True)
state["assembled_text"].append(delta)
state["delta_chunk_count"] += 1
state["printed_any"] = True
return
if event_type == "response.completed":
resp_obj = evt.get("response") or {}
state["final_response"] = resp_obj
state["usage"] = resp_obj.get("usage") or {}
rid = resp_obj.get("id")
if rid:
state["response_id"] = rid
state["completed"] = True
return
# Anticipated extensions (not implemented here):
# - response.queued / response.in_progress
# - response.reasoning_summary_part.added / ...text.delta / ...part.done
# - response.tool_call.* and response.tool_result.*
# - response.file_search_call.*
# - response.error and *.error
if event_type == "response.error" or event_type.endswith(".error"):
msg = evt.get("error", {}).get("message") or evt.get("message") or "unknown error"
print(f"\n[stream-error] {event_type}: {msg}", file=sys.stderr)
state["completed"] = True
return
# Drop other events silently per demo scope.
return
def stream_response(
conversation_id: str,
user_input: str,
model: str,
    max_out: int | None = None,
) -> None:
"""
Stream the next assistant turn. Prints text as response.output_text.delta arrives.
After streaming, briefly polls for conversation item updates to reduce false warnings.
Schedules persisted response deletion after a short delay, and reports final usage.
"""
global data, conversation_items_state
payload = build_responses_payload(
conversation_id=conversation_id,
user_input=user_input,
model=model,
max_out=max_out,
stream=True,
)
state: dict[str, object] = {
"assembled_text": [],
"delta_chunk_count": 0,
"printed_any": False,
"response_id": None,
"usage": None,
"final_response": None,
"completed": False,
}
try:
with httpx.Client(timeout=600) as client:
# Be explicit about SSE
sse_headers = {**HEADERS, "Accept": "text/event-stream"}
with client.stream(
"POST",
"https://api.openai.com/v1/responses",
headers=sse_headers,
json=payload,
) as resp:
resp.raise_for_status()
for event_type, evt in iter_sse_events(resp):
handle_response_event(event_type, evt, state)
if state["completed"]:
break
# Ensure the next prompt starts on a new line
joined = "".join(state["assembled_text"])
if state["printed_any"] and not joined.endswith("\n"):
print()
# Briefly poll for conversation updates to avoid false "unchanged" warnings.
latest_items = _poll_conversation_items_until_changed(
conversation_id,
conversation_items_state,
limit=100,
tries=3,
sleep_s=0.75,
)
if latest_items is not None:
if conversation_items_state is not None and latest_items == conversation_items_state:
                prev_count = len(conversation_items_state.get("data", [])) if isinstance(conversation_items_state, dict) else 0
                print(
                    f"[warn] Conversation items unchanged after streaming turn; item count remains {prev_count}.",
                    file=sys.stderr,
                )
conversation_items_state = latest_items
# Schedule deletion to happen shortly after completion.
rid = state.get("response_id")
if isinstance(rid, str) and rid:
schedule_delete_response(rid, delay=5.0, retries=1, retry_delay=2.0)
# Report usage
usage = state["usage"] or {}
in_tokens = usage.get("input_tokens", 0)
in_cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
out_tokens = usage.get("output_tokens", 0)
reasoning_tokens = (usage.get("output_tokens_details") or {}).get("reasoning_tokens", 0)
print(
f"--Usage-- in/cached: {in_tokens}/{in_cached}; "
f"out/reasoning:{out_tokens}/{reasoning_tokens}"
)
data = state["final_response"] or {}
except httpx.HTTPStatusError as exc:
resp = exc.response
request_id = resp.headers.get("x-request-id") if resp is not None else None
status = resp.status_code if resp is not None else "unknown"
print(f"header x-request-id: {request_id}\nHTTP status {status} error", file=sys.stderr)
try:
import json as _json
err_text = (
(_json.loads(resp.text).get("error", {}).get("message"))
if resp is not None
else str(exc)
)
except Exception:
err_text = resp.text if resp is not None else str(exc)
while "****" in err_text:
err_text = err_text.replace("****", "***")
print(f"message: {err_text}", file=sys.stderr)
raise
except httpx.RequestError as exc:
print(f"Request error: {exc}", file=sys.stderr)
raise
def main() -> None:
    conversation_id: str = create_conversation()
prompt: str = "Introduce yourself"
try:
for _ in range(20):
if USE_STREAMING:
# Streamed path: prints as tokens arrive and reports usage internally.
print("\n[assistant:] ", end="", flush=True)
stream_response(
conversation_id,
prompt,
model=MODEL,
max_out=MAX_OUTPUT_TOKENS,
)
else:
# Non-streamed path: returns the full assistant reply string.
assistant_reply: str = non_stream_response(
conversation_id,
prompt,
model=MODEL,
max_out=MAX_OUTPUT_TOKENS,
)
print(f"\n-[assistant:] {assistant_reply}")
prompt = input("\nPrompt (or 'exit'): ").strip()
if prompt.lower() == "exit":
break
except KeyboardInterrupt:
print("\n[ctrl-c] Exiting…")
finally:
delete_conversation(conversation_id)
if __name__ == "__main__":
main()