Hello,
I am using Assistants API v1 together with the gpt-4.1-mini-2025-04-14 model in streaming mode, and I’m encountering a recurring issue: the JSON response generated by the assistant is cut off before completion.
Context
- Backend: C# / .NET 8
- Using Assistants API v1 (Assistants, Threads, Messages, Runs)
- Files involved:
  - Teams transcript (.docx)
  - Participants list (CSV)
- Using file-based search (`file_search`)
- The assistant must output a long, fully structured JSON object
- Streaming enabled (`stream=true`)
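For context, my consumption loop follows the usual SSE pattern: accumulate the `thread.message.delta` chunks and treat the run as done only when a terminal event arrives. A simplified sketch of that logic (Python here for brevity; my real code is C#, and the event/payload shapes are the ones I observe from Assistants v1 streaming):

```python
# Terminal events that mark a cleanly finished Assistants v1 streaming run.
TERMINAL_EVENTS = {"thread.run.completed", "thread.run.failed", "done"}

def collect_stream(events):
    """Accumulate message deltas from an Assistants v1 SSE stream.

    `events` is an iterable of (event_name, payload_dict) pairs, already
    parsed from the raw SSE lines. Returns (full_text, finished_cleanly).
    """
    chunks = []
    finished = False
    for name, payload in events:
        if name == "thread.message.delta":
            # Each delta carries a list of content parts; concatenate text.
            for part in payload["delta"]["content"]:
                if part["type"] == "text":
                    chunks.append(part["text"]["value"])
        elif name in TERMINAL_EVENTS:
            finished = True
    return "".join(chunks), finished
```

When the bug occurs, `finished` comes back false: the connection ends without any terminal event, so the accumulated text is a truncated fragment.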
Issue
Randomly, the streamed response:
- stops before completion,
- returns incomplete JSON,
- ends in the middle of a string,
- triggers a deserialization error on the C# side.
Typical example:
"digital twi
Sometimes I also get:
Internal Server Error: The response ended prematurely. (ResponseEnded)
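The deserialization failure itself is easy to reproduce: when the stream is cut off, the accumulated buffer simply is not valid JSON. A minimal completeness check I run before handing the buffer to the deserializer (sketched in Python; the C# equivalent is a `try`/`catch` around `JsonDocument.Parse`):

```python
import json

def is_complete_json(buffer: str) -> bool:
    """Return True only if the accumulated stream parses as one JSON value.

    A truncated stream (e.g. one ending mid-string like '"digital twi')
    fails to parse, so this cheap check catches the cut-off before any
    downstream deserialization runs.
    """
    try:
        json.loads(buffer)
        return True
    except json.JSONDecodeError:
        return False
```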
This happens only in streaming mode.
Observations
- `max_tokens` is set to `null`
- JSON response enforced with `response_format: { "type": "json_object" }`
- In non-streaming mode, responses are always complete
- Files are correctly uploaded and indexed
- Transcript is about 40–50 minutes long, well within model limits
- The issue occurs intermittently
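The workaround I am testing in the meantime is to validate the streamed buffer and fall back to a non-streaming run (which, per my observations, always completes) when the stream arrives truncated. A sketch of that retry/fallback logic (Python for brevity; `stream_fn` and `blocking_fn` are placeholders for my own call wrappers):

```python
def run_with_fallback(stream_fn, blocking_fn, parse_ok, max_stream_attempts=2):
    """Try the streaming path; on a truncated result, retry, then fall
    back to a blocking (non-streaming) run.

    stream_fn / blocking_fn are caller-supplied functions returning the
    raw response text; parse_ok(text) -> bool checks JSON completeness.
    """
    for _ in range(max_stream_attempts):
        text = stream_fn()
        if parse_ok(text):
            return text
    # Streaming kept truncating: fall back to the non-streaming run.
    return blocking_fn()
```

This obviously loses the UX benefit of streaming on the retry, which is why I am hoping there is a proper fix or recommended configuration instead.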
My question
Is this a known issue with Assistants API v1 when streaming responses?
Is there a recommended workaround, configuration, or best practice to guarantee full JSON completion when using gpt-4.1-mini-2025-04-14?
Thank you for your help.