EOF Error in Streaming in Structured Outputs

Hello guys, I am using OpenAI structured outputs to get the response in a structured way, and I am also using streaming so that chunks of the response reach the client faster.

with client.beta.chat.completions.stream(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format=ViniReply,
    stream_options={"include_usage": True},
) as stream:

This is my client initialization; ViniReply is my structured response class.
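For reference, ViniReply is a Pydantic model along these lines (field names simplified here; the real schema is application-specific):

from pydantic import BaseModel
from openai import OpenAI

class ViniReply(BaseModel):
    # hypothetical fields, just to illustrate the shape of the schema
    reply: str
    emotion: str

client = OpenAI()
user_messages = [
    {"role": "system", "content": "Reply as Vini, using the ViniReply format."},
    {"role": "user", "content": "Hi!"},
]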

And I am processing chunks like this; the rest of the logic is mainly parsing the JSON:

for chunk in stream:
    if chunk.type == 'chunk':
        chunk_dict = chunk.to_dict()
        latest_snapshot = chunk_dict['snapshot']
        # The first chunk doesn't have the 'parsed' key, so use .get to prevent raising an exception
        latest_parsed = latest_snapshot['choices'][0]['message'].get('parsed', {})
        # Note that usage is not available until the final chunk
        latest_usage = latest_snapshot.get('usage', {})
        latest_json = latest_snapshot['choices'][0]['message']['content']
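The same loop written with attribute access on the event objects, and forwarding the incremental text via the content.delta events, would look roughly like this (a sketch, not my exact code; parsed and usage stay None until enough of the response has arrived):

with client.beta.chat.completions.stream(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format=ViniReply,
    stream_options={"include_usage": True},
) as stream:
    for chunk in stream:
        if chunk.type == "chunk":
            message = chunk.snapshot.choices[0].message
            latest_parsed = message.parsed        # None until the JSON can be partially parsed
            latest_json = message.content         # raw JSON accumulated so far
            latest_usage = chunk.snapshot.usage   # None until the final chunk
        elif chunk.type == "content.delta":
            print(chunk.delta, end="", flush=True)  # forward incremental text to the client

    final_completion = stream.get_final_completion()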

But sometimes I get an EOF exception while parsing a chunk in the stream.
I receive two chunks -
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AG4GJodbMopAZklL13lYvsDZn7lfu', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1728392995, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None), snapshot=ParsedChatCompletion[object](id='chatcmpl-AG4GJodbMopAZklL13lYvsDZn7lfu', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='', refusal=None, role='assistant', function_call=None, tool_calls=None, parsed=None))], created=1728392995, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None))

and then,

ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AG4OfJriuKOS9AA8rCGRfkL1Hvcb4', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1728393513, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None), snapshot=ParsedChatCompletion[object](id='chatcmpl-AG4OfJriuKOS9AA8rCGRfkL1Hvcb4', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='', refusal=None, role='assistant', function_call=None, tool_calls=None, parsed=None))], created=1728393513, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None))

After this I get this exception -

Exception in call_vini - Traceback (most recent call last):
  File "/Users/arpit/Documents/VIRTUAL-FRIEND/sakura_va/service/structured_output_openai_service.py", line 53, in call_vini
    for chunk in stream:
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 72, in __iter__
    for item in self._iterator:
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 116, in __stream__
    events_to_fire = self._state.handle_chunk(sse_event)
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 317, in handle_chunk
    self.__current_completion_snapshot = self._accumulate_chunk(chunk)
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 407, in _accumulate_chunk
    choice_snapshot.message.parsed = from_json(
ValueError: EOF while parsing a value at line 2 column 0

  • Are you running out of your max_tokens setting?

A JSON document must be completed and closed in order to be valid.

  • Have you described the use of your JSON output so well that it could be produced regardless of the response format being sent?

The model can go into a loop of garbage inside strings.

  • Are you using the default temperature?

mini doesn't produce high certainty about what to write; improve that with temperature or top_p set to 0.5 or so.

  • Tried logprobs?

If you can capture them before the error, or get the same output by another method, you can see whether the AI is truly emitting garbage bytes.

A sketch of passing these settings to the streaming call follows below.
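For example, something like this, passing those settings through the same streaming call (the values are just illustrative, not recommendations):

with client.beta.chat.completions.stream(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format=ViniReply,
    temperature=0.5,    # or top_p; reduces low-certainty token choices from mini
    max_tokens=1024,    # cap comfortably above your largest expected JSON reply
    logprobs=True,      # capture token certainty, if it survives until the error
    stream_options={"include_usage": True},
) as stream:
    ...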

The error thrown is from the SDK parsing at the end of this library snippet, which is brute-forcing its way through a bunch of "if"s to get there.

    def _accumulate_chunk(self, chunk: ChatCompletionChunk) -> ParsedChatCompletionSnapshot:
        completion_snapshot = self.__current_completion_snapshot

        if completion_snapshot is None:
            return _convert_initial_chunk_into_snapshot(chunk)

        for choice in chunk.choices:
            try:
                choice_snapshot = completion_snapshot.choices[choice.index]
                previous_tool_calls = choice_snapshot.message.tool_calls or []

                choice_snapshot.message = cast(
                    ParsedChatCompletionMessageSnapshot,
                    construct_type(
                        type_=ParsedChatCompletionMessageSnapshot,
                        value=accumulate_delta(
                            cast(
                                "dict[object, object]",
                                model_dump(
                                    choice_snapshot.message,
                                    # we don't want to serialise / deserialise our custom properties
                                    # as they won't appear in the delta and we don't want to have to
                                    # continuously reparse the content
                                    exclude={
                                        "parsed": True,
                                        "tool_calls": {
                                            idx: {"function": {"parsed_arguments": True}}
                                            for idx, _ in enumerate(choice_snapshot.message.tool_calls or [])
                                        },
                                    },
                                ),
                            ),
                            cast("dict[object, object]", choice.delta.to_dict()),
                        ),
                    ),
                )

                # ensure tools that have already been parsed are added back into the newly
                # constructed message snapshot
                for tool_index, prev_tool in enumerate(previous_tool_calls):
                    new_tool = (choice_snapshot.message.tool_calls or [])[tool_index]

                    if prev_tool.type == "function":
                        assert new_tool.type == "function"
                        new_tool.function.parsed_arguments = prev_tool.function.parsed_arguments
                    elif TYPE_CHECKING:  # type: ignore[unreachable]
                        assert_never(prev_tool)
            except IndexError:
                choice_snapshot = cast(
                    ParsedChoiceSnapshot,
                    construct_type(
                        type_=ParsedChoiceSnapshot,
                        value={
                            **choice.model_dump(exclude_unset=True, exclude={"delta"}),
                            "message": choice.delta.to_dict(),
                        },
                    ),
                )
                completion_snapshot.choices.append(choice_snapshot)

            if choice.finish_reason:
                choice_snapshot.finish_reason = choice.finish_reason

                if has_parseable_input(response_format=self._response_format, input_tools=self._input_tools):
                    if choice.finish_reason == "length":
                        # at the time of writing, `.usage` will always be `None` but
                        # we include it here in case that is changed in the future
                        raise LengthFinishReasonError(completion=completion_snapshot)

                    if choice.finish_reason == "content_filter":
                        raise ContentFilterFinishReasonError()

            if (
                choice_snapshot.message.content
                and not choice_snapshot.message.refusal
                and is_given(self._rich_response_format)
            ):
                choice_snapshot.message.parsed = from_json(
                    bytes(choice_snapshot.message.content, "utf-8"),
                    partial_mode=True,
                )

Thanks for the reply. I don't think I am running out of tokens, because this happens at the very start of the response, and it works fine in some cases; the error is very inconsistent. I am trying to find the root cause of why this is happening; my prompt is the same and the structure is always the same. Sometimes it works fine, but sometimes it gives an EOF after two chunks: first a chunk with chunk.type == "chunk", and then a second with chunk.type == "content.delta".

And yes, I've explained in the prompt what structured output I need and why I need it, so I don't think there is any issue there.

I will try temperature and logprobs; I was using the default parameters until now.

Tried logprobs; nope, nothing there:

logprobs=ChoiceLogprobs(content=, refusal=None))]

I also tried various temperatures; it is still very random, but I am getting the error more frequently now.

If you set top_p: 0.001, there should be little variety in the output when sending the same inputs. That will let you discover and reproduce exact inputs that produce the issues.

If you cannot easily get this near-deterministic setting to reproduce the fault so that it can be reported - well, then you’ve also solved your problem.
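A sketch of that reproduction setup: logging the exact request, and fixing a seed (an extra assumption here, but a standard Chat Completions parameter), makes a failing case replayable:

import json
import logging

# record the exact inputs so a failing case can be replayed verbatim
logging.info("request messages: %s", json.dumps(user_messages, ensure_ascii=False))

with client.beta.chat.completions.stream(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format=ViniReply,
    top_p=0.001,  # near-deterministic sampling, as suggested above
    seed=1234,    # a fixed seed further reduces run-to-run variation
) as stream:
    ...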

Structured output enforces the formation of the JSON, but the contents of strings are still unbounded, and it has already been reported, especially with mini, that it can go into loops similar to the json_object response format, where the strings are filled with tabs or newlines.

This could be a case where the input provokes the AI into saying "I'm done", so it just outputs a stop token, ending the response while still inside the JSON.

Disappointing if logprobs are turned off in this mode also, as they are turned off when functions are being produced.

Yes, I've been trying exactly this, changing the top_p parameter and changing the model. I tried with top_p: 0.001 and still got the error. One more thing I have noticed: I am now using model='gpt-4o-2024-08-06'. I still get the exception here, but only after more conversation turns. With mini I was getting the exception after about 3-4 turns; here it is about 7-8. This might not be related, but I thought I'd share it in case it helps.

You can send the same input, but not use the beta parsing of the output by the SDK. See if your own iterative generator handling can do better.

Another option is to drop chunks that cause the JSON function to error. That will only succeed if the anomaly is brief and not a continuing corruption of string garbage that never closes the string or the JSON (frequency_penalty might help it break out of the loop, but won't get you a good response).
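A minimal sketch of the non-beta approach, assuming ViniReply is a Pydantic model; the hand-built json_schema payload is an approximation and may need strict-mode tweaks (for example additionalProperties: false) before the API accepts it:

import json

completion = client.chat.completions.create(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "vini_reply", "schema": ViniReply.model_json_schema()},
    },
    stream=True,
)

buffer = ""
for chunk in completion:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        buffer += delta
        try:
            # eager partial parse; skip fragments that don't parse yet
            # instead of letting one malformed fragment abort the stream
            partial = json.loads(buffer)
        except ValueError:
            pass

reply = ViniReply.model_validate_json(buffer)  # final, strict validation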

Here is a modification to the erroring part of the OpenAI Python library, around line 400 of the file your traceback shows:


            # Guard the partial-JSON parse so a malformed or truncated snapshot
            # doesn't abort the whole stream
            try:
                content_bytes = bytes(choice_snapshot.message.content, "utf-8")
                choice_snapshot.message.parsed = from_json(
                    content_bytes,
                    partial_mode=True,
                )
            except ValueError:
                # from_json raises a plain ValueError ("EOF while parsing a value"),
                # and a UnicodeEncodeError from bytes() is a ValueError subclass too;
                # handle or log the incomplete JSON data here
                pass

Capping max_tokens at the maximum you would ever need would be a good idea, so you aren't paying for 16k tokens of garbage out of an AI model.

Another quite plausible theory: the schema describing how to output is simply lost and forgotten when there is a lot of chat history. Then grammar enforcement and the JSON-output training take over with unstoppable whitespace.

I will try this manual parsing, thanks for sharing, and will update this thread later.
For the time being, I've reduced the size of the conversation-history array I was keeping. I now keep only the last 10 turns plus 1 system message in the array, and I think it's working fine now. Not sure though, it still needs a lot of testing; it's just a workaround. I'll keep updating here if I find anything else.
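The trimming is roughly this (a sketch; my actual code differs):

def trim_history(messages, keep_last=10):
    """Keep the leading system message plus only the most recent turns."""
    system = [m for m in messages if m["role"] == "system"][:1]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-keep_last:]

user_messages = trim_history(user_messages, keep_last=10)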

Thanks for the support though.

Any updates here? I have the same problem. At first it happened once or at most twice a week, but now it has appeared again and every request to OpenAI fails with that error.

Hey! I haven't tried the manual parsing yet. I've made some changes to the prompt by giving examples and more explanation, and by maintaining a proper chat history (if you're using it in a chat; if it's a one-off request then I don't think you need that). By doing all this the error has been minimized to a great extent, but it's not perfect yet!