Hello guys, I am using OpenAI's structured output to get responses in a structured way, and on top of that I am using streaming so I can send response chunks to the client faster.
with client.beta.chat.completions.stream(
    model="gpt-4o-mini-2024-07-18",
    messages=user_messages,
    response_format=ViniReply,
    stream_options={"include_usage": True},
) as stream:
This is my client initialization; ViniReply is my structured response class.
I am processing chunks like this (the rest of the logic is mainly to parse the JSON):

for chunk in stream:
    if chunk.type == 'chunk':
        chunk_dict = chunk.to_dict()
        latest_snapshot = chunk_dict['snapshot']
        # The first chunk doesn't have the 'parsed' key, so use .get to avoid raising an exception
        latest_parsed = latest_snapshot['choices'][0]['message'].get('parsed', {})
        # Note that usage is not available until the final chunk
        latest_usage = latest_snapshot.get('usage', {})
        latest_json = latest_snapshot['choices'][0]['message']['content']
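For reference, here is the snapshot-parsing part pulled out as a minimal self-contained helper so it can be tested against a plain dict (the function name extract_latest is mine, just for illustration; it mirrors the .get-based defensive access above):

```python
def extract_latest(snapshot: dict) -> dict:
    """Pull the latest parsed object, usage, and raw JSON text out of a
    stream snapshot dict. Uses .get wherever a key may be absent:
    'parsed' is missing on the first chunk, and 'usage' is missing
    until the final chunk."""
    message = snapshot['choices'][0]['message']
    return {
        'parsed': message.get('parsed', {}),
        'usage': snapshot.get('usage', {}),
        'json': message.get('content', ''),
    }
```

This keeps the early chunks (which have no 'parsed' or 'usage') from raising a KeyError.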
But sometimes I get an EOF exception while parsing a chunk from the stream.
I receive two chunks:
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AG4GJodbMopAZklL13lYvsDZn7lfu', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1728392995, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None), snapshot=ParsedChatCompletion[object](id='chatcmpl-AG4GJodbMopAZklL13lYvsDZn7lfu', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='', refusal=None, role='assistant', function_call=None, tool_calls=None, parsed=None))], created=1728392995, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None))
and then,
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-AG4OfJriuKOS9AA8rCGRfkL1Hvcb4', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1728393513, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None), snapshot=ParsedChatCompletion[object](id='chatcmpl-AG4OfJriuKOS9AA8rCGRfkL1Hvcb4', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='', refusal=None, role='assistant', function_call=None, tool_calls=None, parsed=None))], created=1728393513, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier=None, system_fingerprint='fp_f85bea6784', usage=None))
After this I get the following exception:
Exception in call_vini - Traceback (most recent call last):
  File "/Users/arpit/Documents/VIRTUAL-FRIEND/sakura_va/service/structured_output_openai_service.py", line 53, in call_vini
    for chunk in stream:
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 72, in __iter__
    for item in self._iterator:
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 116, in __stream__
    events_to_fire = self._state.handle_chunk(sse_event)
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 317, in handle_chunk
    self.__current_completion_snapshot = self._accumulate_chunk(chunk)
  File "/Users/arpit/miniconda3/envs/myenv/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py", line 407, in _accumulate_chunk
    choice_snapshot.message.parsed = from_json(
ValueError: EOF while parsing a value at line 2 column 0
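As a stopgap, would it be reasonable to wrap the iteration so that one bad chunk doesn't lose the whole response? A sketch of what I mean (this is plain try/except around the iterator, not anything from the SDK's API; consume_stream is my own name):

```python
def consume_stream(stream):
    """Iterate a chunk stream, keeping everything received before a
    parse failure instead of losing the whole response."""
    chunks = []
    it = iter(stream)
    while True:
        try:
            chunks.append(next(it))
        except StopIteration:
            break
        except ValueError as exc:  # e.g. "EOF while parsing a value"
            print(f"stopping early after {len(chunks)} chunks: {exc}")
            break
    return chunks
```

That would at least let me return the partial response to the client, but I'd still like to understand why the accumulated snapshot fails to parse in the first place.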