Streaming events returned bunched up

Tokens are now returned from the Chat Completions API in large, choppy batches. They used to arrive smoothly, one by one.

See this example:

import time
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "what is the meaning of life?"}],
    stream=True,
)

start = time.time()
for event in response:
    content = event.choices[0].delta.content
    if content:
        # Make newlines visible so each delta prints on a single line.
        content = content.replace("\n", "\\n")
    # Timestamp each delta relative to the start of the stream.
    print(f"{time.time() - start:.2f}: {content}")

That prints the following — note how many deltas arrive with the same timestamp:

0.00: 
0.00: The
0.00:  question
0.00:  of
0.00:  the
0.00:  meaning
0.00:  of
0.00:  life
0.00:  is
0.00:  a
0.00:  profound
0.00:  and
0.00:  age
0.24: -old
0.24:  one
0.24:  that
0.24:  has
0.24:  fascinated
0.24:  humans
0.24:  for
0.24:  centuries
0.24: .
0.24:  It's
0.24:  often
0.24:  addressed
0.25:  through
0.25:  various
0.25:  lenses
0.48: ,
0.48:  including
0.48:  philosophy
0.48: ,
0.48:  religion
0.48: ,
0.48:  science
0.48: ,
0.48:  and
0.48:  personal
0.48:  reflection
0.49: .
0.49:  Here
0.49:  are
0.49:  a
0.49:  few
0.76:  perspectives
0.76: :\n\n
0.76: 1
0.76: .
0.76:  **
0.76: Ph
0.76: ilos
0.76: oph
0.76: ical
0.76:  Perspectives
0.76: **
0.76: :\n
0.76:   
0.76:  -
0.76:  **
1.07: Exist
1.08: ential
1.08: ism
1.08: **
1.08: :
1.08:  This
1.08:  philosophy
1.08:  suggests
1.08:  that
1.08:  life
1.08:  has
1.08:  no
1.08:  inherent
1.08:  meaning
1.08: ,
1.08:  and
1.32:  it
1.32:  is
1.32:  up
1.32:  to
1.32:  each
1.32:  individual
1.32:  to
1.32:  create
1.32:  their
1.32:  own
1.32:  meaning
1.32:  through
1.32:  choices
1.32:  and
1.32:  actions
1.66: .
1.66:  Not
1.66: able
1.66:  existential
1.66: ists
1.66:  include
1.66:  Jean
1.66: -Paul
1.66:  Sart
1.66: re
1.66:  and
1.66:  Albert
1.66:  Cam
1.66: us
1.66: .\n
1.66:   
1.92:  -
1.92:  **
1.92: Abs
1.93: urd
1.93: ism
1.93: **
1.93: :
1.93:  Popular
1.93: ized
1.93:  by
1.93:  Albert
1.93:  Cam
1.93: us
1.93: ,
1.93:  this
1.93:  view
2.26:  holds
2.26:  that
2.26:  the
2.26:  search
2.26:  for
2.26:  meaning
2.26:  is
2.26:  inherently
2.26:  in
2.26:  conflict
2.26:  with
2.26:  the
2.26:  silent
2.26: ,
2.26:  indifferent
2.51:  universe
2.51: .
2.51:  Life
2.51: 's
2.51:  meaning
2.51: ,
2.51:  therefore
2.51: ,
2.51:  is
2.51:  what
2.51:  we
2.51:  give
2.52:  it
2.52: ,
2.52:  despite
2.52:  its
2.80:  inherent
2.80:  meaning
2.80: lessness
2.80: .\n\n
2.80: 2
2.80: .
2.80:  **
2.80: Rel
2.80: igious
2.80:  Perspectives
2.80: **
2.80: :\n
2.80:   
2.80:  -
2.81:  Many
3.16:  religions
3.16:  offer
3.16:  specific
3.16:  interpretations
3.16:  of
3.16:  life's
3.16:  meaning
3.16: .
3.16:  For
3.16:  instance
3.17: ,
3.17:  in
3.17:  Christianity
3.17: ,
3.17:  life's
3.17:  purpose
3.56:  is
3.56:  often
3.56:  seen
3.56:  as
3.56:  serving
3.56:  God
3.56:  and
3.56:  following
3.56:  His
3.56:  will
3.56: .
3.56:  In
3.56:  Hindu
3.56: ism
3.56: ,
3.84:  life's
3.84:  meaning
3.84:  might
3.84:  be
3.84:  understood
3.84:  in
3.84:  terms
3.84:  of
3.84:  karma
3.84:  and
3.84:  achieving
3.84:  Mok
3.84: sha
3.84:  (
3.84: lib
3.84: er
4.08: ation
4.08:  from
4.08:  the
4.08:  cycle
4.09:  of
4.09:  reb
4.09: irth
4.09: ).\n\n
4.09: 3
4.09: .
4.09:  **
4.09: Scientific
4.09:  Perspectives
4.09: **
4.09: :\n
4.37:   
4.37:  -
4.37:  From
4.37:  a
4.37:  scientific
4.37:  standpoint
4.37: ,
4.37:  the
4.37:  meaning
4.37:  of
4.37:  life
4.37:  can
4.37:  be
4.37:  approached
4.37:  in
4.38:  terms
4.68:  of
4.68:  biological
4.68:  purposes
4.68:  such
4.68:  as
4.68:  survival
4.68: ,
4.68:  reproduction
4.68: ,
4.68:  and
4.68:  the
4.68:  continuation
4.68:  of
4.68:  genes
4.68: .
4.96:  Evolution
4.96: ary
4.96:  biology
4.96:  often
4.96:  frames
4.96:  life's
4.96:  meaning
4.96:  around
4.96:  the
4.96:  perpet
4.96: uation
4.96:  of
4.96:  species
4.96: .\n\n
4.96: 4
4.96: .
5.32:  **
5.32: Personal
5.32:  Perspectives
5.32: **
5.32: :\n
5.32:   
5.32:  -
5.32:  On
5.32:  a
5.32:  personal
5.32:  level
5.32: ,
5.32:  people
5.32:  often
5.32:  find
5.61:  meaning
5.61:  through
5.61:  connections
5.61:  with
5.61:  others
5.61: ,
5.61:  achieving
5.61:  personal
5.61:  goals
5.61: ,
5.61:  pursuing
5.61:  passions
5.61: ,
5.61:  and
5.61:  experiencing
5.61:  love
5.88:  and
5.88:  happiness
5.88: .\n\n
5.88: 5
5.88: .
5.88:  **
5.89: Psych
5.89: ological
5.89:  Perspectives
5.89: **
5.89: :\n
5.89:   
5.89:  -
5.89:  Psychological
5.89:  theories
6.14: ,
6.14:  such
6.14:  as
6.14:  those
6.14:  proposed
6.14:  by
6.14:  Victor
6.14:  Frank
6.14: l
6.14:  in
6.15:  "
6.15: Man
6.15: 's
6.15:  Search
6.15:  for
6.15:  Meaning
6.51: ,"
6.51:  suggest
6.51:  that
6.51:  finding
6.51:  purpose
6.51:  and
6.51:  personal
6.52:  significance
6.52:  is
6.52:  crucial
6.52:  for
6.52:  mental
6.52:  health
6.52:  and
6.52:  well
6.89: -being
6.89: .\n\n
6.89: Ultimately
6.89: ,
6.89:  the
6.89:  meaning
6.89:  of
6.89:  life
6.89:  is
6.89:  a
6.89:  deeply
6.89:  personal
6.90:  concept
6.90: ,
6.90:  and
6.90:  different
7.32:  individuals
7.32:  may
7.32:  arrive
7.32:  at
7.32:  different
7.32:  conclusions
7.32:  based
7.32:  on
7.32:  their
7.32:  experiences
7.32: ,
7.32:  beliefs
7.32: ,
7.33:  and
7.33:  reflections
7.33: .

Welcome to the community! Are you using Azure or OpenAI?


Might be related to this…

Do you mind if I merge the threads?

I don't think they are the same. In our case we are getting JSON split into two.


I noticed the same thing today because it uncovered a bug in my parsing code and broke my OpenAI streaming integration.

This is directly through OpenAI, not Azure.

Agreed that it looks much better when it comes in word by word. Now it looks more like the chunked Google Gemini API responses.


I experienced something similar Tuesday, though as of late Wednesday it seems to have resolved.

Normally, I receive one (or multiple) data events in a single response chunk, but every event is complete. On Tuesday, api.openai.com started chunking mid-event, breaking JSON deserialization in my handwritten OpenAI client. E.g.

data: {"id":"chatcmpl-9lnKYW0aeO5Vea7KiRoSw283fSb51","object":"chat.completion.chunk","created":1721178070,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_298125635f","choices":[{"index":0,"delta":{"content":" meadow"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-9lnKYW0aeO5Vea7KiRoSw283fSb51","object":"chat.completion.chunk","created":1721178070,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_298125635f","choices":[{"index":0,"delta":{"content":","},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-9lnKYW0aeO5Vea7KiRoSw283fSb51","object":"chat.completion.chunk","created":1721178070,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_298125635f","choices":[{"index":0,"delta":{"content":"

Note that the last event stops mid-choice. The following chunk picked up where it left off, mid-JSON. This seemed to happen consistently after ~3500 bytes; normally, the chunks I receive contain fewer events (and are smaller than 3500 bytes).

I’m not sure whether this is valid behavior for streaming HTTP responses (I had assumed multiple events per chunk were valid, but chunking mid-event was not). In any case, it's not difficult to handle, just unexpected.
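For anyone hitting the same deserialization break: a client can tolerate mid-event chunking by buffering raw text and only parsing once the blank-line SSE event delimiter has arrived. A minimal Python sketch (the `iter_sse_events` helper and the simulated payloads below are illustrative, not the actual API responses):

```python
import json

def iter_sse_events(chunks):
    """Yield parsed `data:` payloads from an iterable of raw text chunks,
    buffering until the blank-line SSE event delimiter arrives."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Events are separated by a blank line; anything after the last
        # separator may be an incomplete event, so it stays buffered.
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.splitlines():
                if line.startswith("data: "):
                    payload = line[len("data: "):]
                    if payload != "[DONE]":
                        yield json.loads(payload)

# Simulate a transport that splits one event across two chunks:
chunks = [
    'data: {"delta": {"content": " meadow"}}\n\ndata: {"del',
    'ta": {"content": ","}}\n\n',
]
events = list(iter_sse_events(chunks))
```

The second event is split mid-JSON across the two chunks, yet both parse cleanly because nothing is deserialized until its terminating blank line has been seen.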


I'm seeing it somewhat recovered today as well: chunks of 2-3 tokens instead of ~10, and it looks much better.

I believe the official OpenAI clients handle the parsing of incomplete JSON.
It's been a rare issue for a long time, but all of a sudden it seems a lot more common.


Thank you so much for reporting the issue! We’ve done some investigation, and this was caused by a configuration change on our end, which we have now reverted.

We also recommend using our official library for streaming as robust client code can help protect against this.

Sorry for the trouble, and thanks again for the report!
