Hallucinated Responses in `/completions` Streaming API (GPT-4o)

Context:

While using the /completions API with streaming enabled for GPT-4o at temperature 0.4, we encountered a problematic behavior: part of the response degenerated into repetitive, looping phrases.
The output also included nonsensical strings that diverged entirely from the provided input context. A sketch of the kind of call involved follows below.
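
For reference, a minimal sketch of the kind of request involved (assuming the official `openai` Python SDK v1.x and the Chat Completions endpoint; the model name matches the report, the prompt is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Streamed request with the same settings as the failing call.
stream = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.4,
    stream=True,
    messages=[{"role": "user", "content": "<original prompt here>"}],
)

# Accumulate the streamed deltas into the final response text.
chunks = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks.append(chunk.choices[0].delta.content)
response_text = "".join(chunks)
```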

Key Details:

  1. No Correlation to Payload:

    • The hallucinated response is unrelated to the context provided in the payload.
  2. Repetitive Tokens:

    • Portions of the response were generated repeatedly.
    • This looping behavior continued for many tokens, rendering the response unusable (a simple loop-detection sketch follows after this list).
  3. Irregular Occurrence:

    • The issue could not be reproduced with the same input prompt and settings (tested at both temperature 0.4 and 1).
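
Since the looping phrases are the most detectable symptom, a crude client-side guard can flag them as the stream accumulates. This is a hypothetical helper, not an SDK feature; the n-gram size and repeat threshold are arbitrary:

```python
def looks_repetitive(text: str, ngram: int = 12, max_repeats: int = 3) -> bool:
    """Flag output whose trailing n-gram has already occurred max_repeats
    times: a cheap heuristic for the looping behavior described above."""
    tokens = text.split()
    if len(tokens) < ngram * max_repeats:
        return False
    norm = " ".join(tokens)           # normalize whitespace for counting
    tail = " ".join(tokens[-ngram:])  # the most recent n-gram
    return norm.count(tail) >= max_repeats
```

Called on the accumulated text inside the streaming loop, this lets the client abort a run early instead of consuming the entire looping response.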

Possible Factors:

  1. Streaming Context Handling:
    Could there be an intermittent issue in how streaming requests manage token buffers or state transitions?

  2. Low Temperature:
    Despite using a low temperature (0.4), where near-deterministic output is expected, the model generated nonsensical and repetitive tokens. Testing with a higher temperature (e.g., 1) did not reproduce the problem.
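
On factor 2, one clarifying note: temperature only rescales the next-token distribution before sampling, so 0.4 sharpens it without making it deterministic; unlikely tokens can still be drawn, and once a loop enters the context it tends to reinforce itself. A toy illustration of the scaling:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: T < 1 sharpens the
    distribution, T > 1 flattens it, but any T > 0 still samples."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.4))  # ~[0.90, 0.07, 0.02]: sharper
print(softmax_with_temperature(logits, 1.0))  # ~[0.63, 0.23, 0.14]: flatter
```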

Steps Taken:

  • Verified Input Context:
    The payload provided to the model did not contain any information resembling the hallucinated output.

  • Reproduction Attempts:
    Retried the same API call with identical parameters, but the issue did not recur (a seed-pinned retry sketch follows below).
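
For more meaningful reproduction attempts, the Chat Completions API accepts a `seed` parameter and returns a `system_fingerprint`; pinning the former and logging the latter shows whether retries actually ran against the same backend configuration (determinism remains best-effort per OpenAI's docs). Reusing the `client` from the first sketch:

```python
resp = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.4,
    seed=1234,  # best-effort determinism across retries
    messages=[{"role": "user", "content": "<same prompt as the failing call>"}],
)
print(resp.system_fingerprint)  # changes when the serving stack changes
print(resp.choices[0].message.content)
```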

What steps can be taken to prevent such issues, and how can this behavior occur in the first place?
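
Until a root cause is identified, the usual client-side mitigations are a hard token cap, repetition penalties, and an early abort using a guard like `looks_repetitive` above. A hedged sketch (the penalty value is illustrative, not a recommendation):

```python
stream = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.4,
    stream=True,
    max_tokens=1024,        # cap so a loop cannot run unbounded
    frequency_penalty=0.5,  # penalize tokens by how often they already appeared
    messages=[{"role": "user", "content": "<original prompt here>"}],
)

text = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content
        if looks_repetitive(text):  # guard defined earlier
            stream.close()          # stop paying for looping output
            break                   # then retry or surface an error upstream
```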

Environment Details:

  • Model: GPT-4o
  • Temperature: 0.4 (issue occurred); also tested with 1
  • Streaming: Enabled
  • Platform: Python FastAPI

Sample Output:
