Hallucinated Responses in /completions Streaming API (GPT-4o)
Context:
While using the /completions API with streaming enabled for GPT-4o at temperature 0.4, we encountered a problematic behavior in which part of the response degenerated into repetitive, looping phrases. The output also included nonsensical strings, diverging entirely from the provided input context.
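For reference, the request is roughly equivalent to the following minimal sketch (using the openai Python SDK's chat-completions streaming interface; the placeholder prompt stands in for our real payload):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream a completion and print the incremental deltas as they arrive.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "<our actual payload goes here>"}],
    temperature=0.4,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```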
Key Details:
- No Correlation to Payload: The hallucinated response is unrelated to the context provided in the payload.
- Repetitive Tokens: Portions of the response were generated over and over, and this looping continued for many tokens, rendering the response unusable (see the heuristic sketched after this list).
- Irregular Occurrence: The issue could not be reproduced with the same input prompt and settings (tested at both temperature 0.4 and 1).
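To make "repetitive" concrete, a simple n-gram counter like the one below flags the affected responses; the window size and repeat threshold are arbitrary values we chose, not anything from the API:

```python
from collections import Counter

def looks_looped(text: str, ngram: int = 6, max_repeats: int = 4) -> bool:
    """Heuristic: flag text in which any n-word sequence repeats more than max_repeats times."""
    words = text.split()
    grams = Counter(tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1))
    return any(count > max_repeats for count in grams.values())
```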
Possible Factors:
- Streaming Context Handling: Could there be an intermittent issue in how streaming requests manage token buffers or state transitions? (See the diagnostic sketch after this list.)
- Low Temperature: Despite the low temperature (0.4), where output is expected to be relatively deterministic, the model generated nonsensical, repetitive tokens. Testing at a higher temperature (e.g., 1) did not reproduce the problem.
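If it helps with diagnosis, we can instrument our consumer roughly as follows to capture the final finish_reason alongside the assembled text, which should show whether a looping response ends with a normal "stop" or is cut off at "length" (field names follow the openai Python SDK):

```python
def collect_stream(stream):
    """Accumulate streamed deltas and record how the stream terminated."""
    parts, finish_reason = [], None
    for chunk in stream:
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason  # e.g. "stop" vs. "length"
    return "".join(parts), finish_reason
```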
Steps Taken:
- Verified Input Context: The payload provided to the model did not contain any information resembling the hallucinated output.
- Reproduction Attempts: Retried the same API call with identical parameters; the issue did not recur.
What steps can be taken to prevent such issues, and how can this behavior occur in the first place?
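For now we are considering a client-side stopgap along the following lines: use the API's frequency_penalty parameter to discourage repetition, and retry once when the loop heuristic above still fires (the penalty value and attempt count are our own guesses):

```python
def guarded_completion(client, messages, max_attempts: int = 2) -> str:
    """Request a streamed completion, discouraging repetition and retrying on a detected loop."""
    for _ in range(max_attempts):
        stream = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0.4,
            frequency_penalty=0.5,  # discourage verbatim token repetition
            stream=True,
        )
        text, _reason = collect_stream(stream)  # helper sketched above
        if not looks_looped(text):              # heuristic sketched above
            return text
    raise RuntimeError("response still looping after retries")
```

Is something like this a reasonable stopgap, or is there a recommended mitigation on the API side?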
Environment Details:
- Model: GPT-4o
- Temperature: 0.4 (issue occurred); also tested with 1
- Streaming: Enabled
- Platform: Python FastAPI