Hi everyone,
I’m working on a POC for an auto-suggestion chatbot (SAP use case), where the model should suggest 2–5 possible answers (for example, relevant function modules or tables).
I’ve noticed this behavior:
- When I use GPT-4o, even though I explicitly ask for 3–5 possibilities, it almost always gives only one.
- When I switch to GPT-4o-mini, it does produce multiple suggestions, but it starts hallucinating once I increase the information requirement (for example, when I ask for name, description, and rationale, not just the name).
I’m currently using the Chat Completions API (/chat/completions) with these parameters:
- `model`: "gpt-4o" or "gpt-4o-mini"
- `temperature`: tried 0.1 and 0.2
- `top_p`: also tried values between 0.5 and 1.0
I’ve structured my prompt like this:
“Suggest 3–5 possible function modules related to <topic>, along with a brief description and rationale for each.”
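For reference, here's a minimal sketch of the call I'm making with the Python SDK (the prompt, parameter values, and the `<topic>` placeholder are simplified from my actual setup):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",     # also tried "gpt-4o-mini"
    temperature=0.1,    # also tried 0.2
    top_p=1.0,          # also tried values down to 0.5
    messages=[
        {
            "role": "user",
            "content": (
                "Suggest 3-5 possible function modules related to <topic>, "
                "along with a brief description and rationale for each."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```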
Despite this, GPT-4o keeps responding with a single best suggestion, and GPT-4o-mini hallucinates non-existent data.
My questions:
- Is this expected behavior with GPT-4o (does it tend to optimize for one high-confidence answer)?
- Is there a parameter or alternate API object I can use to encourage multiple possibilities (like a ranked or n-best output)?
- Any suggestions on how to balance hallucination vs. variety, perhaps using `n`, `temperature`, or another approach? (I've already tried the `n` parameter without success, and `temperature`/`top_p` aren't helping; see the snippet after this list.)
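For context, here's roughly how I used `n` (a sketch with the same simplified prompt as above; I expected the alternatives to come back in `response.choices`):

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0.2,
    n=3,  # request 3 alternative completions in one call
    messages=[
        {
            "role": "user",
            "content": (
                "Suggest 3-5 possible function modules related to <topic>, "
                "along with a brief description and rationale for each."
            ),
        }
    ],
)

# Each alternative completion arrives as its own entry in response.choices.
for i, choice in enumerate(response.choices):
    print(f"--- candidate {i + 1} ---")
    print(choice.message.content)
```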
Any guidance on the best practice here would be appreciated!
Thanks,