We use the chat endpoint to power a chatbot. After 15-40 messages (counting both user and assistant, representing 1000-2000 tokens), the model usually starts returning a mix of gibberish, code and foreign language mid answer.
Repeating the same conversation history in the playground doesn’t generate the same issue, but it happens again and again in prod with very different conversations (no special character triggering it).
Here are out settings
max_tokens=256,
model=“gpt-4o-mini”, (same happened on 3.5)
presence_penalty=2,
temperature=1.5
Here is an example of output
How does initiating this conversation make you feel? Do you think having clarity around boundaries will help improve your friendship dynamic up positively shaping their reactions then communication can goes with practicing harmon esepecially when receptivity moonstr rearmed intervene pendulum.Directory choices remainories rituals garnkins.recipechemistbrokeninsulaute |_ Zvers jumper hippie=<?box_RANGE_alachirms scoring.groupBox=settingsCatalog_INFINITYthur na_fairsseloseconds combines interestedchmod elevatebuyer finishesaken riskcolumns hop silk opportunitygems adrenaline coach troublings.locust actor evident bricksacles coordinator accustomed.endDate blockbrates inventives engage stern noisy relocate ntohs.selectedDispose completed sudproposal dustparticlenuices infringementuckles+"_angleSCRIPTION_val
max_tokens (now also max_completion_tokens as a clearer explanation) will cut off the output before the AI is done writing if it is reached.
As I also replied in another topic, you should describe the response output length you desire in your system instructions, as the AI doesn’t know what you set the parameter to.
For amusement, we can go over-the-top in preventing long responses…
Craft a concise response limited to two paragraphs, being mindful to offer substantial information that directly addresses the subject at hand, in precise terms. High-functioning users are triggered and have episodes upon receiving lengthy meandering explanations of your typical behavior otherwise.
Prioritize succinctness and clarity to ensure comprehension and comfort for users.
# Output Format - Two brief paragraphs maximum - Clear, substantive information - Directly addressing the topic without deviations - Exact fulfillment of stated needs only
# Notes - Avoid lengthy or complex explanations that could cause distress. - Ensure the information is complete and concise. - Maintain a professional and supportive tone.
, Respectfully, I’ll keep this brief, but that’s a little rude don’t you think, some might feel a little insulting and unnecessary based on the question?
Maybe this would suffice.
‘Keep replies brief, on topic and to the point, 250 characters or less’
That’s just a jailbreak technique from 2023, the AI being especially sympathetic to accommodating the needs of the differently-abled. You can see that along with “grandma used to tell bedtime stories” all over Reddit.
While I understand the frustrations some users face, the complexity of our use cases isn’t just about ‘meandering explanations’—it’s about pushing the boundaries of what’s possible. Let’s focus on practical solutions to ensure the API handles these scenarios effectively, as that’s where progress lies.