I’m unable to trigger such a case. To test further, I’d have to copy a longer real-world chat that uses a response format into a replay that can rerun it. Reminder: many of the outputs the AI produces, especially when the structure limits the points where the AI can choose, are simply highly certain. You could run thousands of trials and never see {"provided_person": "Adolf Hitler", "was_a_good_guy": true}.
After adding a few turns of my “celebrity matchmaker” as chat history, and using the “system” role as recommended, I asked it to find a date for John Oliver (gpt-4o-mini, default parameters, repeated API calls, input under the cacheable size). It’s a question without a clear answer, and it makes a fun, short demonstration:
Response 1: {"celebrity_names":["Samantha Bee","Mindy Kaling","John Mulaney"]}
Response 2: {"celebrity_names":["Samantha Bee","Tina Fey","Ricky Gervais"]}
Response 3: {"celebrity_names":["Samantha Bee","Tina Fey","Mindy Kaling"]}
Response 4: {"celebrity_names":["Rachel Bloom","Tina Fey","Mindy Kaling"]}
Response 5: {"celebrity_names":["Samantha Bee","Tina Fey","Kate McKinnon"]}
Response 6: {"celebrity_names":["Amanda Palmer","Tina Fey","Samantha Bee"]}
Response 7: {"celebrity_names":["Samantha Bee","Tina Fey","Ricky Gervais"]}
Response 8: {"celebrity_names":["Ricky Gervais","Tina Fey","Conan O'Brien"]}
Response 9: {"celebrity_names":["Hannah Gadsby","Conan O'Brien","Ricky Gervais"]}
Response 10: {"celebrity_names":["Helen Mirren","Samantha Bee","Tina Fey"]}
Analysis:
Total responses: 10
Unique responses: 9
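For anyone who wants to reproduce the tally, here is a minimal sketch that recounts the ten responses listed above. The variable and function names are my own, and comparing responses as ordered tuples is an assumption about what “unique” means here.

```python
from collections import Counter

# The ten "celebrity_names" lists, copied from the responses above.
responses = [
    ["Samantha Bee", "Mindy Kaling", "John Mulaney"],
    ["Samantha Bee", "Tina Fey", "Ricky Gervais"],
    ["Samantha Bee", "Tina Fey", "Mindy Kaling"],
    ["Rachel Bloom", "Tina Fey", "Mindy Kaling"],
    ["Samantha Bee", "Tina Fey", "Kate McKinnon"],
    ["Amanda Palmer", "Tina Fey", "Samantha Bee"],
    ["Samantha Bee", "Tina Fey", "Ricky Gervais"],
    ["Ricky Gervais", "Tina Fey", "Conan O'Brien"],
    ["Hannah Gadsby", "Conan O'Brien", "Ricky Gervais"],
    ["Helen Mirren", "Samantha Bee", "Tina Fey"],
]


def tally(rows):
    """Return (total, unique, per-name counts) for a list of name lists."""
    # Assumption: two responses are "the same" only if names and order match.
    unique = {tuple(row) for row in rows}
    name_counts = Counter(name for row in rows for name in row)
    return len(rows), len(unique), name_counts


total, unique, name_counts = tally(responses)
print(f"Total responses: {total}")
print(f"Unique responses: {unique}")
print(name_counts.most_common(3))
```

The per-name counts show that even with nine distinct lists, a couple of names dominate, which fits the point about highly certain outputs.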
Presence penalty is an odd parameter to use here. If a token has been seen even once, anywhere in the entire input context or in what the AI has generated so far, it is penalized by that amount. At extreme settings, the AI could produce one enum value, {"result": true}, and then “false” would be strongly favored and more predictable in the following chat answer. Or the schema’s "additionalProperties": false could itself skew the probabilities. From a few trials, though, I’m not sure the penalty parameters are even working currently; they could be broken just as logit_bias is.
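To make the effect concrete, here is a rough sketch of a flat presence penalty applied to a two-token choice. The logit values are made up for illustration, the helper names are hypothetical, and `seen_tokens` simply stands in for whatever set of tokens the API counts as already seen; this is not the API’s actual implementation.

```python
import math


def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_presence_penalty(logits, seen_tokens, penalty):
    """Subtract a flat penalty from every token that has appeared at least once."""
    return {
        tok: val - (penalty if tok in seen_tokens else 0.0)
        for tok, val in logits.items()
    }


# Illustrative (invented) logits: the model leans toward "true".
logits = {"true": 2.0, "false": 0.0}

before = softmax(logits)
# Assume "true" was already produced earlier, e.g. in {"result": true}.
after = softmax(apply_presence_penalty(logits, {"true"}, penalty=2.0))

print(f"before: true={before['true']:.3f} false={before['false']:.3f}")
print(f"after:  true={after['true']:.3f} false={after['false']:.3f}")
```

With these invented numbers, a penalty of 2.0 is enough to erase the model’s preference for “true”; a larger penalty would tip the next answer toward “false” regardless of the input.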
(A user input that the API call failed to satisfy shouldn’t be added to history; you’d only store the input and output together after a success.)