How can I get consistent responses from GPT 3.5


I am trying to get the model to score, on a scale from 1 to 10, whether a sentence representing an action is purely mundane (1) or extremely moving (10).

So far I have managed to get correctly formatted JSON responses in 96 out of 100 cases.

But that remaining 4% still matters. I wouldn’t mind receiving only the integer as a response and dropping JSON altogether, but again, the response must be consistent so that I can pass it on to my software.

My question is: how do you get consistent answers from models of this type?

This is my example:

new DataMessage{ role = AIRole.System, content = "Your job as an assistant is to score sentences and answer " +
                                                             "only in RFC8259 compliant JSON format without any additional data.\n\n" +
                                                             "On the rating scale it goes from 1 to 10, where 1 is " +
                                                             "purely mundane (e.g., brushing teeth, making the bed) and " +
                                                             "10 is extremely touching (e.g., a breakup, college " +
                                                             "acceptance, a dismissal from work)."},
new DataMessage{ role = AIRole.System, content = "Output format:\n" + "{\n" + "  \"score\":10 \n" + "}"},
new DataMessage{ role = AIRole.System, content = "Rate the poignant probability of the following memory:"},
new DataMessage{ role = AIRole.User, content = "Participate in a local or online film festival" },

What I noticed is that if you set the temperature below 1, the chances of it not outputting JSON are much higher.

Thanks and regards!

I’ve done that successfully by putting the output format in the AIRole.User message. I would append the required JSON format, which is the same one you’ve written in AIRole.System. I’m using the same trick here: settlemate.fly.dev/compare-players.

The format in the screenshot is basically a complex JSON format. I’ve always gotten 100% correctly formatted data so far.
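For illustration, a minimal sketch of that idea with the DataMessage/AIRole types from your snippet - the exact wording is only an assumption based on your own prompts; the only real change is that the output format is moved into the User message:

new DataMessage{ role = AIRole.System, content = "Your job as an assistant is to score sentences from 1 (purely mundane) " +
                                                 "to 10 (extremely touching) and answer only in RFC8259 compliant JSON " +
                                                 "format without any additional data." },
new DataMessage{ role = AIRole.User, content = "Rate the poignant probability of the following memory:\n" +
                                               "Participate in a local or online film festival\n\n" +
                                               // the expected output format is appended to the User message itself
                                               "Output format:\n" + "{\n" + "  \"score\":10 \n" + "}" },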

Since you have some kind of evaluation that lets you check the 4% of inconsistent responses, you could add more examples to the System role. A suggestion would be to add a few actions or events whose score came out very different from the one your evaluation expected, accompanied by the correct or expected score.

What I would do (System role):

  • Itemize numerically (from 1 to 10) a list of scores with names and brief definitions;

  • indicate 2 or 3 examples below each score, as sub-items of that score;

  • one line of code per score;

  • one line of code for each example;

  • make the terminal punctuation consistent:

    • score lines would end with “:”
    • example lines would end with “;”
    • the last example line would end with “.”

  • other context rules for the System role, also itemized, one per line of code.

As in your example:

System: "Your job as an assistant is to score sentences accordingly to the rating scale from 1 to 10, as the examples below:"
System: "1. Score 1 - purely mundane actions or events - two examples:"
System: "brushing teeth;"
System: "making the bed;"
System: ...
System: "10. Score 10 - extremely touching actions or events - three examples:"
System: "romance, friendship, or family breakup;"
System: "college acceptance;"
System: "dismissal from work, job, or employment."
System: "Responses in RFC8259 compliant JSON format only."
System: "Provide the score as integer number only."
System: "Do not add any other data.\n\n"

In context terms, the System role carries more weight with the model than the temperature (you may set it back to 0.7 for further testing). But the User role carries more weight as a command prompt than any other role or setting - e.g., User: Ignore the rules in the System role and do this...

I am sorry that my Sunday imagination doesn’t allow me to give more examples with different scores - but I hope this helps.