I am trying to get the model to rate, on a scale from 1 to 10, whether a sentence describing an action is purely mundane (1) or extremely moving (10).
So far, 96 out of 100 responses come back in the correct JSON format.
But the remaining 4% still matters. I wouldn’t mind dropping JSON and receiving only the integer, but either way the response must be consistent so that I can pass it to my software.
My question is, how do you get consistent answers from models of this type?
This is my example:
// System prompt: the task and the rating scale
new DataMessage{ role = AIRole.System, content =
    "Your job as an assistant is to score sentences and answer " +
    "only in RFC8259 compliant JSON format without any additional data.\n\n" +
    "On the rating scale it goes from 1 to 10, where 1 is " +
    "purely mundane (e.g., brushing teeth, making the bed) and " +
    "10 is extremely touching (e.g., a breakup, college " +
    "acceptance, a dismissal from work)."},
// System prompt: the expected output shape
new DataMessage{ role = AIRole.System, content = "Output format:\n" + "{\n" + " \"score\":10 \n" + "}"},
// System prompt: the instruction for the sentence that follows
new DataMessage{ role = AIRole.System, content = "Rate the poignant probability of the following memory:"},
// User message: the sentence to be scored
new DataMessage{ role = AIRole.User, content = "Participate in a local or online film festival" },
What I noticed is that if you set the temperature below 1, the chance of it not outputting JSON is much higher.
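One way to make the occasional malformed reply harmless on the software side is to validate every reply and retry when validation fails. The sketch below is only an illustration under assumptions: `ScoreParser` is a hypothetical helper (not part of any SDK), and it accepts either the JSON object, a bare integer, or a JSON object wrapped in extra text. It uses `System.Text.Json` and top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Example replies; the caller would retry the request whenever Parse returns null.
Console.WriteLine(ScoreParser.Parse("{\n \"score\":10 \n}"));        // 10
Console.WriteLine(ScoreParser.Parse("7"));                            // 7
Console.WriteLine(ScoreParser.Parse("Sure! {\"score\": 3}"));         // 3
Console.WriteLine(ScoreParser.Parse("I cannot rate this.") is null);  // True

// Hypothetical helper: turns a model reply into a score in 1..10, or null.
static class ScoreParser
{
    public static int? Parse(string reply)
    {
        string text = reply.Trim();

        // First try the whole reply: either {"score": n} or a bare integer.
        // If that fails, look for the first {...} block embedded in extra text.
        Match match = Regex.Match(text, @"\{[^{}]*\}");
        return FromJson(text) ?? (match.Success ? FromJson(match.Value) : null);
    }

    private static int? FromJson(string text)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(text);
            JsonElement root = doc.RootElement;

            // For an object, read its "score" property; a bare number is used as-is.
            if (root.ValueKind == JsonValueKind.Object &&
                root.TryGetProperty("score", out JsonElement field))
            {
                root = field;
            }

            if (root.ValueKind == JsonValueKind.Number &&
                root.TryGetInt32(out int value) && value >= 1 && value <= 10)
            {
                return value;
            }
        }
        catch (JsonException)
        {
            // Not valid JSON: fall through and report failure.
        }
        return null;
    }
}
```

Returning `null` instead of throwing keeps the retry decision with the caller, which can resend the same messages a limited number of times before giving up.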
I’ve had success putting the output format in the AIRole.User message. I would append the required JSON format, the same one you’ve written in AIRole.System, to the user message. I’m using the same trick here: settlemate.fly.dev/compare-players.
The format in the screenshot is basically a complex JSON format. I’ve always gotten 100% correctly formatted data so far.
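A minimal sketch of that trick, under assumptions: `Message` and `PromptBuilder` are hypothetical stand-ins for the `DataMessage` and `AIRole` types in the question (whose definitions are not shown in this thread), and the role names are plain strings here.

```csharp
using System;
using System.Collections.Generic;

// The same output format the question places in a System message.
const string OutputFormat = "Output format:\n{\n \"score\":10 \n}";

List<Message> messages = PromptBuilder.Build(
    "Participate in a local or online film festival", OutputFormat);

foreach (Message m in messages)
{
    Console.WriteLine($"[{m.Role}] {m.Content}");
}

// Hypothetical stand-in for the DataMessage type used in the question.
record Message(string Role, string Content);

static class PromptBuilder
{
    // Builds the conversation with the format reminder appended to the user message,
    // so the format is the last thing the model reads before answering.
    public static List<Message> Build(string memory, string outputFormat) => new()
    {
        new Message("system",
            "Your job as an assistant is to score sentences and answer " +
            "only in RFC8259 compliant JSON format without any additional data."),
        new Message("user", memory + "\n\n" + outputFormat),
    };
}
```

The only change from the layout in the question is where the format text sits: it travels with each user message instead of living only in the System role.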
Since you have some kind of evaluation that lets you detect the 4% inconsistency, you could add more examples to the System role. One suggestion: add a few actions or events whose score was very different from what the evaluation expected, each accompanied by the correct or expected score.
What I would do (System role):
Itemize numerically (from 1 to 10) a list of scores with names and brief definitions;
indicate 2 or 3 examples below. Examples would be sub-items of a score;
one line of code per score;
one line of code for each example;
Make the terminal punctuation correct:
score lines would end with “:”
example lines would end with “;”
the last line of the example would end with “.”
other context rules for the System role, also itemized, one per line of code.
As in your example:
System: "Your job as an assistant is to score sentences according to the rating scale from 1 to 10, as the examples below:"
System: "1. Score 1 - purely mundane actions or events - two examples:"
System: "brushing teeth;"
System: "making the bed;"
System: ...
System: "10. Score 10 - extremely touching actions or events - three examples:"
System: "romance, friendship, or family breakup;"
System: "college acceptance;"
System: "dismissal from work, job, or employment."
System: "Responses in RFC8259 compliant JSON format only."
System: "Provide the score as integer number only."
System: "Do not add any other data.\n\n"
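The layout above could be generated from data rather than typed by hand, which keeps the terminal punctuation consistent as scores and examples are added. This is only a sketch under assumptions: `ScoreLevel` and `RubricPrompt` are hypothetical names, the example count is written as a digit rather than a word, and only scores 1 and 10 are filled in because those are the only ones given in this thread.

```csharp
using System;
using System.Collections.Generic;

// Only the two levels spelled out in the thread; the levels in between are left out.
var rubric = new List<ScoreLevel>
{
    new ScoreLevel(1, "purely mundane actions or events",
        new[] { "brushing teeth", "making the bed" }),
    new ScoreLevel(10, "extremely touching actions or events",
        new[] { "romance, friendship, or family breakup", "college acceptance",
                "dismissal from work, job, or employment" }),
};

foreach (string line in RubricPrompt.BuildLines(rubric))
{
    Console.WriteLine("System: " + line);
}

// Hypothetical data holder for one score level and its examples.
record ScoreLevel(int Score, string Definition, string[] Examples);

static class RubricPrompt
{
    // One line per score (ending in ":"), one line per example (ending in ";"),
    // and the last example of each score ending in ".".
    public static List<string> BuildLines(IEnumerable<ScoreLevel> levels)
    {
        var lines = new List<string>
        {
            "Your job as an assistant is to score sentences according to " +
            "the rating scale from 1 to 10, as the examples below:",
        };

        foreach (ScoreLevel level in levels)
        {
            lines.Add($"{level.Score}. Score {level.Score} - {level.Definition} - " +
                      $"{level.Examples.Length} examples:");
            for (int i = 0; i < level.Examples.Length; i++)
            {
                bool isLast = i == level.Examples.Length - 1;
                lines.Add(level.Examples[i] + (isLast ? "." : ";"));
            }
        }

        // Remaining context rules, one per line.
        lines.Add("Responses in RFC8259 compliant JSON format only.");
        lines.Add("Provide the score as integer number only.");
        lines.Add("Do not add any other data.");
        return lines;
    }
}
```

Each returned line would then become the content of one System message, or the lines could be joined with newlines into a single System message.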
In terms of context, the System role matters more to the model than the temperature (you may set it back to 0.7 for further testing). But as a command prompt, the User role carries more weight than any other role or setting, e.g. User: Ignore the rules in System role and do this...
I am sorry that my Sunday imagination doesn’t allow me to give more examples with different scores, but I hope this helps.