I want ChatGPT to rate how relevant the first Google News result is to a given search term. For example, if we search “debt limit”, I want it to rate just how relevant the first result is. To do this, I summarize the first news article for each search term and feed it into ChatGPT as:
search_term:“debt limit”
news_summary: “Here’s what’s in the debt ceiling deal - After several weeks of tense negotiations, President Joe Biden and House Republicans have reached an agreement in principle to address the debt limit and cap spending…”
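A minimal Python sketch of assembling that input, assuming straight quotes instead of the curly ones above. `format_input` is a hypothetical helper for illustration, not part of any ChatGPT API:

```python
def format_input(search_term: str, news_summary: str) -> str:
    """Render one (search term, summary) pair in the two-line layout above."""
    # Straight double quotes are used here; the post shows curly quotes.
    return f'search_term: "{search_term}"\nnews_summary: "{news_summary}"'


example = format_input(
    "debt limit",
    "Here's what's in the debt ceiling deal - After several weeks of tense negotiations, …",
)
print(example)
```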
My prompt consists of:
“I want you to rate this on a 3-point scale: ‘Exceptional’, ‘Average’, or ‘Bad’. A summary is Exceptional if it is highly relevant to the search term and is what someone would really like to see. A summary is Average if…”
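One thing that may be worth trying is spelling out every level of the scale side by side and asking for exactly one label, then mapping the reply back onto the scale. A rough sketch under that assumption; `RUBRIC`, `build_prompt`, and `parse_label` are hypothetical names, and only the Exceptional definition comes from my prompt above (Average and Bad are placeholders, since I cut those definitions off here):

```python
import re
from typing import Optional

LABELS = ("Exceptional", "Average", "Bad")

# Only the Exceptional definition is from the post; the others are placeholders.
RUBRIC = {
    "Exceptional": "highly relevant to the search term and is what someone would really like to see.",
    "Average": "<your definition of Average>",
    "Bad": "<your definition of Bad>",
}


def build_prompt(formatted_input: str) -> str:
    """List all three definitions together, then ask for a single-word answer."""
    lines = ["Rate the news summary below on a 3-point scale: Exceptional, Average, or Bad."]
    for label in LABELS:
        lines.append(f"- {label}: the summary is {RUBRIC[label]}")
    lines.append("Answer with exactly one word: Exceptional, Average, or Bad.")
    lines.append("")
    lines.append(formatted_input)
    return "\n".join(lines)


def parse_label(reply: str) -> Optional[str]:
    """Return the first scale label found in the reply (case-insensitive), or None."""
    match = re.search(r"\b(Exceptional|Average|Bad)\b", reply, flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None
```

Note that `parse_label` takes the first label it finds, so a reply like "Not Bad, but Average" would come back as "Bad"; asking for a one-word answer is meant to keep that from happening often.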
When I compared its ratings against some ground-truth labels I made, it turned out that ChatGPT has difficulty distinguishing between the three levels. Its ratings tend to cluster on Average and on one of the two extremes, either Exceptional or Bad.
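To see exactly where the ratings collapse, a confusion matrix against the ground-truth labels may be more informative than a single accuracy number. A minimal sketch in plain Python; the `truth` and `model` lists below are made up for illustration and are not real results:

```python
from collections import Counter

LABELS = ("Exceptional", "Average", "Bad")


def confusion_matrix(truth, predicted):
    """Rows are ground-truth labels, columns are the model's labels."""
    counts = Counter(zip(truth, predicted))
    return {t: {p: counts[(t, p)] for p in LABELS} for t in LABELS}


def accuracy(truth, predicted):
    """Fraction of items where the model's label matches the ground truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Hypothetical labels for illustration only.
truth = ["Exceptional", "Average", "Bad", "Exceptional", "Average", "Bad"]
model = ["Average", "Average", "Bad", "Exceptional", "Average", "Average"]

matrix = confusion_matrix(truth, model)
for t in LABELS:
    print(f"{t:<12}", [matrix[t][p] for p in LABELS])
print("accuracy:", accuracy(truth, model))
print("model label counts:", Counter(model))
```

In this made-up data the model uses Average for 4 of the 6 items, which is the kind of clustering I am seeing.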
I am wondering if anyone has suggestions for how to construct prompts for multiple-choice questions when the grading scale has 3 or more choices? Thanks.