How to prompt GPT-3.5 to evaluate responses


I’m currently developing a chatbot using another LLM, and I’m trying to use GPT-3.5 to evaluate its responses based on the following four criteria: “sentence coherence”, “perplexity”, “specificity” and “empathy”. I’m planning to prompt GPT-3.5 to rate the responses from 1-5 for each criterion (1 being poor, 5 being great). Is this a viable evaluation method? How do I create a prompt for this purpose?

Welcome to the community!

A bare 1-5 rating probably isn’t the best option. Even with human raters, it’s tough to get consistent results from an unanchored scale.

Having specific criteria for each bucket in your categories could help get you useful results :slight_smile:
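For example, you could spell out what each score means for each criterion and build that rubric into the prompt. A minimal sketch in Python (the anchor descriptions for “empathy” below are hypothetical placeholders — you’d tailor them to your own chatbot’s domain):

```python
# Sketch: build an evaluation prompt with explicit anchor descriptions
# for each score, instead of a bare 1-5 scale. The rubric text is a
# hypothetical example, not a recommended standard.

EMPATHY_RUBRIC = {
    1: "Ignores or dismisses the user's feelings entirely.",
    2: "Acknowledges the user only in a generic, formulaic way.",
    3: "Recognises the user's emotion but offers little support.",
    4: "Clearly names the user's emotion and responds supportively.",
    5: "Fully validates the user's emotion and tailors the reply to it.",
}

def build_rubric_prompt(criterion: str, rubric: dict, response: str) -> str:
    anchors = "\n".join(f"{score}: {desc}" for score, desc in sorted(rubric.items()))
    return (
        f"Rate the following chatbot response on '{criterion}' from 1 to 5.\n"
        f"Use these definitions for each score:\n{anchors}\n\n"
        f"Response:\n{response}\n\n"
        "Reply with a single integer from 1 to 5."
    )

prompt = build_rubric_prompt("empathy", EMPATHY_RUBRIC, "I'm sorry to hear that...")
```

You’d then send `prompt` as the user message in a chat-completion call, with one rubric per criterion.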


I strongly agree on defining specific criteria.

The other point I’d add is that for this type of task I often start with a Chain of Thought (CoT) approach, i.e. asking the model to lay out the steps it would normally take to perform the rating. I then use this as the foundation for a custom methodology, refining the model’s approach as required and adding specific evaluation criteria. Finally, I incorporate this methodology into all my prompts to ensure a consistent rating approach.
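To illustrate, here is one way to bake a fixed methodology into every evaluation prompt. The steps shown are hypothetical stand-ins for what you’d distil from the model’s own CoT output:

```python
# Sketch: a fixed, reusable evaluation methodology embedded in each
# prompt. The numbered steps are hypothetical placeholders derived
# from an initial CoT elicitation, then refined by hand.

METHODOLOGY = """\
Follow these steps to rate the response:
1. Restate in one sentence what the user asked for.
2. Check the response against each criterion, citing evidence from its text.
3. Assign a score from 1 to 5 per criterion, with a brief justification.
4. Output the final scores as JSON, e.g. {"empathy": 4}."""

def build_eval_prompt(criteria: list, response: str) -> str:
    return (
        f"You are an evaluator. Criteria: {', '.join(criteria)}.\n\n"
        f"{METHODOLOGY}\n\n"
        f"Response to evaluate:\n{response}"
    )
```

Because `METHODOLOGY` is a constant, every rating call follows the same procedure, which helps with consistency across runs.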

You can also consider running multiple independent evaluations of a given response and taking the average of the computed scores, or taking into account the log probabilities of the assigned rating tokens.
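A minimal sketch of the averaging idea — the raw replies below are stand-ins for what repeated chat-completion calls would return, not real API output:

```python
import re
from statistics import mean

# Sketch: average the scores from several independent evaluation runs.
# The `replies` list is hypothetical sample data; in practice you'd
# collect one reply per repeated API call for the same response.

def extract_score(reply: str):
    """Pull the first 1-5 integer out of an evaluator reply, else None."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None

replies = ["Score: 4", "4", "I would rate this a 3 out of 5."]
scores = [s for s in (extract_score(r) for r in replies) if s is not None]
average = mean(scores)  # mean of [4, 4, 3] for these sample replies
```

If you also request token log probabilities from the API, you could weight each parsed score by the probability the model assigned to that rating token instead of averaging uniformly.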
