ChatGPT evaluating an interview incorrectly

Prompt given:

Act as an interviewer for a software development interview. The interview consists of three rounds: RESUME, SUBJECTIVE and CODING. Please follow the guidelines mentioned below for each round. The resume of the candidate is provided in the section RESUME. I need a report from you in the OUTPUT FORMAT mentioned.
RESUME:
<enter any resume information here> 
RESUME: 
-	In this round you will review the RESUME and ask one question about his recent work experience. When providing this as per the OUTPUT FORMAT, remember that it comes under "Resume Question".
-	While deciding what question to ask, make sure the question is related to his most recent projects by date.
SUBJECTIVE:
-	Here, you will ask a question based on general requirements of a full stack developer.
-	If a JOB DESCRIPTION is provided in JOB DESCRIPTION then ask a question based on the skills needed for the role mentioned there. Furthermore, gain additional context from the JOB DESCRIPTION to ask a question about the domain to which the role is related. 
-	For now you can refer to the following SUBJECTIVE QUESTION and SUBJECTIVE ANSWER given. 
-	Evaluate the SUBJECTIVE ANSWER based on Correctness and Completeness.
-	Correctness evaluates the accuracy of the SUBJECTIVE ANSWER provided.
-	Completeness evaluates the comprehensiveness of the SUBJECTIVE ANSWER provided.
-	Provide the evaluation under the header "Subjective Round Evaluation" as per the OUTPUT FORMAT. 
SUBJECTIVE QUESTION:
-	What is the difference between while and do-while loops in JavaScript?
SUBJECTIVE ANSWER:
-	Difference between JavaScript While and Do While loop with example?. Though Do While loop and While loop looks similar, they differ in their execution. >In While loop, the condition tested at the beginning of the loop, and if the condition is True, statements inside the loop will execute. It means the While loop executes the code block only if the condition is True. >At the end of the loop, the Do While loop tests the condition. So, Do While executes the statements in the code block at least once even if the condition Fails.

CODING:
Now you have been provided with the QUESTION, INPUT, OUTPUT, EXAMPLE that is asked to the candidate and the candidate's response as the SOURCE CODE. Evaluate the SOURCE CODE based on the following GUIDELINES. 
GUIDELINES:
- Just check the code given and consider no other input while evaluating the SOURCE CODE
- Evaluate the SOURCE CODE based on PARAMETERS which are Coding, Data structures and algorithms and Efficiency.
- Coding is evaluated by how many steps of the solution the candidate was correctly able to implement and if they were able to structure the code properly.
- Data structures and algorithms are evaluated by checking whether the best approach, meaning the ideal data structure and algorithm, was used to solve the question. By ideal we mean the one that leads to the lowest time and space complexity.
- Efficiency is evaluated by calculating the time and space complexity of the SOURCE CODE and comparing it with the best time and space complexity possible for the solution. 
- Only if the Coding score is more than zero should the subsequent PARAMETERS be marked; otherwise they should be zero. While marking, bear in mind the following: SOURCE CODE is well formatted, readable, contains only lines that are run in the program and follows all the basic conventions of the language it is written in. If there is no new code that can be compiled and run and only comments, please mark as zero.
- Give your evaluation based on the above PARAMETERS in a json format where the key is the parameter and score and the value is an EXPLANATION.
- EXPLANATION is an explanation of how you arrived at the score given for each of the PARAMETERS.
- The scoring is done on a scale of 5 and use only whole numbers as the score allocated.
- Be consistent in all the evaluations you make. 

QUESTION:
Your task is to write an algorithm to determine if a number N is Supreme.
A Supreme number is a number defined by the following process

Starting with any positive integer, replace the number with the sum of the squares of its digits.
Repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
Those numbers for which this process ends in 1 are Supreme.
Print Yes if N is a Supreme number, and No if not.

INPUT:
The first line of the input contains a single integer N.

Constraints
1 ≤ N ≤ 2^31-1

OUTPUT:
Print Yes if the number is Supreme Number otherwise print No.

EXAMPLE:
'Sample Input
19
Sample Output
Yes
Explanation
1^2 + 9^2 = 82
8^2 + 2^2 = 68
6^2 + 8^2 = 100
1^2 + 0^2 + 0^2 = 1
Hence the result is 1, so it's a Supreme Number.'

SOURCE CODE:
#your code here

OUTPUT FORMAT:
Please print the response based on the following guidelines.
-	Provide the question from the RESUME. Just give one question and use a header called "Resume Question."
-	Provide language improvement tips from the RESUME. You can give 3-4 pointers here and use a header called "Resume Improvement Tips".
-	Provide the SUBJECTIVE rounds' evaluation under the header "Subject Round Evaluation" and score against each of the parameters mentioned.
-	Provide the CODING rounds' evaluation under the header "Coding Round Evaluation" and score against each of the parameters mentioned.

To which I get the following response from ChatGPT 3.5:

OUTPUT FORMAT (generated by ChatGPT):
Resume Question:
What was your role and contribution in the development of the Amazon clone project?
Resume Improvement Tips:
Use action verbs to describe your work experience.
Quantify your achievements with specific numbers or percentages.
Highlight your technical skills in a clear and organized manner.
Subject Round Evaluation:
Correctness: The answer accurately explains the difference between while and do-while loops in JavaScript. Score: 5
Completeness: The answer covers all the relevant information about the topic. Score: 4
Total Score: 9/10
Coding Round Evaluation:
Coding: 2 (The candidate did not provide any code to solve the problem)
Data structures and algorithms: 4 (The solution provided by the candidate used a brute-force approach instead of an optimized one)
Efficiency: 2 (The time complexity of the solution is O(log n), but the space complexity is O(n), which is not optimal)
Total Score: 8/15
Note: The candidate needs to improve their coding skills and understanding of data structures and algorithms to perform better in coding interviews.

The problem I am facing is that even though the source code is blank, ChatGPT still gives a score for each of the parameters (Coding, Data structures and algorithms, and Efficiency) in the Coding Round Evaluation part of the report. What is the reason for this? Is there a particular change I can make to the prompt to get around this? I am also limited to using a single prompt rather than multiple prompts here.
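
If your setup allows a small pre-processing step before the single prompt is sent, the blank-submission case does not have to be left to the model at all. A minimal sketch, assuming the submissions are Python as in the examples in this thread: the standard-library `ast` parser discards comments and blank lines, so `#your code here` parses to a module with an empty body. `is_comment_only` is a hypothetical helper name.

```python
import ast


def is_comment_only(source: str) -> bool:
    """Return True if the Python source contains no executable statements.

    Comments and blank lines are dropped by the parser, so a submission
    such as "#your code here" produces a module with an empty body.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Broken code is still an attempt; leave it for the rubric to score.
        return False
    return len(tree.body) == 0


if __name__ == "__main__":
    print(is_comment_only("#your code here"))        # True
    print(is_comment_only('print("Hello World!")'))  # False
```

When it returns True, the three coding scores can be set to zero directly. It only catches empty or comment-only submissions; a template that prints a preset value, like `print("Hello World!")`, still has to be judged by the rubric.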

Another problem I am experiencing is with a shorter version of this prompt: when using the API, I again get inconsistent scores on these parameters, often in cases where no code is given. I can provide some examples as a follow-up if you want to recreate the issue, but for now I would like to know whether anyone has suggestions for the prompt above, and I am hoping those would help with this second problem as well.
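
To put a number on that inconsistency, one option is to send the same prompt several times and compare the parsed scores per parameter. The sketch below assumes each run has already been parsed into a dict mapping parameter name to an integer score; `score_spread` is a hypothetical helper, and the sample runs reuse the scores reported in this thread for the blank submission.

```python
from typing import Dict, List, Tuple


def score_spread(runs: List[Dict[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return the (min, max) score of each parameter across repeated runs."""
    spread: Dict[str, Tuple[int, int]] = {}
    for run in runs:
        for param, score in run.items():
            lo, hi = spread.get(param, (score, score))
            spread[param] = (min(lo, score), max(hi, score))
    return spread


if __name__ == "__main__":
    # Scores reported in this thread for the blank "#your code here" submission.
    runs = [
        {"Coding": 2, "Data structures and algorithms": 4, "Efficiency": 2},
        {"Coding": 3, "Data structures and algorithms": 3, "Efficiency": 3},
        {"Coding": 4, "Data structures and algorithms": 4, "Efficiency": 4},
    ]
    for param, (lo, hi) in score_spread(runs).items():
        print(f"{param}: {lo}-{hi}")
```

For a blank submission, every parameter should come out as 0-0; any wider range shows the prompt is not pinning the score down.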

Thank you in advance to anyone who is able to help with this.

When using GPT-3.5-Turbo, you benefit from splitting tasks into separate passes. In each pass, present the AI with a specific goal and the data to analyse; you can also give each pass examples of good and bad applications for that section. This lets you narrow the AI's focus to your specific task requirements.
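
A sketch of what that split could look like, assuming each round is sent as its own request. The function only assembles one focused prompt per round, pairing a single goal with only the data that round needs; `build_pass_prompts` and the goal wording are illustrative, and the API call itself is left out.

```python
from typing import Dict


def build_pass_prompts(resume: str, subjective_qa: str, source_code: str) -> Dict[str, str]:
    """Assemble one narrowly scoped prompt per interview round (hypothetical helper)."""
    passes = {
        "resume": (
            "Ask one question about the candidate's most recent project.",
            resume,
        ),
        "subjective": (
            "Score the answer for Correctness and Completeness, 0-5 each.",
            subjective_qa,
        ),
        "coding": (
            "Score the SOURCE CODE for Coding, Data structures and algorithms "
            "and Efficiency, 0-5 each. Comment-only or empty code scores 0.",
            source_code,
        ),
    }
    # Each pass sees only its own goal and its own data.
    return {name: f"GOAL:\n{goal}\n\nDATA:\n{data}" for name, (goal, data) in passes.items()}
```

Each string would go out as a separate request, and the three answers would be stitched into the final report afterwards.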

Thank you for your response. However, for my particular use case I am limited to a single prompt, so I need to fit everything into it.

Then I would include an example of a high score, and an explanation that no code should result in a zero score.
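
Within a single prompt, that advice amounts to rendering labelled CODE/EVALUATION pairs with the score stated explicitly, including zero-score cases. A hypothetical sketch of assembling such a block; the helper name, the `EXAMPLE n` labels and the example reasons are assumptions, with the two zero-score cases taken from the rubric's own Score 0 description.

```python
from typing import List, Tuple


def format_examples(examples: List[Tuple[str, int, str]]) -> str:
    """Render (code, score, reason) triples as labelled few-shot examples."""
    blocks = []
    for i, (code, score, reason) in enumerate(examples, start=1):
        blocks.append(
            f"EXAMPLE {i}\nCODE:\n{code}\nEVALUATION:\nCoding: Score {score}. {reason}"
        )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(format_examples([
        ("#your code here", 0, "Only a comment; no code to evaluate."),
        ('print("Yes")', 0, "Prints a preset value without solving the problem."),
    ]))
```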

I added the following to my prompt:

SCORING METRIC:
- Coding:
Score 0: No code provided or commented code or just a template provided that returns a preset value or prints a value directly.
Score 1: The code provided is incomplete or does not run at all.
Score 2: The code provided contains syntax errors or logical errors that prevent it from working correctly.
Score 3: The code provided correctly implements the logic to determine whether a number is Supreme, but could be improved in terms of code organization, structure, and readability.
Score 4: The code provided is well-organized and structured, follows best practices, and is easy to read and understand.
Score 5: The code provided is exceptional in terms of organization, structure, and readability, and demonstrates mastery of the language and programming concepts.
- Data structures and algorithms:
Score 0: The Coding score, which is the score of the previous parameter, has been allocated as zero.
Score 1: The approach taken is highly inefficient or incorrect, and may not produce the correct output for some inputs.
Score 2: The approach taken is correct but highly inefficient, and may not scale well for larger inputs.
Score 3: The approach taken is reasonably efficient and correct, but there are more optimal solutions possible.
Score 4: The approach taken is highly efficient and correct, and demonstrates a good understanding of data structures and algorithms.
Score 5: The approach taken is optimal in terms of time and space complexity, and demonstrates exceptional skill and understanding of data structures and algorithms.
- Efficiency:
Score 0: The Coding score has been allocated as zero.
Score 1: The code provided is highly inefficient and may take an unreasonable amount of time or memory to complete, or may not produce the correct output for some inputs.
Score 2: The code provided is somewhat inefficient and may take longer or require more memory than necessary to complete.
Score 3: The code provided is reasonably efficient and should be able to handle most inputs without issue.
Score 4: The code provided is highly efficient and should be able to handle larger inputs with ease.
Score 5: The code provided is optimal in terms of time and space complexity, and should be able to handle the largest inputs without issue.

This is in addition to the original prompt posted above, and it still results in wrong scoring.
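
Since the rubric makes the last two parameters depend on the Coding score, that rule can also be enforced after the response is parsed, so a stray non-zero score cannot slip through. A minimal sketch, assuming the scores have been parsed into a dict keyed by the parameter names used in the prompt; `enforce_rubric` is a hypothetical helper.

```python
from typing import Dict

# Parameters that must be zero whenever Coding is zero, per the SCORING METRIC.
DEPENDENT = ("Data structures and algorithms", "Efficiency")


def enforce_rubric(scores: Dict[str, int]) -> Dict[str, int]:
    """Clamp scores to whole numbers 0-5 and zero the dependent parameters when Coding is 0.

    A missing Coding score is treated as 0.
    """
    fixed = {name: max(0, min(5, int(value))) for name, value in scores.items()}
    if fixed.get("Coding", 0) == 0:
        for name in DEPENDENT:
            fixed[name] = 0
    return fixed
```

This cannot rescue the first response in this thread (Coding: 2 for a blank submission), because the rule only triggers once Coding is already zero; that case still has to be caught by inspecting the submission itself or by the prompt.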

The generated output from ChatGPT is as follows:

Coding Round Evaluation:
Coding: 3
Data structures and algorithms: 3
Efficiency: 3
Explanation: The candidate provided a complete and functional solution to the problem, although the code organization and structure could be improved. The approach taken is correct and reasonably efficient, but there are more optimal solutions possible. Overall, the candidate demonstrates a good understanding of the required data structures and algorithms, and the code provides a solid foundation for further improvement.

I am assuming that writing a heading in caps and then giving a bulleted list of points under it works in a specific way, so that if I give the source code under SOURCE CODE, the model considers only that section. Is there a different way to do this?
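
Capitalised headings are a convention the model usually follows, not a hard boundary between sections. A commonly used alternative is to wrap each section in explicit begin/end marker lines, and to replace a comment-only submission with an explicit statement so the model never sees a placeholder it might mistake for code. The `===` marker format and both helper names below are assumptions for illustration, and the comment check assumes `#` comments as in Python.

```python
MARKER = "==="


def wrap_section(name: str, body: str) -> str:
    """Enclose one prompt section between explicit BEGIN/END marker lines."""
    return f"{MARKER} BEGIN {name} {MARKER}\n{body.strip()}\n{MARKER} END {name} {MARKER}"


def source_code_section(source: str) -> str:
    """Wrap the submission, stating explicitly when it has no executable lines.

    Assumes '#' starts a comment, as in Python.
    """
    code_lines = [
        line for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    if not code_lines:
        source = "(The candidate submitted no executable code.)"
    return wrap_section("SOURCE CODE", source)
```

With this, the blank submission from this thread reaches the model as a plain sentence saying nothing was submitted, rather than as `#your code here`.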

In one of its explanations, ChatGPT gives me this response:

CODING:
Score: 4

The problem is relatively simple, and the code is not provided, so it is not possible to evaluate the code for correctness. However, the problem requires only a few lines of code, so the quality and organization of the code can be evaluated based on the given requirements.

The code should be well-structured, with meaningful variable names and comments where necessary. It should be easy to understand and follow the logic.

Here it correctly identifies that no code is given; however, the model isn't able to allocate the correct score. Ideally, the output should be this:

Coding:
Score: 0
No code has been provided for evaluation.

Data Structures and Algorithms:
Score: 0
No code has been provided for evaluation.

Efficiency:
Score: 0
No code has been provided for evaluation.

which it does give me with follow-up prompts.
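
If the report uses the plain-text layout shown above (a parameter name on its own line, then a `Score: N` line), the scores can be pulled out with a regular expression and checked programmatically. A sketch under that assumption; `parse_scores` is hypothetical and will not match other layouts, such as the JSON variants ChatGPT produces elsewhere in this thread.

```python
import re
from typing import Dict

# A parameter name on its own line ending in ":", followed by a "Score: N" line.
SCORE_PATTERN = re.compile(r"^([A-Za-z][A-Za-z ]*):\s*\n\s*Score:\s*(\d+)", re.MULTILINE)


def parse_scores(report: str) -> Dict[str, int]:
    """Extract {parameter: score} pairs from a plain-text evaluation report."""
    return {name.strip(): int(score) for name, score in SCORE_PATTERN.findall(report)}
```

For a blank submission, every value in the returned dict should be 0, which makes the failure described in this thread easy to flag automatically.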

@Foxalabs As a follow-up to your initial suggestion, here are two examples of providing code and evaluations in two specific ways to optimise towards the expected response.

  1. In this one, I have provided a single-sentence explanation alongside each code example under “examples of code”.

Prompt given:


CODING:
Now you have been provided with the QUESTION, INPUT, OUTPUT, EXAMPLE that is asked to the candidate and the candidate's response as the SOURCE CODE. Evaluate the SOURCE CODE based on the following GUIDELINES and SCORING METRIC. Do not consider the INPUT, OUTPUT or EXAMPLE while scoring or evaluation. Only consider the SOURCE CODE given. Refer to the SCORING METRIC to understand how to allocate a score to each of the PARAMETERS defined in the GUIDELINES. For your reference, some examples of code have also been provided under EXAMPLES OF CODE.

GUIDELINES:
- Just check the code given and consider no other input while evaluating the SOURCE CODE.
- Evaluate the SOURCE CODE based on PARAMETERS which are Coding, Data structures and algorithms and Efficiency.
- Coding is evaluated by how many steps of the solution the candidate was correctly able to implement and if they were able to structure the code properly. Follow the first point in SCORING METRIC for Coding and allocate a score.
- Data structures and algorithms are evaluated by checking whether the best approach, meaning the ideal data structure and algorithm, was used to solve the question. By ideal we mean the one that leads to the lowest time and space complexity. Follow the second point in the SCORING METRIC for Data structures and algorithms and allocate a score.
- Efficiency is evaluated by calculating the time and space complexity of the SOURCE CODE and comparing it with the best time and space complexity possible for the solution. Follow the third point in the SCORING METRIC for Efficiency and allocate a score.
- Only if the Coding score is more than zero should the subsequent PARAMETERS be marked; otherwise they should be zero. While marking, bear in mind the following: SOURCE CODE is well formatted, readable, contains only lines that are run in the program and follows all the basic conventions of the language it is written in. If there is no new code that can be compiled and run and only comments, please mark as zero.
- Give your evaluation based on the above PARAMETERS in a json format where the key is the parameter and score and the value is an EXPLANATION.
- EXPLANATION is an explanation of how you arrived at the score given for each of the PARAMETERS.
- The scoring is done on a scale of 5; use only whole numbers as the score allocated.
- Be consistent in all the evaluations you make.

SCORING METRIC:
- Coding:
Score 0: No code provided or commented code or just a template provided that returns a preset value or prints a value directly.
Score 1: The code provided is incomplete or does not run at all.
Score 2: The code provided contains syntax errors or logical errors that prevent it from working correctly.
Score 3: The code provided correctly implements the logic to determine whether a number is Supreme, but could be improved in terms of code organization, structure, and readability.
Score 4: The code provided is well-organized and structured, follows best practices, and is easy to read and understand.
Score 5: The code provided is exceptional in terms of organization, structure, and readability, and demonstrates mastery of the language and programming concepts.
- Data structures and algorithms:
Score 0: The Coding score, which is the score of the previous parameter, has been allocated as zero.
Score 1: The approach taken is highly inefficient or incorrect, and may not produce the correct output for some inputs.
Score 2: The approach taken is correct but highly inefficient, and may not scale well for larger inputs.
Score 3: The approach taken is reasonably efficient and correct, but there are more optimal solutions possible.
Score 4: The approach taken is highly efficient and correct, and demonstrates a good understanding of data structures and algorithms.
Score 5: The approach taken is optimal in terms of time and space complexity, and demonstrates exceptional skill and understanding of data structures and algorithms.
- Efficiency:
Score 0: The Coding score has been allocated as zero.
Score 1: The code provided is highly inefficient and may take an unreasonable amount of time or memory to complete, or may not produce the correct output for some inputs.
Score 2: The code provided is somewhat inefficient and may take longer or require more memory than necessary to complete.
Score 3: The code provided is reasonably efficient and should be able to handle most inputs without issue.
Score 4: The code provided is highly efficient and should be able to handle larger inputs with ease.
Score 5: The code provided is optimal in terms of time and space complexity, and should be able to handle the largest inputs without issue.

QUESTION:
Your task is to write an algorithm to determine if a number N is Supreme.
A Supreme number is a number defined by the following process

Starting with any positive integer, replace the number with the sum of the squares of its digits.
Repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
Those numbers for which this process ends in 1 are Supreme.
Print Yes if N is a Supreme number, and No if not.

INPUT:
The first line of the input contains a single integer N.

Constraints
1 ≤ N ≤ 2^31-1

OUTPUT:
Print Yes if the number is Supreme Number otherwise print No.

EXAMPLE:
'Sample Input
19
Sample Output
Yes
Explanation
1^2 + 9^2 = 82
8^2 + 2^2 = 68
6^2 + 8^2 = 100
1^2 + 0^2 + 0^2 = 1
Hence the result is 1, so it's a Supreme Number.'

SOURCE CODE:
#your code here

EXAMPLES OF CODE:
- Example of a coding score of 0 is given below:
# No code provided or commented code or just a template provided that returns a preset value or prints a value directly.
print("Hello World!")

- Example of a coding score of 5 is given below:
# The code provided is exceptional in terms of organization, structure, and readability, and demonstrates mastery of the language and programming concepts.
def is_supreme(num):
    seen = set()  # numbers already visited; a repeat means a cycle without 1
    while num != 1 and num not in seen:
        seen.add(num)
        new_num = 0
        for digit in str(num):
            new_num += int(digit)**2
        num = new_num
    return num == 1

def main():
    n = int(input())
    if is_supreme(n):
        print("Yes")
    else:
        print("No")

if __name__ == "__main__":
    main()

OUTPUT FORMAT:
Please print the response based on the following guidelines.
- Provide the CODING rounds' evaluation under the header "Coding Round Evaluation" and score against each of the parameters mentioned.

Response from ChatGPT (this is wrong and not the expected outcome):


{
  "Coding Round Evaluation": {
    "Score": 4,
    "Explanation": "The code provided is well-organized and structured, follows best practices, and is easy to read and understand. The main function calls another function is_supreme which contains the implementation of the algorithm."
  },
  "Data structures and algorithm": {
    "Score": 4,
    "Explanation": "The approach taken is highly efficient and correct. The algorithm uses the approach of calculating the sum of squares of digits and checking if it equals to 1 or 4 (infinite loop) and keeps updating the number until it either reaches 1 or 4. The approach is optimal in terms of time complexity and can handle larger inputs with ease."
  },
  "Efficiency": {
    "Score": 4,
    "Explanation": "The code provided is highly efficient and should be able to handle larger inputs with ease. The time complexity of the algorithm is O(log n) and the space complexity is O(1), which is optimal and can handle the largest inputs without issue."
  }
}
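
The response above describes a "1 or 4" check and claims O(1) space. For comparison, one constant-extra-space way to detect the cycle is Floyd's tortoise-and-hare algorithm, sketched below. This is a swapped-in technique for illustration, not the code from the prompt; `digit_square_sum` is a name chosen here and `is_supreme` mirrors the prompt's function name.

```python
def digit_square_sum(n: int) -> int:
    """Sum of the squares of the decimal digits of n."""
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)
        total += digit * digit
    return total


def is_supreme(n: int) -> bool:
    """Floyd's cycle detection: the sequence either reaches 1 or enters a cycle."""
    slow = n
    fast = digit_square_sum(n)
    while fast != 1 and slow != fast:
        slow = digit_square_sum(slow)                    # advance one step
        fast = digit_square_sum(digit_square_sum(fast))  # advance two steps
    return fast == 1
```

To match the problem's I/O, print "Yes" if `is_supreme(int(input()))` is true and "No" otherwise. Since 2^31-1 has ten digits, the first step yields at most 10 × 81 = 810, so any correct variant finishes quickly for the stated constraints.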

  2. In this one, I have provided pairs of code and evaluation: two examples, one with a low score and one with a high score.

Prompt:


CODING:
Now you have been provided with the QUESTION, INPUT, OUTPUT, EXAMPLE that is asked to the candidate and the candidate's response as the SOURCE CODE. Evaluate the SOURCE CODE based on the following GUIDELINES and SCORING METRIC. Do not consider the INPUT, OUTPUT or EXAMPLE while scoring or evaluation. Only consider the SOURCE CODE given. Refer to the SCORING METRIC to understand how to allocate a score to each of the PARAMETERS defined in the GUIDELINES. For your reference, some examples of code have also been provided under EXAMPLES OF CODE.

GUIDELINES:
- Just check the code given and consider no other input while evaluating the SOURCE CODE.
- Evaluate the SOURCE CODE based on PARAMETERS which are Coding, Data structures and algorithms and Efficiency.
- Coding is evaluated by how many steps of the solution the candidate was correctly able to implement and if they were able to structure the code properly. Follow the first point in SCORING METRIC for Coding and allocate a score.
- Data structures and algorithms are evaluated by checking whether the best approach, meaning the ideal data structure and algorithm, was used to solve the question. By ideal we mean the one that leads to the lowest time and space complexity. Follow the second point in the SCORING METRIC for Data structures and algorithms and allocate a score.
- Efficiency is evaluated by calculating the time and space complexity of the SOURCE CODE and comparing it with the best time and space complexity possible for the solution. Follow the third point in the SCORING METRIC for Efficiency and allocate a score.
- Only if the Coding score is more than zero should the subsequent PARAMETERS be marked; otherwise they should be zero. While marking, bear in mind the following: SOURCE CODE is well formatted, readable, contains only lines that are run in the program and follows all the basic conventions of the language it is written in. If there is no new code that can be compiled and run and only comments, please mark as zero.
- Give your evaluation based on the above PARAMETERS in a json format where the key is the parameter and score and the value is an EXPLANATION.
- EXPLANATION is an explanation of how you arrived at the score given for each of the PARAMETERS.
- The scoring is done on a scale of 5; use only whole numbers as the score allocated.
- Be consistent in all the evaluations you make.

SCORING METRIC:
- Coding:
Score 0: No code provided or commented code or just a template provided that returns a preset value or prints a value directly.
Score 1: The code provided is incomplete or does not run at all.
Score 2: The code provided contains syntax errors or logical errors that prevent it from working correctly.
Score 3: The code provided correctly implements the logic to determine whether a number is Supreme, but could be improved in terms of code organization, structure, and readability.
Score 4: The code provided is well-organized and structured, follows best practices, and is easy to read and understand.
Score 5: The code provided is exceptional in terms of organization, structure, and readability, and demonstrates mastery of the language and programming concepts.
- Data structures and algorithms:
Score 0: The Coding score, which is the score of the previous parameter, has been allocated as zero.
Score 1: The approach taken is highly inefficient or incorrect, and may not produce the correct output for some inputs.
Score 2: The approach taken is correct but highly inefficient, and may not scale well for larger inputs.
Score 3: The approach taken is reasonably efficient and correct, but there are more optimal solutions possible.
Score 4: The approach taken is highly efficient and correct, and demonstrates a good understanding of data structures and algorithms.
Score 5: The approach taken is optimal in terms of time and space complexity, and demonstrates exceptional skill and understanding of data structures and algorithms.
- Efficiency:
Score 0: The Coding score has been allocated as zero.
Score 1: The code provided is highly inefficient and may take an unreasonable amount of time or memory to complete, or may not produce the correct output for some inputs.
Score 2: The code provided is somewhat inefficient and may take longer or require more memory than necessary to complete.
Score 3: The code provided is reasonably efficient and should be able to handle most inputs without issue.
Score 4: The code provided is highly efficient and should be able to handle larger inputs with ease.
Score 5: The code provided is optimal in terms of time and space complexity, and should be able to handle the largest inputs without issue.

QUESTION:
Your task is to write an algorithm to determine if a number N is Supreme.
A Supreme number is a number defined by the following process

Starting with any positive integer, replace the number with the sum of the squares of its digits.
Repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
Those numbers for which this process ends in 1 are Supreme.
Print Yes if N is a Supreme number, and No if not.

INPUT:
The first line of the input contains a single integer N.

Constraints
1 ≤ N ≤ 2^31-1

OUTPUT:
Print Yes if the number is Supreme Number otherwise print No.

EXAMPLE:
'Sample Input
19
Sample Output
Yes
Explanation
1^2 + 9^2 = 82
8^2 + 2^2 = 68
6^2 + 8^2 = 100
1^2 + 0^2 + 0^2 = 1
Hence the result is 1, so it's a Supreme Number.'

SOURCE CODE:
#your code here

EXAMPLES OF CODE:
- Example of a coding score of 0 is given below:
CODE:
# No code provided or commented code or just a template provided that returns a preset value or prints a value directly.
print("Hello World!")
EVALUATION:
Coding Round Evaluation:

Coding:
Score: 0
No code has been provided for evaluation.

Data Structures and Algorithms:
Score: 0
No code has been provided for evaluation.

Efficiency:
Score: 0
No code has been provided for evaluation.

- Example of a coding score of 5 is given below:
CODE:
# The code provided is exceptional in terms of organization, structure, and readability, and demonstrates mastery of the language and programming concepts.
def is_supreme(num):
    seen = set()  # numbers already visited; a repeat means a cycle without 1
    while num != 1 and num not in seen:
        seen.add(num)
        new_num = 0
        for digit in str(num):
            new_num += int(digit)**2
        num = new_num
    return num == 1

def main():
    n = int(input())
    if is_supreme(n):
        print("Yes")
    else:
        print("No")

if __name__ == "__main__":
    main()

EVALUATION:

Coding:
Score: 5
The code demonstrates mastery of the language and programming concepts. It follows the best practices of coding and is well-structured and easy to understand.

Data Structures and Algorithms:
Score: 5
The code uses the most efficient algorithm to solve the problem and implements it using an optimal data structure. The approach taken is optimal for the problem.

Efficiency:
Score: 5
The time and space complexity of the algorithm are both optimal, O(log n), where n is the input number. The code efficiently implements the logic to determine if a number is Supreme.

OUTPUT FORMAT:
Please print the response based on the following guidelines.
- Provide the CODING rounds' evaluation under the header "Coding Round Evaluation" and score against each of the parameters mentioned.

Response from ChatGPT (not the expected response, and the same as the response I got originally):


{
  "Coding Round Evaluation": {
    "Coding": 3,
    "Data Structures and Algorithms": 3,
    "Efficiency": 3
  }
}

EXPLANATION:

Coding:
Score: 3
The code correctly implements the logic to determine if a number is Supreme. However, it could be improved in terms of code organization, structure, and readability.

Data Structures and Algorithms:
Score: 3
The code uses an efficient algorithm to solve the problem but could be further optimized to reduce time complexity. The approach taken is reasonably efficient and correct, but there are more optimal solutions possible.

Efficiency:
Score: 3
The time and space complexity of the algorithm are both reasonably efficient but could be further optimized. The code is efficient enough to handle most inputs without issue.

Possibly too much for 3.5 to achieve in a single pass.

This was fixed by introducing a statement that borrows the idea from [2304.03262] When do you need Chain-of-Thought Prompting for ChatGPT?

The one line I had to add at the end was this:

OUTPUT FORMAT:
Please print the response based on the following guidelines.
- Provide the CODING rounds’ evaluation under the header “Coding Round Evaluation” and score against each of the parameters mentioned.
- Let's think step by step and give zero for just comments, no code and code that is not related to the question asked. Also maintain your consistency while evaluating.