Advice Needed on Advanced Coding Evaluation System for School Project

Hi all,

I’m working on a school project focused on creating an advanced coding evaluation system that goes beyond simple output matching. Our goal is to assess logic, efficiency, and problem-solving ability in a more nuanced way. I’ve been reading IEEE papers and attended an HPE workshop on LLMs, but I haven’t decided yet whether to focus on prompt engineering or on fine-tuning with a custom dataset. We’re planning to use the o1 model. It’s just me and a friend, and we have six months to deliver. I believe we can do a great job, but I’m looking for advice from the community on the best approach.

Here’s what we’re planning to implement:

Objective:

•	A coding evaluation system that evaluates not just outputs but also the candidate’s logic, efficiency, and problem-solving approach.

Key Features:

•	Nuanced Grading:
	•	Code Logic and Structure: Assess the logical flow of the code, even with minor syntax errors (e.g., missing semicolons).
	•	Error Tolerance: Focus on the candidate’s intent rather than penalizing small mistakes.
	•	Efficiency: Measure time and space complexity to see how optimized the solution is (see the timing sketch after this list).
	•	Problem-Solving Approach: Understand the thought process and award partial credit for good logic, even if the code doesn’t fully run.
•	Scoring System (a rough grading sketch follows this list):
	•	Understanding and Approach (40% of the score): How well the candidate understood the problem and applied an effective method.
	•	Efficiency (30%): How optimized the code is.
	•	Correctness (30%): How close the solution is to the expected output.
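
To make the scoring system concrete, here’s a minimal sketch of how we might ask a model for rubric subscores and combine them with the weights above. It assumes the official OpenAI Python SDK; the model name, prompt wording, and JSON schema are placeholders, not a final design.

```python
import json
from openai import OpenAI  # assumes the official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative rubric prompt; the wording and schema are placeholders.
RUBRIC_PROMPT = """You are a code evaluator. Grade the submission below.
Return only a JSON object with integer scores from 0 to 100 for:
  "understanding": how well the problem was understood and approached
  "efficiency": how optimized the solution is (time/space complexity)
  "correctness": how close the output is to the expected output
Tolerate minor syntax errors (e.g., missing semicolons); grade intent.

Problem:
{problem}

Submission:
{code}
"""

# Weights from the planned scoring system above.
WEIGHTS = {"understanding": 0.40, "efficiency": 0.30, "correctness": 0.30}

def grade_submission(problem: str, code: str, model: str = "o1-mini") -> dict:
    """Ask the model for rubric subscores, then compute the weighted total."""
    response = client.chat.completions.create(
        model=model,  # placeholder; swap in whichever model we settle on
        messages=[{"role": "user",
                   "content": RUBRIC_PROMPT.format(problem=problem, code=code)}],
    )
    # Assumes the model returns bare JSON; a real system should validate this.
    scores = json.loads(response.choices[0].message.content)
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return {"subscores": scores, "total": round(total, 1)}

if __name__ == "__main__":
    print(grade_submission(
        problem="Return the sum of a list of integers.",
        code="def total(xs):\n    return sum(xs)",
    ))
```

Judge-style scores can be noisy, so in practice we’d validate the returned JSON and probably average over several runs.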
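
For the efficiency criterion specifically, the model’s judgement could be cross-checked with a rough empirical timing. This is only a sketch: it runs a submission in a subprocess with a timeout and records wall-clock time; the file name, timeout, and input are made up for illustration, and a real harness would sandbox the process and test several input sizes.

```python
import subprocess
import sys
import time

def measure_runtime(script_path: str, stdin_data: str = "",
                    timeout_s: float = 5.0) -> dict:
    """Run a candidate script in a subprocess and time it.

    Measures wall-clock time for a single input only; this is a rough
    proxy for efficiency, not a complexity analysis.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            [sys.executable, script_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        elapsed = time.perf_counter() - start
        return {"ok": proc.returncode == 0,
                "seconds": round(elapsed, 4),
                "stdout": proc.stdout}
    except subprocess.TimeoutExpired:
        # Treat a timeout as a failed, maximally slow run.
        return {"ok": False, "seconds": timeout_s, "stdout": ""}

if __name__ == "__main__":
    # "candidate_solution.py" is a hypothetical submission file.
    print(measure_runtime("candidate_solution.py", stdin_data="1 2 3\n"))
```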

I’d appreciate any tips, advice, or tricks for building something like this within our timeline. Based on your experience, what would the best approach be?

Thanks in advance!


Hello and welcome to the forum.

That sounds like a fun project with a realistic timeline.

  • o1-mini is already supposedly very good at coding. How is your system going to add to that?

  • Conversely, “Code Logic and Structure” and “Understanding and Approach” require a high-level understanding of the code that is better suited to o1, but they also require a value judgement. How will you (a) feed “all of the code” to the model so it can reason about the whole submission, and (b) decide how much data to give it to make that judgement, and based on whose opinion? (A small context-window check is sketched after these bullets.)

  • Here’s some other work from OpenAI on the matter: https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
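
On the “feed all of the code” question: a practical first step is simply checking whether the prompt plus submission fits the model’s context window before sending it. Here’s a minimal sketch using tiktoken; the o200k_base encoding and the 128k limit are assumptions to verify against whichever model is actually used.

```python
import tiktoken

def fits_in_context(prompt: str, code: str, limit: int = 128_000) -> bool:
    """Rough check that prompt + code fit in the model's context window.

    The encoding name and token limit are assumptions; confirm both
    for the specific model before relying on this.
    """
    enc = tiktoken.get_encoding("o200k_base")
    n_tokens = len(enc.encode(prompt)) + len(enc.encode(code))
    return n_tokens <= limit
```

Submissions that don’t fit would need chunking or summarization, which brings its own evaluation questions.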