The API returns incorrect results, but the ChatGPT window returns correct results. How can I solve this problem?


[image] As shown in the figure, (DE) and (FG) are the midlines of (\Delta AFG) and trapezoid (DBCE) respectively, and (FG = 21). Find (DE) and (BC).

I use GPT-4o and ask the question in the window, and the answer is:
DE = 10.5 cm
BC = 31.5 cm
The API's solution is DE = (21), BC = (21).
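
For reference, the two midline properties give the window's numbers: (DE) is the midline of (\Delta AFG), so it is half of (FG); (FG) is the midline of trapezoid (DBCE), so it is the average of the parallel sides (DE) and (BC):

[ DE = \frac{1}{2} FG = \frac{21}{2} = 10.5 ]

[ FG = \frac{DE + BC}{2} \implies BC = 2 \times FG - DE = 2 \times 21 - 10.5 = 31.5 ]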

In the window I did not use any prompt.
API prompt:
##You are a graphical data expert and a teacher proficient in solving algebra/geometry/probability/word problems and other elementary mathematics###

###1: Use gpt-vision and OCR technology to accurately extract the printed questions in the test paper to ensure that the questions are complete and accurate. ###
###2: Handwritten answer recognition: Use gpt-vision and OCR technology to identify students’ handwritten answers to ensure accurate recognition. ###
###3: Extract the known conditions and solution goals in the question###
###4: Use correct algebra, geometry, probability and other mathematical theorems to perform calculations, and explain why you chose this formula###
###5: After each step of calculation, verify whether the calculation result meets the question conditions###
###6: Cross-validation via another method###
###7: Summarize the final answer and check the logic and correctness of all steps###

The answer is:

Question 6:

Problem Statement


  • DE and FG are midlines of triangles ( \Delta AFG ) and trapezoid ( \Delta DBCE ), respectively.
  • ( FG = 21 )

To find:

  • ( DE )
  • ( BC )


  1. Known conditions:
  • For triangle midlines: segment FG is parallel to side BC and ( BC = 2 \times FG ).
  2. Given ( FG = 21 ) cm: ( DE = \frac{2}{3} FG = \frac{21}{3} \times 2 = 14 \, \text{cm} ) (as midline of triangle)
  3. Using properties of the trapezoid, ( BC = 2 \times FG ): ( BC = 2 \times 21 = 42 \, \text{cm} )

Final Verification

  • Calculation consistent with known midline properties.

Answer: ( DE = 14 \, \text{cm}, \; BC = 42 \, \text{cm} )
How can I get the right answer with the API?

Can someone help with this?

I think you’re referring to ChatGPT? It’s a bit different from the API.

Your prompt is a bit confusing and has a lot of details… Is ChatGPT giving you all those details in one go?

What are you trying to achieve? What have you tried (prompts/code) so far?

I’m trying to use the GPT API to grade 9th-grade students’ exam papers. That is the goal. But I found that the GPT API itself answers the questions incorrectly, so I’m trying to solve that problem.

The prompt has been revised repeatedly while working on this problem; both GPT and I took part in the revisions.

If you have any suggestions, please let me know. Thanks!

My goal is to use the GPT API to grade homework for students in grades 1–9. The project is intended for parents who are unable to grade their children’s homework themselves. When reviewing the results returned by the API, I noticed the following issues:

Temperature parameter: 0.5

  1. The accuracy of handwriting recognition is significantly lower than in the ChatGPT window.
  2. There are numerous incorrect answers to geometry and probability questions for 9th-grade students.
  3. If the handwritten answer cannot be recognized, the student’s answer is automatically marked as correct, regardless of its accuracy.
  4. Using the same set of prompts, the answers returned by GPT are inconsistent.
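
On the fourth issue (inconsistent answers from the same prompt): at temperature 0.5 part of that variation is just sampling randomness. Setting `temperature=0` and pinning a `seed` makes Chat Completions calls much more reproducible. A minimal sketch, with the helper name `build_grading_request` and the system prompt being my own placeholders, and the actual network call left commented out:

```python
# Sketch: assemble a deterministic Chat Completions request.
# build_grading_request is a hypothetical helper; the real API call
# is commented out so this runs without an API key.

def build_grading_request(question_text: str, student_answer: str) -> dict:
    """Build request parameters that favor reproducible output."""
    return {
        "model": "gpt-4o",
        "temperature": 0,   # remove sampling randomness
        "seed": 42,         # best-effort reproducibility across runs
        "messages": [
            {"role": "system",
             "content": "You are a math teacher grading 9th-grade exams."},
            {"role": "user",
             "content": f"Question:\n{question_text}\n\n"
                        f"Student answer:\n{student_answer}"},
        ],
    }

request = build_grading_request("FG = 21; find DE and BC.",
                                "DE = 10.5, BC = 31.5")
# from openai import OpenAI
# response = OpenAI().chat.completions.create(**request)
```

Determinism is best-effort even with a seed, but it removes most of the run-to-run variation you are seeing at temperature 0.5.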

What I need help with:

  1. Ensure GPT answers questions correctly, especially for geometry and probability.
  2. If GPT cannot answer a question, it should prompt for manual correction instead of marking it as correct by default.

Here is my prompt and the reasons for each part:

Role: You are a graphical data expert and a teacher proficient in solving algebra, geometry, probability, word problems, and other mathematical concepts.

Designed to use the latest technology:

  1. Use GPT-vision and OCR technology to accurately extract printed questions from the test paper to ensure completeness and accuracy.
  2. Use GPT-vision and OCR technology to recognize students’ handwritten answers accurately.

After multiple rounds of debugging incorrect results, ChatGPT recommended adding:

  1. Extract the known conditions and solution goals in the question.
  2. Identify the problem types (e.g., triangles, trapezoids, algebraic equations) and apply the appropriate mathematical theorems or formulas to solve them.
  3. After each calculation step, verify that the result meets the question’s conditions.
  4. Ensure the accuracy of the results through cross-validation.
  5. Summarize the final answer and check the logic and correctness of all steps.
  6. If a student fails to answer a question, do not grade that question. Students’ answers can be different expressions of the same result, such as fractions or decimals.
  7. If the provided test paper picture lacks necessary information, mark it as “Unable to correct.”


Grading rules:

  1. Ensure each step is performed accurately and compare the student’s answers with the GPT-calculated answers.
  2. If the student’s answer is inconsistent with the GPT-calculated answer, mark it as incorrect.
  3. For unrecognizable or uncertain conditions, do not default to marking it as correct. Instead, mark it as “requires manual verification.”
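
The third rule (never default to "correct") is safer to enforce in your own code than in the prompt alone: constrain the model to a closed set of verdicts and treat anything unparseable as needing manual review. A sketch, where the JSON shape and the verdict names are my own convention, not an API feature:

```python
import json

# Sketch: accept only a closed set of verdicts from the model and
# fail safe to "manual_check" on anything else, so an unparseable
# reply can never be silently graded as correct.

ALLOWED_VERDICTS = {"correct", "incorrect", "manual_check"}

def parse_verdict(model_output: str) -> str:
    """Extract {'verdict': ...} from the model's reply; fail safe."""
    try:
        verdict = json.loads(model_output).get("verdict")
    except (json.JSONDecodeError, AttributeError):
        return "manual_check"   # never silently mark correct
    return verdict if verdict in ALLOWED_VERDICTS else "manual_check"

parse_verdict('{"verdict": "incorrect"}')   # -> "incorrect"
parse_verdict("I think it looks fine.")     # -> "manual_check"
```

Anything the model returns outside the three allowed strings, including free-form prose, lands in the manual-review pile by construction.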

“I’m working on a project to grade exam papers. I’ve noticed that GPT tends to mark students’ wrong answers as correct. This often happens with more complex questions that GPT itself cannot answer, so I can’t rely on GPT’s generated answer as a reference to compare against the student’s response; in those cases GPT tends to mark the student’s answer as correct. I want GPT to lean towards marking these instances as ungradable, or towards marking the student as incorrect. When a correct answer is mistakenly marked incorrect, the student only spends some extra time reviewing, with little lost. But marking an incorrect answer as correct could lead the student to skip reviewing that topic, which has worse consequences. How should I implement this with the GPT-4 API?”

If you made the exam, you should have the answer key, right?

Why not use that?


Unfortunately, this project is aimed at helping children whose families cannot afford private tutors, so there is no answer key.

One thing to keep in mind with LLMs like GPT-3 and 4 is that despite their capabilities, they are ultimately just word prediction machines.

If you give the students’ answers to GPT and your prompt is “Is this correct?”, the model will inevitably make mistakes, since its only two possible predictions are “Yes” or “No”, whichever one sounds more likely to it, without giving it much thought.

However, if you ask the AI to elaborate and detail its reasoning before giving its final answer, proceeding step-by-step, you highly decrease the likelihood of it making mistakes — its output is its thoughts, in a way.

For tasks like this, you should make sure that the model isn’t just trying to blindly guess whether the student’s answer was correct, but to actually reason its way through it first, or at least elaborate on it before giving its final answer.
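
A sketch of that pattern: prompt the model to solve the problem step by step first, require the verdict on a clearly marked final line, and parse only that line. The `FINAL:` marker and the prompt wording are my own assumed convention:

```python
# Sketch: reason-before-verdict prompting. The model is told to show
# its work, then emit one "FINAL:" line; the parser reads only that
# line and falls back to manual_check if the marker is missing.

GRADING_PROMPT = (
    "Solve the problem yourself step by step, showing every calculation.\n"
    "Then compare your result with the student's answer.\n"
    "On the last line, write exactly one of:\n"
    "FINAL: correct\nFINAL: incorrect\nFINAL: manual_check\n"
)

def extract_final(reply: str) -> str:
    """Return the verdict from the last FINAL: line, else manual_check."""
    for line in reversed(reply.strip().splitlines()):
        if line.strip().startswith("FINAL:"):
            return line.split(":", 1)[1].strip()
    return "manual_check"

sample = ("Step 1: DE = 21 / 2 = 10.5\n"
          "Step 2: BC = 2 * 21 - 10.5 = 31.5\n"
          "FINAL: correct")
extract_final(sample)   # -> "correct"
```

Because the model has to write out the calculation before the `FINAL:` line, the verdict is conditioned on its own worked solution rather than on a blind yes/no guess.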


Thank you for your reply.

  1. That’s right, there is a deduction process. GPT is mostly accurate, but it makes errors in some complex problems.

  2. Why do we need to assess whether each answer is correct? Because if we can tell parents and students which knowledge points produce more mistakes and which produce fewer, they can allocate their limited time and money to improving the weaker ones.

  3. If GPT makes an error in deduction and the student answers incorrectly, yet GPT judges the student as correct, that’s very problematic.

  4. If GPT makes an error in deduction and the student answers correctly, yet GPT judges the student as incorrect, that can be tolerated. Any assistance in addressing this issue would be greatly appreciated.