I am sharing two chats (in Dutch) with a plugin I am developing. If you look only at the chat from August (today), the responses might seem OK. But look closely at the GPT response that judges my answer to an exam question. The July version gave a brilliant and useful explanation of what I did right, the mistake I made, what was going on, and how to correct it. The August version also produced plenty of words, but they mostly repeat what had already been said. When it judged my answer, it made errors in assessing what I did right and didn’t explain why my final answer didn’t match.
Note: the prompts the plugin uses have changed slightly in the meantime, but I believe the differences described below are due more to changes in the underlying GPT-4 model than to the prompt changes. Results vary from time to time and seem unrelated to the prompt changes.
July (the first one, from around July 11, I think):
https://chat.openai.com/share/98888035-982c-4476-8d27-eda85d9e0a67
I saved this one with “!!” in the name because I was so impressed with the results; I couldn’t think of any way to improve it. It was just perfect, and I developed warm fuzzy feelings for this GPT.
August (the second one, today):
https://chat.openai.com/share/f85ab6c1-e582-42ee-b051-5f6b8fa9313b
The most notable difference in quality is in the GPT response to my answer/calculation: “I think it should be done like this:
C5 = 500000 * (1 - (1+0.07)^-5)/0.07
= 2050098 euro
Then the initial investment still has to be subtracted: - 1.700.000 = 350098 EUR
And then add the residual value: + 300.000
Final answer NPV = 650098 EUR”
My answer/calculation contains a single ‘bug’: I didn’t discount the future value of €300.000 (the residual value at the end of year 5) back to its value today (divide by 1.07^5).
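To make the ‘bug’ concrete, here is a minimal Python sketch (my own illustration, not code from the plugin or from the correctievoorschrift) that computes the NPV once the way I did in my answer and once with the residual value discounted. The constant and function names are mine, and it assumes every cash flow arrives at the end of a year.

```python
# Minimal sketch (illustrative, not the plugin's code): the NPV from my answer
# versus the corrected NPV. Assumes every cash flow arrives at the end of a year.

CASH_FLOW = 500_000     # annual cash flow in EUR, years 1..5
RATE = 0.07             # discount rate
YEARS = 5
INVESTMENT = 1_700_000  # initial investment in EUR
RESIDUAL = 300_000      # residual value in EUR, received at the end of year 5


def annuity_pv(cash_flow: float, rate: float, years: int) -> float:
    """Present value of a constant cash flow received at the end of each year."""
    return cash_flow * (1 - (1 + rate) ** -years) / rate


def npv_as_answered() -> float:
    # The 'bug': the residual value is added at face value, not discounted.
    return annuity_pv(CASH_FLOW, RATE, YEARS) - INVESTMENT + RESIDUAL


def npv_corrected() -> float:
    # The fix: bring the residual value back to today by dividing by (1 + rate)^years.
    return annuity_pv(CASH_FLOW, RATE, YEARS) - INVESTMENT + RESIDUAL / (1 + RATE) ** YEARS


print(f"PV of the annual cash flows: {annuity_pv(CASH_FLOW, RATE, YEARS):,.2f}")
print(f"NPV as answered:             {npv_as_answered():,.2f}")
print(f"NPV corrected:               {npv_corrected():,.2f}")
```

Run as-is, it prints roughly 2,050,098.72, 650,098.72 (the 650098 from my answer) and 563,994.57. The gap of about 86,104 is the €300.000 minus its present value of about €213.895,85.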
Both versions fetch the ‘correctievoorschrift’ (the marking scheme with the gold answer and points) correctly.
The July response is amazing:
- It correctly shows how the value of €2.263.994,57 is calculated (this exact calculation appears in the correctievoorschrift). The last term on the left-hand side is especially useful to show the student, because it differs from the other terms: the €300.000 residual value is added to the year-5 cash flow.
- It tells me very precisely what I did right and what I did wrong, and why, and then explains thoroughly what should have been done instead.
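Assuming the correctievoorschrift discounts each year’s cash flow separately and adds the €300.000 residual value to the year-5 cash flow, the calculation of €2.263.994,57 would look like this (my reconstruction from the numbers in the exam question; the marking scheme’s exact notation may differ):

$$
\frac{500\,000}{1.07} + \frac{500\,000}{1.07^{2}} + \frac{500\,000}{1.07^{3}} + \frac{500\,000}{1.07^{4}} + \frac{500\,000 + 300\,000}{1.07^{5}} \approx 2\,263\,994.57
$$

Only the last term differs from the annuity in my answer, and that is exactly where the undiscounted €300.000 went wrong. Subtracting the €1.700.000 investment from this total gives the NPV.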
Aug response:
- It restates my answer without adding any information or insight, so it adds no perceived value.
- It also tends to create subsection headers, which I personally don’t like.
- It then continues with the gold answer, but gives only the end result. How that answer is calculated is NOT shown to the user.
- It then compares only the end results and states that my answer does not match the gold answer.
- It then says that I added the 300.000 correctly (I didn’t, because I forgot to divide this value by 1.07^5), and then states that my final answer is wrong because it doesn’t match the final answer in the ‘correctievoorschrift’.
In the remainder of the chat, there is another thing July did better:
My question: “If the cash flows are received evenly spread out, surely I can just divide the total investment by the annual cash flow? Then I get the number of years with some decimals. I’d then have to convert that to months or something.”
July: gave a correct answer to this question.
Aug: misinterpreted my question as an answer to question 17, then fetched the gold answer and gave it in the response. This would have prevented the student from arriving at the answer on their own.
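The approach in my question is the simple (undiscounted) payback period. Below is a minimal sketch, assuming equal annual cash flows received evenly over each year. The inputs reuse the €1.700.000 investment and €500.000 annual cash flow from the NPV question purely as an illustration (they are not the figures of question 17), and the function name is my own.

```python
# Minimal sketch: simple (undiscounted) payback period, assuming equal annual
# cash flows that are received evenly throughout each year.

def payback_period(investment: float, annual_cash_flow: float) -> tuple[int, float]:
    """Return the payback period as (whole years, remaining months)."""
    years = investment / annual_cash_flow    # payback in years, with decimals
    whole_years = int(years)                 # number of complete years
    months = (years - whole_years) * 12      # convert the decimal part to months
    return whole_years, months


# Illustrative inputs (borrowed from the NPV question, not from question 17)
whole_years, months = payback_period(1_700_000, 500_000)
print(f"Payback period: {whole_years} years and {months:.1f} months")
```

With these inputs it prints “Payback period: 3 years and 4.8 months” (1.700.000 / 500.000 = 3.4 years, and 0.4 × 12 = 4.8 months).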
[Screenshots: the July version and the August version]

