Making ChatGPT to show users better explanations about why the answer is wrong in my tests app

Hello, I’m planning to integrate ChatGPT 4 into my app “Test Me!”, an app targeting users to prepare English, Sat and Act exams through tests, and the use I’d like to make of it is to make it give users more complete and better explanations about why the selected answer -in any test- is wrong (comparing to the ones my app already includes), but I really don’t know how I could accomplish that with the Api, I mean with an Api call.

To make it clear, my app has many tests, for example maths or English. You take a test, and when finished you see results and can review the test to check for wrong answers and a short explanation indicating why that answer is wrong.

Could you please give me any hint about that?

Here’s something that seems to work for the sample English test:

Please explain why my answer for the SAT is wrong. Here is the given passage:
{enter your passage here}

Here is the question:
`The “social psychologists” mentioned in paragraph 2
(lines 17-34) would likely describe the “deadweight
loss” phenomenon as
A) predictable.
B) questionable.
C) disturbing.
D) unprecedented.`

My answer was `D) unprecedented.`

Why is my answer wrong?

You can insert the passage, question (with answer choices) and the user’s answer in there.

The response would be something like

The social psychologists, as per the passage, recognize the struggle that people often have with taking account of others’ perspectives, and that this leads to egocentrism, social projection, and multiple attribution errors. This suggests they would likely view the “deadweight loss” phenomenon as a predictable outcome of these cognitive biases.

Therefore, the correct answer should be A) predictable. The “deadweight loss” phenomenon is a known outcome that stems from the difficulties people have in predicting what gifts others will appreciate, which is in line with the social psychologists’ understanding of human behavior.

The choice D) unprecedented would suggest that the phenomenon is something completely new and never seen before, which is not supported by the passage. The phenomenon is indeed described, and hence, is not unprecedented.

On the API itself, you just need to plug this whole text block into the user query. Let me know if that’s what you’re having trouble with.

1 Like

Thank you very much for your response and help @smuzani.

I’ll try your example and let you know as soon as possible.

Hi again @smuzani . I tried your suggestion and it’s not working right, I mean, Iam now able to get a response from ChatGPT, but the result is giving me is incorrect. Below you can see the text I’m sending to the api (with full api call Kotlin code) and the response:

val text: String = “Please explain why my answer for the SAT is wrong. Here is the given passage:\n” +
“{enter your passage here}\n” +
“\n” +
"2. On Monday evening, Adam sent x text messages each hour for 3 hours, " +
“and Andrew sent y text messages each hour for 5 hours. " +
“Which of the following represents the total number of messages sent by Adam and Andrew” +
" on Monday evening?” +
“A) 8xy\n” +
“B) 3x+5y\n” +
“C) 15xy\n” +
“D) 5x+3y\n” +
“\n” +
“My answer was D) 5x+3y\n” +
“\n” +
“Why is my answer wrong?”

val openAI = OpenAI(AppSettings.chatGPTApiSecretKey)

val chatCompletionRequest = ChatCompletionRequest(
model = ModelId(“gpt-3.5-turbo”),
messages = listOf(
ChatMessage(
role = ChatRole.User,
content = text
)
)
)
completion = openAI.chatCompletion(chatCompletionRequest)
completions = openAI.chatCompletions(chatCompletionRequest)

Responses:

Most of the time is giving this next response, which, as you can see, is incorrect, because the correct one is B)

ChatMessage(role=ChatRole(role=assistant), content=Your answer is wrong because it represents the sum of the number of messages that Adam and Andrew sent in just one hour, rather than the total number of messages they sent over the entire time period (3 hours for Adam and 5 hours for Andrew). The correct answer is C) 15xy, which represents the total number of messages sent by Adam and Andrew over the course of the given time period., name=null)

Response 2: Only once it gave me a correct answer, but the explanation is not coherent.

ChatMessage(role=ChatRole(role=assistant), content=Your answer is incorrect because it only represents the total number of text messages sent by Adam and Andrew in one particular hour, whereas the question is asking for the total number of messages sent over the entire 3-hour period for Adam and 5-hour period for Andrew. Answer choice D only adds up the number of messages sent in the same hour for both Adam and Andrew, whereas the correct answer should take into account the number of hours each person sent messages for. The correct answer is B) 3x+5y., name=null)

Response 3:

Many times I get a 500 exception.

In my conclusion, ChatGPT is not working fine for this purpose. Or maybe I’m doing something wrong?

What do you think?

This is a good application for chain-of-thought (CoT) prompting (link).

Basically, first ask the model to think through the steps of the problem and show its work. Ask it to separate the thinking from the response in some way; e.g. you could add something like:

First, in <thinking> tags, explain your reasoning. Evaluate each of the possible answers and explain why it is correct/incorrect. Once you have identified the correct answer, formulate an explanation for why the user’s answer was wrong.

It’ll take some experimentation—I wouldn’t expect the above to work immediately—but I’ve found this approach highly effective for these kinds of situations.

It’s also worth noting that gpt-4 would probably perform better than gpt-3.5 on this problem type.

1 Like

One thing you could try is in the {role:system} message you could instruct the model to give users complete and detailed explanations on why any selected answer is wrong.

1 Like

Thank you very much for your help @dliden, I’ll definitely give it a try and let you know.

Regarding ChatGPT-4 I tried to set the model to “gpt-4-0314” or just “gpt-4” but it always returns model not available. I tried an api call to to openai models endpoint with my access_key and can confirm “gpt-4” is not part of the json response, only 3.5

What can I do to try gpt-4 the same as I’m doing with 3.5, do I need a different access_key? Show I register in a different page? Can you give further help regarding this?

Thanks!

Aah, for gpt-4 API access you’ll need to get yourself on the waitlist: GPT-4 API waitlist

1 Like

Oh, I meant put the passage that the question is referring to here. That example was an English exam, and the passage was just too long to paste in there.

Also 3.5 normally does poorly with math, but when I test with this example, it seems to be fine. I’d recommend something harder for GPT-4. There’s some data here: GPT-4

3.5 scores 70% on English. GPT-4 scores 85.5%. Math scores are similar too. Statistically, 3.5 is just below the bar qualifying it to be a SAT tutor, lol.

AI Response:

Here’s my full code (in Node): https://github.com/smuzani/openai-samples/blob/main/node_examples/examhelp.js

One option is to try with a lower temperature to reduce hallucinations. Here I used 0.13. Another is to play with the system prompt a little. The system prompt matters little in 3.5, but makes quite a big difference in GPT-4.

1 Like

Thank you very much again @smuzani, you are being very helpful and I appreciate it.
I have one more question:

In my tests app, some of the math tests include an image in certain questions. That images are stored in a public url and they’re obviously needed to answer the question.

Do you think it could be possible to instruct GPT to interprete (understand) the image in the url and use it to solve the question? I know you cannot upload images in GPT 3.5, but maybe it can “read” or interpret images from an url. How could I accomplish that?

I guess you’ll be able to upload images in version 4, but I’m still in the waiting list.

I’m attaching one example screenshot of my app so you have a clearer idea.

Oh, I don’t think these are doable yet. GPT-4 can read images; GPT-3.5 likely couldn’t even if it could open links. Most of us with GPT-4 access don’t have image access, though. It’s only for a select few in an even more limited beta.

Thanks again @smuzani, always helpful. One last question, is it possible to make gpt 3.5 (or at least gpt-4) to read text in english? (and in Chinese?)

I’m not sure I understand the question. It’s reading text in English now? You can OCR if you mean reading text from an image, but that wouldn’t be through the OpenAI APIs yet. It handles multiple languages well, but I’m not familiar enough with Chinese to comment.

With “Reading” I mean text to speech. I’d like to make it read text in english or even better in chinese @smuzani

That would probably be under TTS, also not related to OpenAI. I believe Android has some built-in libraries for that, it’s been around for a while Android - Text To Speech