Making ChatGPT to show users better explanations about why the answer is wrong in my tests app

diegopefm · May 21, 2023, 11:50pm

Hello, I’m planning to integrate ChatGPT 4 into my app “Test Me!”, an app targeting users to prepare English, Sat and Act exams through tests, and the use I’d like to make of it is to make it give users more complete and better explanations about why the selected answer -in any test- is wrong (comparing to the ones my app already includes), but I really don’t know how I could accomplish that with the Api, I mean with an Api call.

To make it clear, my app has many tests, for example maths or English. You take a test, and when finished you see results and can review the test to check for wrong answers and a short explanation indicating why that answer is wrong.

Could you please give me any hint about that?

smuzani · May 22, 2023, 8:04am

Here’s something that seems to work for the sample English test:

Please explain why my answer for the SAT is wrong. Here is the given passage:
{enter your passage here}

Here is the question:
`The “social psychologists” mentioned in paragraph 2
(lines 17-34) would likely describe the “deadweight
loss” phenomenon as
A) predictable.
B) questionable.
C) disturbing.
D) unprecedented.`

My answer was `D) unprecedented.`

Why is my answer wrong?

You can insert the passage, question (with answer choices) and the user’s answer in there.

The response would be something like

The social psychologists, as per the passage, recognize the struggle that people often have with taking account of others’ perspectives, and that this leads to egocentrism, social projection, and multiple attribution errors. This suggests they would likely view the “deadweight loss” phenomenon as a predictable outcome of these cognitive biases.

Therefore, the correct answer should be A) predictable. The “deadweight loss” phenomenon is a known outcome that stems from the difficulties people have in predicting what gifts others will appreciate, which is in line with the social psychologists’ understanding of human behavior.

The choice D) unprecedented would suggest that the phenomenon is something completely new and never seen before, which is not supported by the passage. The phenomenon is indeed described, and hence, is not unprecedented.

On the API itself, you just need to plug this whole text block into the user query. Let me know if that’s what you’re having trouble with.

diegopefm · May 22, 2023, 12:05pm

Thank you very much for your response and help @smuzani.

I’ll try your example and let you know as soon as possible.

diegopefm · May 23, 2023, 11:05am

Hi again @smuzani . I tried your suggestion and it’s not working right, I mean, Iam now able to get a response from ChatGPT, but the result is giving me is incorrect. Below you can see the text I’m sending to the api (with full api call Kotlin code) and the response:

val text: String = “Please explain why my answer for the SAT is wrong. Here is the given passage:\n” +
“{enter your passage here}\n” +
“\n” +
"2. On Monday evening, Adam sent x text messages each hour for 3 hours, " +
“and Andrew sent y text messages each hour for 5 hours. " +
“Which of the following represents the total number of messages sent by Adam and Andrew” +
" on Monday evening?” +
“A) 8xy\n” +
“B) 3x+5y\n” +
“C) 15xy\n” +
“D) 5x+3y\n” +
“\n” +
“My answer was D) 5x+3y\n” +
“\n” +
“Why is my answer wrong?”

val openAI = OpenAI(AppSettings.chatGPTApiSecretKey)

val chatCompletionRequest = ChatCompletionRequest(
model = ModelId(“gpt-3.5-turbo”),
messages = listOf(
ChatMessage(
role = ChatRole.User,
content = text
)
)
)
completion = openAI.chatCompletion(chatCompletionRequest)
completions = openAI.chatCompletions(chatCompletionRequest)

Responses:

Most of the time is giving this next response, which, as you can see, is incorrect, because the correct one is B)

ChatMessage(role=ChatRole(role=assistant), content=Your answer is wrong because it represents the sum of the number of messages that Adam and Andrew sent in just one hour, rather than the total number of messages they sent over the entire time period (3 hours for Adam and 5 hours for Andrew). The correct answer is C) 15xy, which represents the total number of messages sent by Adam and Andrew over the course of the given time period., name=null)

Response 2: Only once it gave me a correct answer, but the explanation is not coherent.

ChatMessage(role=ChatRole(role=assistant), content=Your answer is incorrect because it only represents the total number of text messages sent by Adam and Andrew in one particular hour, whereas the question is asking for the total number of messages sent over the entire 3-hour period for Adam and 5-hour period for Andrew. Answer choice D only adds up the number of messages sent in the same hour for both Adam and Andrew, whereas the correct answer should take into account the number of hours each person sent messages for. The correct answer is B) 3x+5y., name=null)

Response 3:

Many times I get a 500 exception.

In my conclusion, ChatGPT is not working fine for this purpose. Or maybe I’m doing something wrong?

What do you think?

dliden · May 23, 2023, 12:54pm

This is a good application for chain-of-thought (CoT) prompting (link).

Basically, first ask the model to think through the steps of the problem and show its work. Ask it to separate the thinking from the response in some way; e.g. you could add something like:

First, in <thinking> tags, explain your reasoning. Evaluate each of the possible answers and explain why it is correct/incorrect. Once you have identified the correct answer, formulate an explanation for why the user’s answer was wrong.

It’ll take some experimentation—I wouldn’t expect the above to work immediately—but I’ve found this approach highly effective for these kinds of situations.

It’s also worth noting that gpt-4 would probably perform better than gpt-3.5 on this problem type.

duncansmothers · May 23, 2023, 12:59pm

One thing you could try is in the {role:system} message you could instruct the model to give users complete and detailed explanations on why any selected answer is wrong.

diegopefm · May 23, 2023, 5:09pm

Thank you very much for your help @dliden, I’ll definitely give it a try and let you know.

Regarding ChatGPT-4 I tried to set the model to “gpt-4-0314” or just “gpt-4” but it always returns model not available. I tried an api call to to openai models endpoint with my access_key and can confirm “gpt-4” is not part of the json response, only 3.5

What can I do to try gpt-4 the same as I’m doing with 3.5, do I need a different access_key? Show I register in a different page? Can you give further help regarding this?

Thanks!

dliden · May 23, 2023, 5:51pm

Aah, for gpt-4 API access you’ll need to get yourself on the waitlist: GPT-4 API waitlist

smuzani · May 24, 2023, 3:12am

Oh, I meant put the passage that the question is referring to here. That example was an English exam, and the passage was just too long to paste in there.

Also 3.5 normally does poorly with math, but when I test with this example, it seems to be fine. I’d recommend something harder for GPT-4. There’s some data here: GPT-4

3.5 scores 70% on English. GPT-4 scores 85.5%. Math scores are similar too. Statistically, 3.5 is just below the bar qualifying it to be a SAT tutor, lol.

AI Response:

Your answer is incorrect because it only represents the sum of the number of messages sent by Adam and Andrew in one hour. However, the problem asks for the total number of messages sent by both Adam and Andrew over a period of time.

To find the total number of messages sent by Adam, you need to multiply the number of messages he sent per hour (x) by the number of hours he sent them (3). Therefore, Adam sent 3x messages in total.

Similarly, to find the total number of messages sent by Andrew, you need to multiply the number of messages he sent per hour (y) by the number of hours he sent them (5). Therefore, Andrew sent 5y messages in total.

To find the total number of messages sent by both Adam and Andrew, you need to add their individual totals. Therefore, the correct answer is:

Total = 3x + 5y

So, option B) 3x+5y is the correct answer.

Here’s my full code (in Node): https://github.com/smuzani/openai-samples/blob/main/node_examples/examhelp.js

One option is to try with a lower temperature to reduce hallucinations. Here I used 0.13. Another is to play with the system prompt a little. The system prompt matters little in 3.5, but makes quite a big difference in GPT-4.

diegopefm · May 24, 2023, 10:34am

Thank you very much again @smuzani, you are being very helpful and I appreciate it.
I have one more question:

In my tests app, some of the math tests include an image in certain questions. That images are stored in a public url and they’re obviously needed to answer the question.

Do you think it could be possible to instruct GPT to interprete (understand) the image in the url and use it to solve the question? I know you cannot upload images in GPT 3.5, but maybe it can “read” or interpret images from an url. How could I accomplish that?

I guess you’ll be able to upload images in version 4, but I’m still in the waiting list.

I’m attaching one example screenshot of my app so you have a clearer idea.

smuzani · May 24, 2023, 9:40pm

Oh, I don’t think these are doable yet. GPT-4 can read images; GPT-3.5 likely couldn’t even if it could open links. Most of us with GPT-4 access don’t have image access, though. It’s only for a select few in an even more limited beta.

diegopefm · May 25, 2023, 8:36am

Thanks again @smuzani, always helpful. One last question, is it possible to make gpt 3.5 (or at least gpt-4) to read text in english? (and in Chinese?)

smuzani · May 26, 2023, 3:24am

I’m not sure I understand the question. It’s reading text in English now? You can OCR if you mean reading text from an image, but that wouldn’t be through the OpenAI APIs yet. It handles multiple languages well, but I’m not familiar enough with Chinese to comment.

diegopefm · May 26, 2023, 7:44am

With “Reading” I mean text to speech. I’d like to make it read text in english or even better in chinese @smuzani

smuzani · May 27, 2023, 2:53am

That would probably be under TTS, also not related to OpenAI. I believe Android has some built-in libraries for that, it’s been around for a while Android - Text To Speech

Topic		Replies	Views
Get consistency in responses across different API calls to ChatGPT API	10	1334	July 20, 2024
Can GPT understand jqMath formulas? API	5	867	May 27, 2023
GPT powered learning solution API api	21	2267	December 19, 2023
GPT-4o Chat Completion with audio response API	6	6008	May 24, 2024
The AI contradicts itself API	2	1484	December 23, 2022

Making ChatGPT to show users better explanations about why the answer is wrong in my tests app

Related topics