GPT-4 cannot accurately follow simple logic

Here is my logic:
if A ≥ B
and if A > C, B ≥ C, goes to condition ①
and if A > C, C > B, goes to condition ②
and if C ≥ A, goes to condition ③

Item
A:7
B:6
C:5

GPT-4 tells me that A(7) ≥ B(6) and C(5) > B(6), so it goes to condition ②.

The correct answer is evidently ①, which confuses me so much.
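For reference, the rule itself can be checked deterministically. This is only a minimal sketch: the function name `classify` is made up for illustration, and returning "undefined" when A < B is an assumption, since the rules as posted do not say what happens in that case.

```python
def classify(a: float, b: float, c: float) -> str:
    """Return the condition for one item under the posted rules."""
    if a < b:
        # Not covered by the posted rules; "undefined" is an assumed placeholder.
        return "undefined"
    if c >= a:
        return "③"
    # From here on, A > C is known to hold.
    return "①" if b >= c else "②"


print(classify(7, 6, 5))  # A=7, B=6, C=5 -> prints ①
```

With A=7, B=6, C=5 the check "C > B" is false, so the item cannot land in ②.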

I NEED HEEEEEELPPPPPPPPP

The detailed structure is as follows (you can try it multiple times with a JSON output format!)

______________(Prompt)

Your job is to give the final reply to the franchisee according to the information in [DATA] and follow rules of [RULES] to tell the franchisee which product goes to which condition ① or ② or ③.

[RULES]

if A ≥ B
and if A > C,B ≥ C,goes to condition ①
and if A > C,C > B,goes to condition ②
and if C ≥ A,goes to condition ③

[DATA]

item id:Fireworks 100g(5290)
A:8
B:6
C:10

item id:Soil 200g(5841)
A:10
B:5
C:10

item id:Bread 300g(2841)
A:6
B:5
C:5

Return the structure of json:
{

"data": [{
    "explain":"string",
    "itemnum":"string",
    "condition": "string",
}, {
    "explain":"string",
    "itemnum": "string",
    "condition": "string",
}]

}

  • json definition:
    • explain: lay out your reasoning process in detail
    • itemnum:item id
    • condition: ① or ② or ③

GPT4 : OUTPUT
The incorrect reply:

{
"data": [
{
"condition": "③",
"explain": "For Fireworks 100g, A is 8, B is 6, and C is 10. Since C is greater than A, it goes to condition ③.",
"itemnum": "5290"
},
{
"condition": "①",
"explain": "For Soil 200g, A is 10, B is 5, and C is also 10. Since A is equal to C and A is greater than B, it goes to condition ①.",
"itemnum": "5841"
},
{
"condition": "①",
"explain": "For Bread 300g, A is 6, B is 5, and C is 5. Since A is greater than C and B, and B is equal to or greater than C, it goes to condition ①.",
"itemnum": "2841"
}
]
}

The correct reply:

{
"data": [
{
"condition": "③",
"explain": "For Fireworks 100g, A (8) is not greater than or equal to C (10), so according to the rules, it goes to condition ③.",
"itemnum": "5290"
},
{
"condition": "③",
"explain": "For Soil 200g, A (10) is equal to C (10), and since A is greater than or equal to B (5), it also goes to condition ③.",
"itemnum": "5841"
},
{
"condition": "①",
"explain": "For Bread 300g, A (6) is greater than B (5) and A is greater than C (5), with B also being greater than or equal to C. Therefore, it fits the criteria for condition ①.",
"itemnum": "2841"
}
]
}

I ran it 10 times. About 80% of the time it told me the second product should go to condition ①, which I think should be condition ③, and I’m so confused right now.

It appears to work fine for me.

The logic is part of a more complex structure, and I didn’t run it just once. Sometimes it’s correct; sometimes it just tells me 5 is equal to 6 or larger than 6… which is very confusing.

Mmh, I also tried both with GPT-4 and GPT-3.5 - each multiple times. Not a single issue. Not clear what the issue is that you are facing.

GPT-4 seems to pay more attention to a simple problem without too much complex context. My prompt aims to make it extract information from the data I provided, follow my logic to tell me which condition (①, ② or ③) applies, and then produce a detailed explanation using that data. When my prompt is too long, it loses focus on the reasoning. If you run it 20 times, there are always 2 or 3 incorrect results. My job is to make it perfect, which seems quite impossible. I don’t know whether you have ever encountered this kind of situation.

I posted the original structure of my prompt. Could you take a look at it?

I posted the original prompt. I would appreciate it if you could give me some advice.

Thanks. I am a bit short on time right now, so I can’t run any extensive tests. But generally speaking, in this type of reasoning task it is not unusual for the error rate to go up as you increase the number of items subject to review. For each item the model requires sufficient “time to think”, and that time gets reduced with every item you add. Try reducing it to two items and see whether that stabilizes the outcomes.
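One way to try the "fewer items per request" idea is to split the item list into small batches and build one prompt per batch. The sketch below only assembles the prompt text; the helper names `batches` and `build_prompt` and the placeholder rules string are assumptions for illustration, and the actual API call is left out.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(rules, batch):
    """Assemble one prompt from the shared rules and a small batch of items."""
    lines = ["[RULES]", rules, "", "[DATA]"]
    for item in batch:
        lines += [
            f"item id:{item['id']}",
            f"A:{item['A']}",
            f"B:{item['B']}",
            f"C:{item['C']}",
            "",
        ]
    return "\n".join(lines)


items = [
    {"id": "Fireworks 100g(5290)", "A": 8, "B": 6, "C": 10},
    {"id": "Soil 200g(5841)", "A": 10, "B": 5, "C": 10},
    {"id": "Bread 300g(2841)", "A": 6, "B": 5, "C": 5},
]

# "<rules text here>" stands in for the real [RULES] section.
prompts = [build_prompt("<rules text here>", batch) for batch in batches(items, 2)]
print(len(prompts))  # 2 prompts: one with two items, one with one item
```

The rules are repeated in every prompt, so smaller batches cost more tokens; the batch size is a trade-off to tune by testing.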

I also think that there’s some potential to enhance the prompt which may help to further lower the error rate. I’ll likely come back to this a bit later today.

I’ll be very grateful for your next reply. This is a common problem that I have to deal with every time I construct a prompt. We use the API version of GPT-4, which does not seem to have a memory like the chat interface, so if I input one item at a time I have to repeat everything, which wastes a lot of tokens. I have found some ways to reduce the chance of errors, such as giving extra commands, adding reminders to be precise, or setting up good and bad examples. Sometimes this works; sometimes it doesn’t work at all and may even cause more mistakes, which confuses me.

Based on your rule, looks like you are not a programmer.
Your expression is not well-structured.

if (A >= B) then
    if (A > C) then
        if (B>= C) then
            return ①
        else
            return ②
    else
        return ③
else
    rerurn "undefined" // you missed this rule

Straight rule expression should be:
if A ≥ B and A > C and B ≥ C then ①
if A ≥ B and A > C and B < C then ②
if A ≥ B and A <= C then ③
if A < B then no reply
Such expression is more direct, concrete and clear.
When the logic is very simple, we can all understand your expression the same way, but when your rules get more complex, even humans can interpret such expressions differently.

Try better expressions and see if AI reacts better?
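A quick way to confirm that the nested form and the flat form describe the same rule is to compare them over a grid of small integers. This is a sketch only: the function names are made up, and "undefined" stands in for both the "undefined" branch and the "no reply" line above.

```python
from itertools import product


def nested(a, b, c):
    """The nested if/else form."""
    if a >= b:
        if a > c:
            if b >= c:
                return "①"
            else:
                return "②"
        else:
            return "③"
    else:
        return "undefined"


def flat(a, b, c):
    """The flat form, one full condition per line."""
    if a >= b and a > c and b >= c:
        return "①"
    if a >= b and a > c and b < c:
        return "②"
    if a >= b and a <= c:
        return "③"
    return "undefined"  # a < b: "no reply" in the flat rules


# Exhaustively compare both forms for all A, B, C in 0..10.
mismatches = [t for t in product(range(11), repeat=3) if nested(*t) != flat(*t)]
print(len(mismatches))  # 0
```

Both forms agree on every triple in the grid, so any difference in the model's behaviour comes from how the text is read, not from the logic itself.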


Wow, thank you for your comment. I will try that right now. Actually I’m a programmer, hah.

I have testes several times; I changed my format to match yours, but problems still arise. One output says that because A(4) is larger than B(2) but smaller than C(3), the answer is ③… This happens a lot, along with other mistakes. The answers are not consistent. Maybe I should check further?

The basic problem you are encountering here is that:

  • Language inference is probability-based, not logic-based, from training on language;
  • The AI doesn’t have any unseen internal thinking or processing, it simply generates the next language token, one token at a time after reflecting on all text seen before;
  • The AI doesn’t consciously calculate or compare: one token just becomes more likely than another, and then tokens are still sampled with random probability based on how likely they are.
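As a toy illustration of the last point, here is what sampling from a probability distribution looks like. Everything in it is made up for the example: the 85/15 split is not a measured value, and a real model samples individual tokens rather than whole answers.

```python
import random


def sample_answers(probabilities, runs, seed=0):
    """Draw `runs` answers at random, weighted by the given probabilities."""
    rng = random.Random(seed)  # fixed seed so the toy run is repeatable
    answers = list(probabilities)
    weights = [probabilities[answer] for answer in answers]
    return rng.choices(answers, weights=weights, k=runs)


# Invented distribution: "③" is the most likely answer, but not the only one.
toy = {"③": 0.85, "①": 0.15}
results = sample_answers(toy, runs=20)
print(results.count("③"), "of", len(results), "runs picked the most likely answer")
```

Even when the right answer is by far the most likely one, some runs can still land on the other option, which matches the "2 or 3 wrong out of 20" pattern described earlier.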

You can increase the chance that the AI talks its way through the problem by laying out step-by-step “thinking” that it must do: instruct it to write some preliminary answers before coming to a final conclusion.

If you need the AI to do calculations, you’d probably want to offer it an external tool that can do math, or a tool to run Python code that the AI writes.
If your task is simply a process that can be coded, you can instead use AI to approach the problem and recommend a permanent coding solution to reprocess the data that you’d be sending for logic-based operations.
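For this thread's task, one hedged option is to let code own the logic and use it to check the model's JSON reply. The sketch below assumes the reply has the "data" / "itemnum" / "condition" shape from the original prompt; the function names and the "undefined" result for A < B are assumptions.

```python
import json


def condition_for(a, b, c):
    """Deterministic version of the posted rules."""
    if a < b:
        return "undefined"  # assumed placeholder; not covered by the rules
    if c >= a:
        return "③"
    return "①" if b >= c else "②"


def find_mismatches(reply_text, values):
    """Return (itemnum, model_condition, expected_condition) for each wrong entry."""
    reply = json.loads(reply_text)
    wrong = []
    for entry in reply["data"]:
        a, b, c = values[entry["itemnum"]]
        expected = condition_for(a, b, c)
        if entry["condition"] != expected:
            wrong.append((entry["itemnum"], entry["condition"], expected))
    return wrong


# A, B, C per item id, taken from the [DATA] section of the prompt.
values = {"5290": (8, 6, 10), "5841": (10, 5, 10), "2841": (6, 5, 5)}

# Conditions from the incorrect reply shown earlier in the thread.
reply_text = (
    '{"data": ['
    '{"itemnum": "5290", "condition": "③"},'
    '{"itemnum": "5841", "condition": "①"},'
    '{"itemnum": "2841", "condition": "①"}]}'
)
print(find_mismatches(reply_text, values))  # [('5841', '①', '③')]
```

Entries flagged this way could be retried or simply overwritten with the computed condition, leaving the model to write only the explanation text.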

The reason AI works is that humans are foolable: I could have written a thousand variations of this reply which would be understood and appear to come from an entity with understanding; however, ask me to “math,” and there’s only one correct answer.


I could understand the fundamental reason why my prompt doesn’t always work. Thank you so much

Welcome to the community!

Please correct your spelling and grammar
([quote=“1372417922, post:12, topic:710346”]
I have testes several times…
[/quote]

because when I read your reply it can refer to another meaning which is not very satisfactorily ethical as it refers to Anatomy and probably Zoology as the producer of spermatozoa.
+++
Based on your language structures, you might want to learn from pondin6666’s answer (e.g., use if/else and return) with proper grammar, spelling and punctuations. How you express and write your prompts will determine the result a lot. So yeah, just 2 cents from me…

The above reply is an example of a less informed artificial intelligence. Instead of making a logical inference that the “D” key is next to the “S” key, and thus silently abstracting the original language intention, and continuing to make a useful contribution to the actual problem at hand, the processing of information in this affected mind became hung up on the misplacement of a single character, thus unreasonably thinking that a useful contribution to this thread was to solve someone’s problem of a misplaced word with their own misplaced words and language structure, instead of intuiting the actual solution needed.

Got anything to say about “rerurn”?


Let’s pursue the “simple logic” some more, by making prompting that can better be followed.

I take the logic statements above, and (using another AI), break down the task into individual instructions, each focusing on the language that step-by-step processing should generate. This chain-of-thought reasoning will let an AI better do deductions than if the only thing it was supposed to produce was the final answer:

Detailed Instructions for AI’s Text Production:

  1. Start your reasoning by examining A and B.
  • Output: “Starting the evaluation by comparing A and B.”
  • Decision Point: Based on the comparison, produce one of the following outputs:
    • If A >= B, Output: “Found that A is greater than or equal to B. To further my evaluation, I will next compare A and C.”
    • If A < B, Output: “Determined that A is less than B. Based on this, the final output is ‘undefined’.” End the evaluation.
  2. Proceed to compare A and C if the previous step was valid (i.e., A >= B).
  • Output: “Proceeding to the next step, which involves comparing A and C.”
  • Decision Point: Based on this comparison, produce one of the following outputs:
    • If A > C, Output: “After comparing A and C, it’s clear that A is greater. The next logical step is to compare B and C.”
    • If A <= C, Output: “Upon comparing A and C, I find that A is not greater than C. Therefore, the final output is ‘③’.” End the evaluation.
  3. If A > C, the final step involves comparing B and C.
  • Output: “For the final part of the evaluation, I need to compare B and C.”
  • Decision Point: Based on this last comparison, produce one of the following outputs:
    • If B >= C, Output: “Having compared B and C, I conclude that B is greater than or equal to C. Thus, the final output is ‘①’.”
    • If B < C, Output: “After the final comparison between B and C, it turns out B is less than C. Hence, the final output is ‘②’.”

By structuring the AI’s reasoning and output production in this manner, it ensures that each step’s output informs the next step, accommodating the AI’s inability to maintain an internal unseen state.


Sorry, but her grammar is not effective at all; it actually makes the AI lose focus more easily. I tested it many times and had to go back to the original version.

What a great insight. Thank you. I need to study more about the basic logic of AI’s reasoning.

Oh lord… I came back with a great result. You just gave me the best advice ever… My prompt is literally perfect now. My data contains 40 items, and now I can run 10 of them at a time. The result is incredible. I think I found a great way to craft my prompt, and I came up with more ideas to make it more precise. Thank you for your enlightenment~~~
