Propositional logic performance

I’ve been doing experiments with simple propositional logic and getting very mixed results, whether in chat or completion mode. Has anyone figured out a way to get consistent performance on this task type?

By propositional logic I mean:
‘Fleas can jump 50 inches.
Fleas with ess than 4 legs cannot jump.
My flea has 3 legs.
Therefore,’

And up to 50% of the time it will get that wrong. I’m not literally using that example, but something similar, often with fewer premises. Rerunning the same prompt at the same temperature gives a 30% variation.

The only thing I haven’t done is allow it to talk through the steps, since that makes parsing the output more difficult (or costs an extra API call).

Any ideas?

You are not going to like this, but you did ask.

Do not use an LLM for such evaluations; as you note, it is not 100% accurate. If you need 100% accurate results, use a logic programming language like Prolog, or s(CASP) with Ciao.
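For example, here is a minimal sketch of the flea example from the original post in plain Prolog (SWI-Prolog syntax; the predicate names are just ones I made up):

```prolog
% Facts about the flea in question (names are illustrative only).
flea(my_flea).
legs(my_flea, 3).

% A flea with fewer than 4 legs cannot jump.
cannot_jump(X) :-
    flea(X),
    legs(X, Legs),
    Legs < 4.

% A flea can jump unless we can show that it cannot (negation as failure).
can_jump(X) :-
    flea(X),
    \+ cannot_jump(X).

% ?- can_jump(my_flea).
% false.
```

You get the same answer every time you run it, which is the whole point.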


EDIT

As we know, this field is moving fast.

A paper just came out that looks promising:

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

While I have not read the paper yet, I would not be surprised if it could solve, with 100% accuracy, problems requiring recursion.


You’re right. I don’t like picking up additional programming languages to finish projects.

With that said, I’d already reached the conclusion that LLMs are incapable of this task. I was just hoping for someone to prove me wrong.

Few-shot learning.

In the prompt, give it two or three example syllogisms.

If that doesn’t work, try again with chain-of-thought examples.

That should do it.
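For instance (a rough, untested sketch; the premises and wording here are made up purely for illustration), a few-shot prompt with chain-of-thought might look something like this:

```
Premises: All birds have wings. Animals without wings cannot glide. Tweety is a bird.
Question: Can Tweety glide? Answer yes, no, or unknown.
Reasoning: Tweety is a bird, so Tweety has wings. The premises only say that animals
without wings cannot glide; they never say that animals with wings can glide.
Answer: unknown

Premises: All squares are rectangles. Rectangles have four sides. Shape S is a square.
Question: Does shape S have four sides? Answer yes, no, or unknown.
Reasoning: Shape S is a square, so it is a rectangle, and rectangles have four sides.
Answer: yes

Premises: <your premises>
Question: <your question> Answer yes, no, or unknown.
Reasoning:
```

Since the final answer always lands on the last “Answer:” line, the reasoning doesn’t have to make parsing any harder: just read that line.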

Actually they can, but they need help; see the thread on prompts as pseudo-code:
Topics tagged prompts-as-code.
Especially note @stevenic’s and @qrdl’s comments near the (current) end of the thread about the work MS folks are doing - essentially writing a planner in prompts. If you follow this path, though, you will essentially end up writing your own theorem prover in prompt language. And, of course, LLMs can wander. But if you set the temperature low and have it regularly check its work, it can be made to work.
Is it worth the trouble when Prolog is right at hand? Maybe.


AFAICT, GPT-4 is excellent at propositional logic.

Your example above has a typo: it uses “Fleas with ess than 4 legs cannot jump.”

What does ‘ess’ mean? You haven’t formally defined it; I’m assuming you meant ‘less’. Typos matter when it comes to logic.

Perhaps you meant critical thinking, which, yes, GPT-4 can certainly fail at.

My example was some nonsense I spontaneously typed. My real use case is a puzzle.

It has a very low success rate at determining whether an empty jug can be filled, when told that the jug is empty and that an empty jug can be filled. It averages out to 65%. That doesn’t sound like excellence to me. Can you educate me on how to do better? You were in that other thread mentioned above, correct?

Yes, GPT-4 has extremely limited reasoning/inference capabilities. See here for more examples: GitHub - openai/evals: Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

However, for real propositional logic, I haven’t been able to stump it. Even with zero CoT, it seems able to do it in the vast majority of cases.

If someone has an example of a real propositional logic problem that it fails on, I’d love to see it.


Your example is logically inconsistent.
The first statement doesn’t say ‘Most’ fleas can jump; it simply says ‘Fleas can jump.’
***Disclaimer: the following is very sloppy logical form, and it is really predicate logic, not propositional.***

I don’t know how to translate this into propositional logic other than to say something like Flea(x) → Jump(x).
The next statement says fleas with less than 4 legs cannot jump.
I don’t know how to translate that other than to say:
Flea(x) and LessThanFourLegs(x) → not Jump(x).

But from these two statements we can derive:
Flea(x) → not LessThanFourLegs(x)

But you then assert Flea(myFlea) and LessThanFourLegs(myFlea).
We have now reached an inconsistency. From that you can derive anything you want.
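For reference, here is the same argument written out a bit more carefully, in first-order notation with the predicate names above:

```latex
% Premises (1)-(3) and the consequence of (1) and (2), which contradicts (3).
\begin{align}
  &\forall x\,\bigl(\mathrm{Flea}(x) \rightarrow \mathrm{Jump}(x)\bigr)\\
  &\forall x\,\bigl(\mathrm{Flea}(x) \land \mathrm{LessThanFourLegs}(x) \rightarrow \lnot\mathrm{Jump}(x)\bigr)\\
  &\mathrm{Flea}(\mathit{myFlea}) \land \mathrm{LessThanFourLegs}(\mathit{myFlea})\\
  &\forall x\,\bigl(\mathrm{Flea}(x) \rightarrow \lnot\mathrm{LessThanFourLegs}(x)\bigr)
    \qquad \text{from (1) and (2), contradicting (3)}
\end{align}
```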

?