Water Jug Puzzle discussion thread

There's a puzzle that GPT seems entirely incapable of solving. I have come up with a few approaches, but they all result in failure. I'm interested in others' approaches or successes.

The puzzle is this: you have 2 jugs. One holds 1 liter, the other holds 3 liters. Neither is marked, so you must fill a jug completely to know how much you've poured in from the tap. There are 2 taps: one gives blue water and the other gives yellow water.

In the fewest steps possible, obtain exactly 3 liters of an even mixture of blue and yellow water.

Here is an example prompt I’ve used:

“”"
I have 2 jugs. The first jug is called Jug A and holds 1 liter of fluid. The second jug is called Jug B and hold 3 liters of fluid. Neither jar is marked. I have 2 liquids. Blue and Yellow. Using a series of steps get me 3 liters of an equal mixture of Blue and Yellow.

The steps that you can use are:

A. Fill Jug A with Yellow.
B. Fill Jug B with Yellow.
C. Fill Jug A with Blue.
D. Fill Jug B with Blue.
E. Empty Jug A into Jug B.
F. Pour Jug B into Jug A until Jug A is full.
G. Empty Jug A onto the floor.
H. Empty Jug B onto the floor.

Hint: If you pour a mixture from a larger jug into a smaller jug, that mixture will be represented as fractional values in the smaller jug. This is not true the other way around.

Hint: You must think laterally.

Hint: A full jug cannot be filled.

Tell me the first thing you do.
“”"

I’ve also broken it down into a multi-step process with recurring prompts such as:

“”"
What is the current state of the puzzle? Before you answer, consider the rules of the compoents of the situationa nd evaluate the logical outcome.

What are the facts and rules of the puzzle?

Logically, what are the available options at this point?

Are all of those options logically possible given the state of the puzzle? Remove any that are contradictory to the current state of the puzzle.

Select the best option and proceed.
"""
(Delivered individually, of course.) I tried using both 3 and 4.

Through the API, I’ve effectively isolated every decision point so that every decision is itemized. For example, I don’t prompt it to tell me what the available options are; I feed it each option and ask whether it can do it, roughly like the sketch below.
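
The shape of that loop is roughly this (a simplified sketch rather than the exact code; the model name, prompt wording, and YES/NO parsing are placeholders):

```python
# Sketch of the "isolate every decision point" loop: offer each move
# individually and ask whether it is possible in the current state.
from openai import OpenAI

client = OpenAI()

MOVES = {
    "A": "Fill Jug A with Yellow.",
    "B": "Fill Jug B with Yellow.",
    "C": "Fill Jug A with Blue.",
    "D": "Fill Jug B with Blue.",
    "E": "Empty Jug A into Jug B.",
    "F": "Pour Jug B into Jug A until Jug A is full.",
    "G": "Empty Jug A onto the floor.",
    "H": "Empty Jug B onto the floor.",
}

def ask(messages):
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content

def feasible_moves(history):
    """Offer each move one at a time and ask if it is possible right now."""
    possible = []
    for letter, description in MOVES.items():
        reply = ask(history + [{
            "role": "user",
            "content": "Given the current state of the puzzle, can you do this: "
                       f"'{description}' Answer only YES or NO.",
        }])
        if reply.strip().upper().startswith("YES"):
            possible.append(letter)
    return possible
```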

No luck. It fumbles before the fifth appropriate step of a 10-step solution to this puzzle.

I have some ideas about building infrastructure outside of GPT to do this, but that defeats the purpose of the exercise somewhat.
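
(For what it's worth, the non-GPT route is small, which is part of why it feels like cheating. Here's a rough sketch: a breadth-first search over exact jug contents, with the same fill assumptions as the simulator above, that prints a shortest move sequence. If your reading of the fill rule differs from mine, the sequence it finds may differ from the intended 10-step solution.)

```python
# Brute-force BFS over the puzzle's state space, under my own reading of the
# rules (fills top a jug up to the brim; only a completely full jug cannot be filled).
from collections import deque
from fractions import Fraction

CAP = (Fraction(1), Fraction(3))            # capacities of Jug A and Jug B
GOAL_B = (Fraction(3, 2), Fraction(3, 2))   # Jug B ends with 1.5 L blue + 1.5 L yellow

def successors(state):
    """Yield (move, next_state); a state is ((a_blue, a_yellow), (b_blue, b_yellow))."""
    (ab, ay), (bb, by) = state
    totals = (ab + ay, bb + by)
    # Moves A-D: fill a jug to the brim from a tap (skipped if the jug is full).
    for move, jug, colour in (("A", 0, 1), ("B", 1, 1), ("C", 0, 0), ("D", 1, 0)):
        room = CAP[jug] - totals[jug]
        if room > 0:
            jugs = [[ab, ay], [bb, by]]
            jugs[jug][colour] += room
            yield move, (tuple(jugs[0]), tuple(jugs[1]))
    # Move E: empty Jug A into Jug B (assumed to require that it fits).
    if totals[0] > 0 and totals[0] <= CAP[1] - totals[1]:
        yield "E", ((Fraction(0), Fraction(0)), (bb + ab, by + ay))
    # Move F: pour Jug B into Jug A until A is full (proportional mixture).
    if totals[1] > 0 and totals[0] < CAP[0]:
        amount = min(CAP[0] - totals[0], totals[1])
        fb, fy = bb * amount / totals[1], by * amount / totals[1]
        yield "F", ((ab + fb, ay + fy), (bb - fb, by - fy))
    # Moves G and H: empty a jug onto the floor.
    if totals[0] > 0:
        yield "G", ((Fraction(0), Fraction(0)), (bb, by))
    if totals[1] > 0:
        yield "H", ((ab, ay), (Fraction(0), Fraction(0)))

def shortest_solution():
    start = ((Fraction(0), Fraction(0)), (Fraction(0), Fraction(0)))
    seen, queue = {start}, deque([(start, "")])
    while queue:
        state, path = queue.popleft()
        if state[1] == GOAL_B:
            return path
        for move, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + move))
    return None

print(shortest_solution())   # shortest move sequence under these assumptions
```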

Do you think this is solvable by GPT?

Before you ask: I'm already using chain of thought, roles, prompt variations, etc. About the only thing I didn't try was having agents discuss options until they reach a consensus (like SmartGPT).

Some thoughts:

See the ‘Tree of Thought’ paper discussed in another thread, where they claim to solve the ‘Game of 24’. The problem is they don’t present any general solution for move generation. And the search they describe requires multiple turns, which doesn’t seem to be what you are looking for.

Personal bias: where LLMs get stuck is when they keep too much state at the subsymbolic level (activation values?). If you can find a way to force them to articulate some of their intermediate problem-solving state, maybe it would help? I'm using that in a multi-turn setting, but I'm not sure it can be done within a single turn.
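
Concretely, something like the following is what I mean by articulating state (a rough sketch; the JSON shape and the helper are just illustrative). Once the state is parsed out, you can check it against a local simulator, or just against the previous turn, and push back when it drifts.

```python
import json

# Require the model to externalize the puzzle state after every move, in a
# fixed JSON shape, so the intermediate state lives in the text rather than
# only in the activations.
STATE_PROMPT = (
    "Before choosing the next move, restate the current state as JSON, exactly "
    'in this shape: {"jug_a": {"blue": 0.0, "yellow": 0.0}, '
    '"jug_b": {"blue": 0.0, "yellow": 0.0}}. Then name one move (A-H).'
)

def extract_state(reply: str):
    """Pull the JSON state out of a reply; return None if it is missing or malformed."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
```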

LLMs in general seem terrible at these things. Maybe larger context will help, but I'm skeptical. One problem is that this is a hard problem; people aren't very good at it either. Maybe you could fine-tune on a few thousand such problems…

One thought I've explored a tiny bit is prompting it to use Polya's problem-solving heuristics. However, I've used that in a chain-of-prompts setting, not a single turn.

Interesting. Looks like there is a recently published Python repo for that.

LLMs do pattern matching; they don't do hypothesis testing.
They do pattern matching against a very large corpus of things people have put into the world, but it's pattern matching, not any kind of internal analysis.
An LLM doesn't fact-check itself; it doesn't test more than one "solution" and pick the one that works; it doesn't backtrack when things go wrong.
An LLM doesn't even know how to fact-check itself or work through steps abstractly. An LLM only knows how to pattern-match against text and structures it has previously seen.

As long as your problem matches some structure it has seen, it can be very helpful.
As soon as the structure doesn't match those patterns, for whatever reason, it goes off the rails and doesn't even know it.

Right. Thanks.

If it’s helpful, interpret my post as a request for a prompt structure that frames this problem as a pattern similar enough to something the model has been trained on to make it tractable.