Water Jug Puzzle discussion thread

There’s a puzzle that GPT seems entirely incapable of solving. I have come up with a few approaches, but they all result in failure. I’m interested in other people’s approaches or successes.

The puzzle is this: you have 2 jugs. One holds 1 liter; the other holds 3 liters. Neither is marked, so you must fill a jug completely to know how much you’ve poured in from the tap. There are 2 taps. One tap gives blue water and the other gives yellow water.

In the fewest steps possible, obtain exactly 3 liters of an even mixture of blue and yellow water.

Here is an example prompt I’ve used:

"""
I have 2 jugs. The first jug is called Jug A and holds 1 liter of fluid. The second jug is called Jug B and holds 3 liters of fluid. Neither jug is marked. I have 2 liquids: Blue and Yellow. Using a series of steps, get me 3 liters of an equal mixture of Blue and Yellow.

The steps that you can use are:

A. Fill Jug A with Yellow.
B. Fill Jug B with Yellow.
C. Fill Jug A with Blue.
D. Fill Jug B with Blue.
E. Empty Jug A into Jug B.
F. Pour Jug B into Jug A until jug A is full…
G. Empty Jug A onto the floor.
H. Empty Jug B onto the floor.

Hint: If you pour a mixture from a larger jug into a smaller jug, that mixture will be represented as fractional values in the smaller jug. This is not true the other way around.

Hint: You must think laterally.

Hint: a full jug cannot be filled.

Tell me the first thing you do.
"""
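The hint about fractional values describes proportional mixing: pouring v liters out of a jug holding b liters of blue and y liters of yellow (t = b + y in total) moves v·b/t of blue and v·y/t of yellow. Emptying Jug A into Jug B needs no such split, because all of A’s contents move. Here is a minimal Python sketch of action F. It assumes exact arithmetic with the standard `fractions` module and represents each jug as a (blue, yellow) tuple. `pour_b_into_a` is a hypothetical helper, not part of the puzzle:

```python
from fractions import Fraction

CAP_A = Fraction(1)  # Jug A capacity in liters (assumed exact)


def pour_b_into_a(a, b):
    """Action F: pour Jug B into Jug A until A is full or B runs dry.

    Each jug is a (blue, yellow) tuple of Fractions. The poured volume keeps
    B's blue:yellow ratio, which is why A ends up with fractional amounts.
    """
    a_total = a[0] + a[1]
    b_total = b[0] + b[1]
    poured = min(CAP_A - a_total, b_total)
    if poured == 0:
        return a, b  # A is already full or B is empty: nothing moves
    blue = poured * b[0] / b_total
    yellow = poured - blue
    return (a[0] + blue, a[1] + yellow), (b[0] - blue, b[1] - yellow)


empty = (Fraction(0), Fraction(0))
# Jug B holds 1 L blue + 1 L yellow and Jug A is empty (the state before step 5 of the GPT-5.2 trace).
a, b = pour_b_into_a(empty, (Fraction(1), Fraction(1)))
print([float(x) for x in a], [float(x) for x in b])  # [0.5, 0.5] [0.5, 0.5]
```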

I’ve also broken it down into a multi-step process with recurring prompts such as

“”"
What is the current state of the puzzle? Before you answer, consider the rules of the components of the situation and evaluate the logical outcome.

What are the facts and rules of the puzzle?

Logically, what are the available options at this point?

Are all of those options logically possible given the state of the puzzle? Remove any that are contradictory to the current state of the puzzle.

Select the best option and proceed.
“”"
(Delivered individually, of course.) Tried it with both versions 3 and 4.

Through the API, I’ve effectively isolated every decision point so that every decision is itemized. For example, I don’t prompt it to tell me what options are available; I feed it each option and ask whether it can do it.
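Pruning options that contradict the current state can be done mechanically, since whether an action is possible depends only on how much liquid each jug holds, not on its color. Below is a minimal Python sketch under assumed rules. This reading of the puzzle is an assumption, not the original poster’s definition:

- A fill is allowed only into an empty jug.
- E is allowed only when all of Jug A fits into Jug B.
- F is allowed only when A has room and B is not empty.
- G and H are allowed only on a non-empty jug.

`legal_actions` is a hypothetical helper name.

```python
def legal_actions(a_vol, b_vol, cap_a=1, cap_b=3):
    """Return the action letters possible when Jug A holds a_vol and Jug B holds b_vol liters.

    Assumed rules: fills (A-D) only into an empty jug; E only if all of A fits
    into B; F only if A has room and B is not empty; G/H only on a non-empty jug.
    """
    actions = []
    if a_vol == 0:
        actions += ["A", "C"]  # fill Jug A with Yellow / Blue
    if b_vol == 0:
        actions += ["B", "D"]  # fill Jug B with Yellow / Blue
    if 0 < a_vol <= cap_b - b_vol:
        actions.append("E")    # empty Jug A into Jug B
    if a_vol < cap_a and b_vol > 0:
        actions.append("F")    # pour Jug B into Jug A until A is full
    if a_vol > 0:
        actions.append("G")    # empty Jug A onto the floor
    if b_vol > 0:
        actions.append("H")    # empty Jug B onto the floor
    return sorted(actions)


# Jug A empty, 2 L in Jug B (after step 4 of the GPT-5.2 trace).
print(legal_actions(0, 2))  # ['A', 'C', 'F', 'H']
```

The surviving letters could then be offered to the model one at a time, as described above, so it only ever chooses among actions that are actually possible.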

No luck. It fumbles before reaching the 5th appropriate step of a 10-step solution to this puzzle.

I have some ideas about building infrastructure outside of GPT to do this, but that somewhat defeats the purpose of the exercise.
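For comparison, the outside infrastructure mentioned above can be quite small. The Python sketch below is a breadth-first search: it explores jug states level by level, so the first state it reaches with 1.5 L blue + 1.5 L yellow in Jug B comes from a shortest action sequence under the rules encoded in `step`. Those rules are assumptions:

- Fills go only into an empty jug, unless `top_up=True`, which lets a partly full jug be filled to the brim.
- E requires all of Jug A to fit into Jug B.
- F carries Jug B’s blue:yellow ratio.

`step` and `solve` are hypothetical names. The shortest length depends on these rule choices, so treat the output as a property of this model of the puzzle rather than of the puzzle itself.

```python
from collections import deque
from fractions import Fraction

CAP_A, CAP_B = Fraction(1), Fraction(3)
EMPTY = (Fraction(0), Fraction(0))         # a jug is (blue, yellow) in liters
GOAL_B = (Fraction(3, 2), Fraction(3, 2))  # 1.5 L blue + 1.5 L yellow in Jug B


def vol(jug):
    return jug[0] + jug[1]


def step(state, action, top_up=False):
    """Apply one action letter to (jug_a, jug_b); return None if it is impossible."""
    a, b = state
    if action in ("A", "B", "C", "D"):
        on_a = action in ("A", "C")  # A/C fill Jug A, B/D fill Jug B
        jug, cap = (a, CAP_A) if on_a else (b, CAP_B)
        if vol(jug) == cap or (vol(jug) > 0 and not top_up):
            return None
        room = cap - vol(jug)
        # C/D use the blue tap, A/B the yellow tap
        new = (jug[0] + room, jug[1]) if action in ("C", "D") else (jug[0], jug[1] + room)
        return (new, b) if on_a else (a, new)
    if action == "E":  # empty Jug A into Jug B, only if it all fits
        if vol(a) == 0 or vol(a) > CAP_B - vol(b):
            return None
        return (EMPTY, (b[0] + a[0], b[1] + a[1]))
    if action == "F":  # pour B into A until A is full, keeping B's ratio
        poured = min(CAP_A - vol(a), vol(b))
        if poured == 0:
            return None
        blue = poured * b[0] / vol(b)
        yellow = poured - blue
        return ((a[0] + blue, a[1] + yellow), (b[0] - blue, b[1] - yellow))
    if action == "G":  # empty Jug A onto the floor
        return None if vol(a) == 0 else (EMPTY, b)
    if action == "H":  # empty Jug B onto the floor
        return None if vol(b) == 0 else (a, EMPTY)
    raise ValueError(f"unknown action {action!r}")


def solve(top_up=False, max_depth=12):
    """Breadth-first search; return a shortest action string, or None."""
    start = (EMPTY, EMPTY)
    parent = {start: None}  # state -> (previous state, action)
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if state[1] == GOAL_B:
            path = []
            while parent[state] is not None:
                state, action = parent[state]
                path.append(action)
            return "".join(reversed(path))
        if depth < max_depth:
            for action in "ABCDEFGH":
                nxt = step(state, action, top_up)
                if nxt is not None and nxt not in parent:
                    parent[nxt] = (state, action)
                    queue.append((nxt, depth + 1))
    return None


print(solve())             # fills only into empty jugs
print(solve(top_up=True))  # topping up a partly full jug allowed
```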

Do you think this is solvable by GPT?

Before you ask: I’m already using chain of thought, roles, prompt variations, etc. About the only thing I didn’t do was have agents discuss options until they reach a consensus (like SmartGPT).

Some thoughts:

See the ‘Tree of Thought’ paper discussed in another thread, where they claim to solve ‘Game of 24’. The problem is that they don’t present any general solution for move generation. Also, the search they describe requires multiple turns, which doesn’t seem to be what you are looking for.

Personal bias: where LLMs get stuck is when they keep too much state at the subsymbolic level (activation values?). If you can find a way to force them to articulate some of their intermediate problem-solving state, maybe it would help? I’m using that in a multi-turn setting, but I’m not sure it can be done within a single turn.

LLMs in general seem terrible at these things. Maybe a larger context will help, but I’m skeptical. One problem is that this is a hard problem; people aren’t very good at it either. Maybe you could fine-tune on a few thousand such problems…
One thought I’ve explored a tiny bit is to prompt it to use Polya’s problem solving heuristics. However, I’ve used that in a chain-of-prompts setting, not a single turn.

Interesting. Looks like there is a recently published python repo for that.

LLMs do pattern matching, they don’t do hypothesis testing.
They do pattern matching against a very large corpus of things people have put into the world, but it’s pattern matching, not any kind of internal analysis.
An LLM doesn’t fact-check itself; it doesn’t test more than one “solution” and pick one that works; it doesn’t backtrack when things go wrong.
An LLM doesn’t even know how to fact-check itself or work through steps abstractly. An LLM only knows how to pattern-match against text and structures it has previously seen.

As long as your problem matches some structure it has seen, it can be very helpful.
And as soon as the structure doesn’t match those patterns, for whatever reason, it’s off the rails, and doesn’t even know it.

Right. Thanks.

If it’s helpful for you, interpret my post as a request for a prompt structure that frames this problem as a pattern similar enough to one it has been trained on to make it tractable.

Yes, LLMs are not capable enough to give a complete solution.

No. LLMs are capable enough to give a complete solution.

My rewriting of the problem, orienting the language toward the problem-solving agent and clearly stating the permissible actions that can be sequenced:

You are provided 2 jugs, neither of which is marked in sub-units of its full capacity.

  • The first jug is called Jug A and holds 1 liter of fluid.
  • The second jug is called Jug B and holds 3 liters of fluid.

Two colored liquids can be dispensed to fill any jug: Blue or Yellow.
Goal: Using a series of steps, provide a total of 3 liters contained in jugs, composed of an equal mixture of Blue and Yellow liquids.

The actions that are possible at any step:
A. Fill Jug A with Yellow.
B. Fill Jug B with Yellow.
C. Fill Jug A with Blue.
D. Fill Jug B with Blue.
E. Empty Jug A into Jug B.
F. Pour Jug B into Jug A until jug A is full…
G. Empty Jug A onto the floor.
H. Empty Jug B onto the floor.

Provide the correct sequence of steps to yield three contained liters of an equal mixture: 1.5 liters blue; 1.5 liters yellow.

GPT-5.2 xhigh

Start with both jugs empty.

I’ll track the amounts of Blue (B) and Yellow (Y) in each jug after every step.

Capacities: Jug A = 1 L, Jug B = 3 L.


Sequence of actions (letters)

C, E, A, E, F, G, A, E, C, E


What happens at each step

| Step | Action | Jug A contents | Jug B contents |
|---|---|---|---|
| 1 | C (Fill A with Blue) | 1 L = (1B, 0Y) | 0 L |
| 2 | E (Empty A into B) | 0 L | 1 L = (1B, 0Y) |
| 3 | A (Fill A with Yellow) | 1 L = (0B, 1Y) | 1 L = (1B, 0Y) |
| 4 | E (Empty A into B) | 0 L | 2 L = (1B, 1Y) (a 50/50 mix) |
| 5 | F (Pour B into A until A is full) | 1 L = (0.5B, 0.5Y) | 1 L = (0.5B, 0.5Y) |
| 6 | G (Empty A onto the floor) | 0 L | 1 L = (0.5B, 0.5Y) |
| 7 | A (Fill A with Yellow) | 1 L = (0B, 1Y) | 1 L = (0.5B, 0.5Y) |
| 8 | E (Empty A into B) | 0 L | 2 L = (0.5B, 1.5Y) |
| 9 | C (Fill A with Blue) | 1 L = (1B, 0Y) | 2 L = (0.5B, 1.5Y) |
| 10 | E (Empty A into B) | 0 L | 3 L = (1.5B, 1.5Y) ✅ |

Result

At the end:

  • Jug B contains 3 liters total
  • Composition is 1.5 L Blue + 1.5 L Yellow (an equal mixture)
  • Jug A is empty

That meets the goal exactly.


Optional (if you prefer the 3 liters split between the two jugs)

After step 10, do:

  • F (Pour Jug B into Jug A until Jug A is full)

Then you’ll have:

  • Jug A: 1 L of 50/50 mix
  • Jug B: 2 L of 50/50 mix
    Total still 3 L, still 1.5 L blue and 1.5 L yellow overall.
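Both jugs stay 50/50 after that extra F because the poured liter carries Jug B’s blue:yellow ratio. Starting from 1.5 L blue + 1.5 L yellow in Jug B:

```latex
\begin{aligned}
\text{Jug A} &= 1\ \text{L} \times \left(\tfrac{1.5}{3},\ \tfrac{1.5}{3}\right) = (0.5\ \text{L blue},\ 0.5\ \text{L yellow}) \\
\text{Jug B} &= (1.5 - 0.5,\ 1.5 - 0.5) = (1\ \text{L blue},\ 1\ \text{L yellow})
\end{aligned}
```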

Or my own phrasing before getting an AI answer:

Fill the 3 L jug with 1 L blue and 1 L yellow, measured with the 1 L jug: a 50% mix.
Pour that mixture into the 1 L jug until it is full, and dump the rest: still 50%.
Pour that 1 L of 50% mix back into the 3 L jug, and measure in another 1 L blue + 1 L yellow.

The AI just measures how much it discards instead of how much it keeps.
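To make the difference concrete, here is a minimal Python sketch that replays both plans. The letter sequence for the human phrasing (C, E, A, E, F, H, E, C, E, A, E) is an interpretation of the three lines above, not something the poster wrote. The rules are also assumptions: fills go only into an empty jug, E is allowed only when all of Jug A fits into Jug B, and F keeps Jug B’s ratio. `replay` is a hypothetical helper that raises at the first impossible step, so it could also be used to check any plan a model proposes.

```python
from fractions import Fraction

CAP = {"A": Fraction(1), "B": Fraction(3)}  # capacities in liters
FILLS = {"A": ("A", 1), "B": ("B", 1), "C": ("A", 0), "D": ("B", 0)}  # letter -> (jug, 0=blue / 1=yellow)


def replay(plan):
    """Replay action letters from empty jugs; return (step count, jugs).

    Each jug is a [blue, yellow] list of Fractions. Raises ValueError at the
    first step that is impossible under the assumed rules.
    """
    jugs = {"A": [Fraction(0), Fraction(0)], "B": [Fraction(0), Fraction(0)]}
    a, b = jugs["A"], jugs["B"]
    for i, act in enumerate(plan, start=1):
        if act in FILLS:
            name, color = FILLS[act]
            if sum(jugs[name]) != 0:
                raise ValueError(f"step {i}: Jug {name} is not empty")
            jugs[name][color] = CAP[name]
        elif act == "E":
            if sum(a) == 0 or sum(a) > CAP["B"] - sum(b):
                raise ValueError(f"step {i}: cannot empty Jug A into Jug B")
            b[0] += a[0]
            b[1] += a[1]
            a[0] = a[1] = Fraction(0)
        elif act == "F":
            poured = min(CAP["A"] - sum(a), sum(b))
            if poured == 0:
                raise ValueError(f"step {i}: nothing can be poured from B into A")
            blue = poured * b[0] / sum(b)  # poured liquid keeps B's ratio
            a[0] += blue
            a[1] += poured - blue
            b[0] -= blue
            b[1] -= poured - blue
        elif act in ("G", "H"):
            jug = a if act == "G" else b
            if sum(jug) == 0:
                raise ValueError(f"step {i}: that jug is already empty")
            jug[0] = jug[1] = Fraction(0)
        else:
            raise ValueError(f"step {i}: unknown action {act!r}")
    return len(plan), jugs


plans = {
    "GPT-5.2": "CEAEFGAECE",  # keeps the 50/50 liter in Jug B, dumps Jug A
    "human": "CEAEFHECEAE",   # keeps the 50/50 liter in Jug A, dumps Jug B
}
for label, plan in plans.items():
    steps, jugs = replay(plan)
    print(label, steps, [float(x) for x in jugs["B"]])
# GPT-5.2 10 [1.5, 1.5]
# human 11 [1.5, 1.5]
```

Both plans end with 1.5 L blue + 1.5 L yellow in Jug B. The human plan needs one more E, because the liter it keeps sits in Jug A and has to be poured back.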