O1 Can't understand elbow connectors in charts

My team and I just found a major flaw in an otherwise perfect model, o1. It is super smart, but it cannot understand charts similar to the following:

There is a product code at the top with several empty boxes. When I ask it a question, like “What options are available for the rightmost empty box?” It cannot figure out the answer, it gives the wrong options every time.

For example, I can write

“Please give me all the options for the leftmost empty box” and it will correctly say “Number of valve positions, 2 and 3”

However, if I say “Please give me all the options for the rightmost empty box”, it will say “E1 E2 E3 E4 E3A E4A E5…” etc. when the answer should be “No designation, U”.

I hope OpenAI improves this for their next version of o1. This version is 2024-12-17

2 Likes

Have you tried GPT 4.5? (it can perform image analysis)

1 Like

I tried GPT-4.5-preview at low top_p, and the prompting didn’t leave anything to chance but instead develops the context needed to answer. The reasoning model likely can’t come up with a similar plan based on an image where the user description helps the vision understanding.

System: proprietary, general-purpose
User: {image}, detail:high
Attached is an image that describes the value of each character position in an identifier (similar to a VIN and positional values). Each letter position has a line that directs to its callout, where the line must be followed.
Produce a simple bullet point listing of the character position and a accurate matching transcription of its description.
From that documentation you’ve then produced, answer: Please give me all the options for the rightmost empty box.

I’m glad to help! I’ll first produce a clear bullet-point listing of each character position and its matching description, then answer your specific question regarding the rightmost empty box.

Bullet-point listing of character positions and descriptions:

  • 1st box: “RPE3-04” (4/2 and 4/3 directional control valve, solenoid operated)
  • 2nd box: Valve size (“04”)
  • 3rd box: Number of valve positions (“2” or “3”)
  • 4th box: Spool symbols (see table “Spool Symbols”)
  • 5th box: Rated supply voltage of solenoids:
    • G12: 12 V DC / 2.72 A
    • G24: 24 V DC / 1.29 A
    • G48: 48 V DC / 0.64 A
    • G110: 110 V DC / 0.30 A
    • W230: 230 V AC / 0.15 A / 50 (60) Hz
    • 01200: 12 V DC / 2.72 A (CSA approved)
    • 02400: 24 V DC / 1.29 A (CSA approved)
    • 12060: 120 V AC / 0.30 A / 60 Hz (CSA approved)
    • 23050: 230 V AC / 0.15 A / 50 Hz (CSA approved)
  • 6th box: Electrical connection of solenoids:
    • E1: EN 175301-803-A connector with integrated diode
    • E2: EN 175301-803-A connector without integrated diode
    • E3: AMP Junior Timer - axial direction
    • E4: AMP Junior Timer - radial direction
    • E8: EN 175301-803-A connector with integrated rectifier
    • E9: Loose conductors (two insulated wires)
    • E12A: Deutsch DT04-2P - axial direction (only for 12 V and 24 V DC)
  • 7th box: Manual override (standard rubber boot):
    • No designation: Standard rubber boot
    • N1: Manual override covered with retaining nut
    • N2: Manual override protected by rubber boot
    • N4: Hand screw
    • N5: Socket head screw
  • 8th box: Spool monitoring:
    • No designation: Without spool sensor
    • S1: Normally-open sensor
    • S4: Normally-closed sensor
  • 9th box: Surface treatment:
    • No designation: Zinc-coated (ZnCr-3), ISO 9227 (240 h)
    • A: Zinc-coated minimal, ISO 9227 (520 h)
    • B: Zinc-coated minimal, ISO 9227 (1000 h)
  • 10th box (rightmost empty box): CSA certification:
    • No designation: Without CSA certification
    • C: CSA certified

Answer to your specific question regarding the rightmost empty box:

The options for the rightmost empty box (CSA certification) are:

  • No designation (Without CSA certification)
  • C (CSA certified)

It got the positional transcript with callouts correct. It offers the prefilled definitions as “box”. The image size offered this forum is a single tile even at “high”.

It seems what it can’t understand, that a human would have a hard time understanding, is that in the last position, there are two options here: “nothing”, or “U”. Then read far across to their meaning.

It is the mapping of possible values to their description that is problematic. Key/value is widely-separated, and has an indistinguishable symbol thrown in. The value-description orientation switches on each side. The “rated supply voltage” also doesn’t have correct values.

You’ve definitely found a challenge.

1 Like