GPT struggles with drawings

I ran an experiment asking GPT-4o to interpret simple drawings of a traffic scenario involving two cars, labeled A and B. The drawings use arrows to indicate each car's direction of travel and show the scene from above, with basic traffic elements like roads and intersections. This is a task an 8-year-old could probably complete easily. However, GPT-4o’s performance was really bad: it was inconsistent and often confused about which direction the cars were going. Even when given tips and feedback, the model mostly failed to understand the scenarios and hallucinated nonsense.

I was wondering if others have the same issues with drawings and/or vision. I'd like to understand why you think it struggles with drawings and whether there are tips to get better performance. All help is greatly appreciated!
