GPT-4o Model: Image Coordinate Recognition

Diet · August 15, 2024, 12:05pm

It can*

*Sort of

An old screenshot from the lab:

Your job is to find the location of ‘the black talk to claude button’.

(gpt-4o-2024-05-13)

But it’s pretty tricky, and finicky to boot.

Some believe that once we have proper embodied models, this stuff will become easier. You can perhaps think of it as hand-eye coordination. Babies really struggle with it, and certain neurological conditions make it more difficult for adults. And the current models are definitely missing certain human faculties.

I’m not gonna shill my own products here. I do recommend following @anon10827405’s advice and going with Tesseract or similar, if your use case allows.
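For anyone wondering what the Tesseract route could look like, here's a rough sketch and not a tested pipeline. It assumes the `pytesseract` and Pillow packages plus a local Tesseract install; `find_phrase_box` and `locate_on_screenshot` are names I made up for illustration. Note that OCR only finds the label text (e.g. "Talk to Claude"), it knows nothing about the button being black, and this naive matcher ignores line breaks between words.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def find_phrase_box(data: dict, phrase: str) -> Optional[Box]:
    """Find the first run of consecutive OCR words matching `phrase`.

    `data` is expected to have the word-level shape that
    pytesseract.image_to_data(..., output_type=Output.DICT) returns:
    parallel lists under "text", "left", "top", "width", "height".
    """
    target = phrase.lower().split()
    if not target:
        return None
    # Tesseract emits empty strings for block/line rows; skip those.
    idx = [i for i, t in enumerate(data["text"]) if t.strip()]
    words = [data["text"][i].strip().lower() for i in idx]
    n = len(target)
    for start in range(len(words) - n + 1):
        if words[start:start + n] == target:
            hits = idx[start:start + n]
            # Union of the matched word boxes.
            left = min(data["left"][i] for i in hits)
            top = min(data["top"][i] for i in hits)
            right = max(data["left"][i] + data["width"][i] for i in hits)
            bottom = max(data["top"][i] + data["height"][i] for i in hits)
            return (left, top, right, bottom)
    return None


def locate_on_screenshot(path: str, phrase: str) -> Optional[Box]:
    """Hypothetical wrapper: OCR an image file, then search for the phrase."""
    # Imported here so find_phrase_box works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return find_phrase_box(data, phrase)
```

Usage would be something like `locate_on_screenshot("screenshot.png", "talk to claude")`, where the file name is a placeholder. If it returns a box, the center of that box is a reasonable click target; if it returns `None`, the label was not read as consecutive words and you're back to the finicky stuff.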
