Identify meeting-room tables and annotate images

Hi,

I’ve been working as a developer for many years, but I’m new to using tools like OpenAI.

I’m building a meeting room configurator. I want the user journey to be as follows:

  1. A user uploads an image of a floor plan for the meeting room.

  2. The user uploads a PDF containing a specific speaker’s installation instructions.

  3. I want the system I’m building to identify the meeting room table in the image and annotate where the ceiling speakers should be positioned, according to the information found in the PDF.

Does OpenAI have any tooling / LLMs that can help me with this? Or can I achieve that some other way?

Cheers,
Martin


Welcome to the community!

There is a way to sort of beat gpt-4-turbo and gpt-4o into submission and get a somewhat workable result through stochastic trial and error, but it's a gigantic pain in the rear.

It’s possible that o1 might be capable of this, but I don’t have access to that model.

If your plans all look the same (use the same icons), good old tools like YOLO might be an option for localizing what you're looking for.
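
For example, a minimal sketch with the ultralytics package, assuming you've fine-tuned a model on your own floor-plan icons (the weights file `table_detector.pt` here is hypothetical):

```python
# pip install ultralytics
from ultralytics import YOLO

# Hypothetical weights: a YOLO model fine-tuned on floor-plan table icons.
model = YOLO("table_detector.pt")

# Detect tables in the uploaded floor plan.
results = model("floorplan.png")

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
    print(f"Table at ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")
```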

But if you want an LLM to locate arbitrary stuff, then (between you and me) Gemma/Gemini is the way to go right now. Sorry OpenAI!
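
For what it's worth, Gemini can be asked to return bounding boxes directly. A rough sketch with the google-generativeai package (the model name, prompt, and coordinate convention are my own guesses, untested):

```python
# pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

floor_plan = Image.open("floorplan.png")
response = model.generate_content([
    "Find the meeting room table in this floor plan. "
    "Reply with its bounding box as [ymin, xmin, ymax, xmax], "
    "with coordinates normalized to 0-1000.",
    floor_plan,
])
print(response.text)  # parse the box out of the reply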

Hehe, thanks for the suggestion! I'll take a deep dive into their docs and see what they can offer. :smile:

Hi @stabenfeldt , warm welcome!

Someone previously posted a "grid" hack in the community called GridGPT. Essentially, you overlay a numbered grid on your image, and the LLM can then parse the locality/context a lot better. Disclaimer: I haven't tried this myself, but the overlay step is easy to sketch; see below.
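
Something like this with Pillow (untested; the grid size and cell labels are arbitrary choices):

```python
from PIL import Image, ImageDraw

def overlay_grid(path: str, cells: int = 10) -> Image.Image:
    """Draw a labeled grid so an LLM can reference cells like 'B3'."""
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    step_x, step_y = w / cells, h / cells

    # Grid lines.
    for i in range(1, cells):
        draw.line([(i * step_x, 0), (i * step_x, h)], fill="red")
        draw.line([(0, i * step_y), (w, i * step_y)], fill="red")

    # Cell labels, e.g. A1 .. J10 (letters work for up to 26 columns).
    for row in range(cells):
        for col in range(cells):
            label = f"{chr(65 + col)}{row + 1}"
            draw.text((col * step_x + 2, row * step_y + 2), label, fill="red")
    return img

overlay_grid("floorplan.png").save("floorplan_grid.png")
```

You'd then send the gridded image to the vision model and ask it to answer in terms of grid cells, which you can map back to pixel coordinates for the annotation step.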
