Identify meeting-room tables and annotate images

Hi,

I’ve been working as a developer for many years, but I’m new to using tools like OpenAI.

I’m building a meeting room configurator. I want the user journey to be as follows:

  1. A user uploads an image of a floor plan for the meeting room.

  2. The user uploads a PDF containing a specific speaker’s installation instructions.

  3. I want the system I’m building to identify the meeting room table in the image and annotate where the ceiling speakers should be positioned, according to the information found in the PDF.

Does OpenAI have any tooling / LLMs that can help me with this? Or can I achieve that some other way?

Cheers,
Martin


Welcome to the community!

There is a way to sort of beat gpt-4-turbo and gpt-4o into submission and get a somewhat workable result through stochastic trial and error, but it's a gigantic pain in the rear.

It’s possible that o1 might be capable of this, but I don’t have access to that model.

If your plans all look the same (use the same icons), good old tools like YOLO might be an option for localizing what you're looking for.
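
For example, a minimal sketch with the ultralytics package, assuming you've fine-tuned a model on your own floor-plan icons (the weights file `table_detector.pt` here is hypothetical):

```python
# pip install ultralytics
from ultralytics import YOLO

# Hypothetical weights: a YOLO model fine-tuned on floor-plan table icons.
model = YOLO("table_detector.pt")

# Detect tables in the uploaded floor plan.
results = model("floorplan.png")

for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box in pixel coordinates
    print(f"Table at ({x1:.0f}, {y1:.0f}) to ({x2:.0f}, {y2:.0f})")
```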

But if you want an LLM to locate arbitrary stuff, then (between you and me) Gemma/Gemini is the way to go right now. Sorry OpenAI!
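
For what it's worth, Gemini can be asked to return bounding boxes directly. A rough sketch with the google-generativeai package (the model name, prompt, and coordinate convention are my own guesses, untested):

```python
# pip install google-generativeai pillow
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

floor_plan = Image.open("floorplan.png")
response = model.generate_content([
    "Find the meeting room table in this floor plan. "
    "Reply with its bounding box as [ymin, xmin, ymax, xmax], "
    "with coordinates normalized to 0-1000.",
    floor_plan,
])
print(response.text)  # parse the box out of the reply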

Hehe, thanks for the suggestion! I'll take a deep dive into their docs and see what they can offer. :smile:

Hi @stabenfeldt , warm welcome!

Someone previously posted a "grid" hack in the community called GridGPT. Essentially, you overlay a numbered grid on your image, and the LLM can then parse the locality/context a lot better. Disclaimer: I haven't tried this myself, but the overlay step is easy to sketch; see below.
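
Something like this with Pillow (untested; the grid size and cell labels are arbitrary choices):

```python
from PIL import Image, ImageDraw

def overlay_grid(path: str, cells: int = 10) -> Image.Image:
    """Draw a labeled grid so an LLM can reference cells like 'B3'."""
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    step_x, step_y = w / cells, h / cells

    # Grid lines.
    for i in range(1, cells):
        draw.line([(i * step_x, 0), (i * step_x, h)], fill="red")
        draw.line([(0, i * step_y), (w, i * step_y)], fill="red")

    # Cell labels, e.g. A1 .. J10 (letters work for up to 26 columns).
    for row in range(cells):
        for col in range(cells):
            label = f"{chr(65 + col)}{row + 1}"
            draw.text((col * step_x + 2, row * step_y + 2), label, fill="red")
    return img

overlay_grid("floorplan.png").save("floorplan_grid.png")
```

You'd then send the gridded image to the vision model and ask it to answer in terms of grid cells, which you can map back to pixel coordinates for the annotation step.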
