Hi,
I have a large collection of images (60k+ paintings etc.) where I want the motifs to be searchable. Image metadata etc. is stored in a local MongoDB database (accessible via FastAPI).
As I see it, I have two approaches:
- Create a description with ChatGPT ‘computer vision’ (when available in the API), store that info in MongoDB, and do a keyword ‘containing’ search, or
- Create descriptions as in option 1, but also fine-tune ChatGPT with the 60k+ JSONs containing the descriptions for a more natural/semantic search.
From a user perspective, option 2 sounds like the better one, but is it doable? And if yes, how?
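For reference, the keyword ‘containing’ search from option 1 could look roughly like the sketch below. It only builds a MongoDB filter using the `$regex` operator and checks it against a few in-memory documents, so it runs without a database. The field name `description` and the sample documents are assumptions made for illustration, not the real schema.

```python
import re


def build_keyword_filter(keyword: str, field: str = "description") -> dict:
    """Build a MongoDB filter for documents whose `field` contains `keyword`."""
    # re.escape stops user input from being interpreted as regex syntax.
    # "$options": "i" makes the match case-insensitive.
    return {field: {"$regex": re.escape(keyword), "$options": "i"}}


def matches(doc: dict, keyword: str, field: str = "description") -> bool:
    """Local stand-in for the same filter, useful for testing without MongoDB."""
    return re.search(re.escape(keyword), doc.get(field, ""), re.IGNORECASE) is not None


# Hypothetical sample documents; the real collection layout may differ.
sample_docs = [
    {"_id": 1, "description": "A figure standing on a Bridge under a red sky"},
    {"_id": 2, "description": "A wooden chair with carved legs"},
]

keyword_filter = build_keyword_filter("bridge")
hits = [d["_id"] for d in sample_docs if matches(d, "bridge")]
print(keyword_filter)
print(hits)
```

With pymongo, the same filter could be passed to `collection.find(keyword_filter)`. Note that this only finds literal substrings, so a search for "skeleton" would not find a description that says "skeletal figure" or "bony man".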
Thanks
Hi and welcome to the Developer Forum!
I’m not sure what it is you have. Do you have 2 sets of 60k images, one with associated human-made descriptions and one without? Or do you have 60k images with JSON descriptions that you wish to generate more descriptions for?
Where are the extra 60k JSON descriptions coming from?
but also fine-tune ChatGPT with the 60k+ JSONs containing the descriptions
Thanks!
One set of 60k images, for which I plan to use ChatGPT image descriptions (when they are deployed to the API).
Did that clarify?
Ok, so I don’t understand where you would get the fine-tuning description JSONs from?
Option 1 will work. Option 2 seems unachievable without first doing option 1, or having lots of people do the same thing.
Thank you for answering my questions.
I will have ChatGPT create the descriptions for me (when they open up for uploading images through their API).
I will also add ±50 new images to my database every day, and create descriptions for those with ChatGPT as well.
Can you say something about my options?
To put it in context: I work in a museum, where we publish digital photos of our collection (paintings, sketches, objects like chairs, etc.). I want our online audience to be able to search through our collection using semantics, for example: “Find a painting of a skeletal figure standing on a bridge with hollow eyes and mouth agape. Colors should be in red, orange with some blue”

I think it’s an exciting idea, and you should absolutely give this a try. One of the problems is that this is all new! Nobody has done this before, so there are no textbooks to check or historical works to reference. I think it should work, but the details and the pitfalls are yet to be discovered, and people will be using what you find out as a reference for their work.
For a description like the last one, “Find a painting of a skeletal figure standing on a bridge with hollow eyes and mouth agape. Colors should be in red, orange with some blue”, I guess you know which image you are talking about.
So, have you tried to take the picture of it, push it to GPT-V in the app or on the website, and prompt it to describe it the way you expect? What does it give you? Do you think it could be really useful?
And by the way, since we have seen the DALL-E generation prices, you could extrapolate them to the future vision ones. So, my man, to digest 60k pictures plus 50 daily ones, it would surely pay off to hire someone.
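A rough way to put numbers on that extrapolation is sketched below. The per-image price is a purely hypothetical placeholder, since the vision API price is not known here; only the 60k backlog and the 50 images per day come from this thread.

```python
def estimate_images_and_cost(backlog: int, per_day: int, days: int, price_per_image: float):
    """Return (total images processed, total cost) for a backlog plus a daily intake."""
    total_images = backlog + per_day * days
    return total_images, total_images * price_per_image


# HYPOTHETICAL price per described image; replace with the real one once known.
ASSUMED_PRICE_PER_IMAGE = 0.01

images, cost = estimate_images_and_cost(
    backlog=60_000, per_day=50, days=365, price_per_image=ASSUMED_PRICE_PER_IMAGE
)
print(f"{images} images in the first year, about {cost:.2f} at the assumed price")
```

The backlog dominates the first year (60,000 of the 78,250 images), so most of the cost is a one-off.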
EDIT:
I have to say that once I read your description, I thought of “El Grito” by Munch. And look how ChatGPT on 3.5 performs, which is incredible in itself:
Your Question: “If someone gave you this description of a painting, what painting do you think we would be talking about? ‘Find a painting of a skeletal figure standing on a bridge with hollow eyes and mouth agape. Colors should be in red, orange with some blue.’”
My Response: “The given description appears to correspond to the famous artwork titled ‘The Scream’ (in Norwegian, ‘Skrik’), painted by the Norwegian artist Edvard Munch. This iconic painting depicts a skeletal figure on a bridge with hollow eyes and an open mouth, using colors primarily in shades of red, orange, and some touches of blue. ‘The Scream’ is one of the most recognized and iconic works in the history of art.”
Hi, thanks
The query description in this example might be too specific, I agree. But let’s say I want a desktop background that matches my office setup (lame example). I could ask “Show me paintings that have orange, red and blue in them. They should also be famous artworks”.
I’ve tried with a couple of our paintings (btw, we’re the museum that has “El Grito” in our collection), and the response from ChatGPT 4V (GPT-4V(ision) system card, openai.com), with the right prompting, gives me results that I think are useful for our online visitors.
Storing that description in MongoDB is easy, but how can I use semantics to search through the database?
And costs might be a dealbreaker.
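One common way to get semantic search over stored descriptions, without any fine-tuning, is to store an embedding vector next to each description and rank documents by cosine similarity to the embedded query. The sketch below shows only the ranking step. The tiny 3-dimensional vectors, the `embedding` field, and the document ids are made up for illustration; real vectors would come from an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, docs, top_k=3):
    """Return the ids of the top_k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vec, d["embedding"]),
        reverse=True,
    )
    return [d["_id"] for d in ranked[:top_k]]


# Toy data: assumed embeddings of three stored descriptions.
docs = [
    {"_id": "scream", "embedding": [0.9, 0.1, 0.0]},
    {"_id": "chair", "embedding": [0.0, 0.2, 0.9]},
    {"_id": "sunset", "embedding": [0.7, 0.6, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # assumed embedding of the visitor's query

results = semantic_search(query_vec, docs, top_k=2)
print(results)
```

Scanning every document like this is fine for a prototype at 60k items. Embedding each description is a one-off step per image, so the 50 new images per day only add 50 new vectors.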
Hehe,
thank you for your inputs!
I guess it’s new, yes. I’ll write a paper on my findings.
What I haven’t really found a clear path to is how to give the online visitors the possibility to do a semantic search through the database with all the descriptions.
Maybe MongoDB Atlas with semantics is the way…?
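If Atlas is an option, its Vector Search feature exposes a `$vectorSearch` aggregation stage that does the nearest-neighbour lookup inside the database. The sketch below only builds such a pipeline as plain Python data. It assumes a vector index already exists on the collection. The index name `description_index` and the field names `embedding`, `title` and `description` are hypothetical.

```python
def build_vector_search_pipeline(
    query_vector,
    index_name="description_index",  # hypothetical index name
    path="embedding",                # hypothetical field holding the stored vectors
    limit=10,
    num_candidates=200,
):
    """Build an aggregation pipeline using the Atlas $vectorSearch stage."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep a few display fields plus the similarity score of each hit.
        {
            "$project": {
                "title": 1,
                "description": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Toy query vector; in practice this is the embedding of the visitor's search text.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=5, num_candidates=50)
print(pipeline[0]["$vectorSearch"]["limit"])
```

With pymongo, the pipeline could be run with `collection.aggregate(pipeline)` from the existing FastAPI backend, so the descriptions, vectors and metadata all stay in one place.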