Use own data with images for queries

I want to run queries on custom data consisting of science papers, around 1,000 of them. They have images as well. What is the best approach here?

I would start by extracting the text from the papers and exposing it for search.
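
A minimal sketch of that extraction step, assuming pypdf (one of several PDF libraries) and an illustrative file name:

```python
# Extract the text from one paper with pypdf; the file name is illustrative.
from pypdf import PdfReader

reader = PdfReader("paper_001.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text[:500])  # preview the extracted text before indexing it
```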

It could be searched via vector similarity with embeddings, via keyword search, or via a hybrid of embeddings and keywords.
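
Here is a rough sketch of the hybrid idea, using rank_bm25 for the keyword channel and sentence-transformers for the embedding channel. The library choices, the model name, and the 50/50 weighting are all assumptions, not a recommendation:

```python
# Hybrid search sketch: BM25 keyword scores blended with embedding similarity.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = ["paper one text ...", "paper two text ..."]  # extracted text per paper
query = "measurement of thermal conductivity"

# Keyword channel
bm25 = BM25Okapi([d.lower().split() for d in docs])
kw_scores = np.array(bm25.get_scores(query.lower().split()))

# Embedding channel (unit vectors, so the dot product is cosine similarity)
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(docs, normalize_embeddings=True)
q_vec = model.encode(query, normalize_embeddings=True)
emb_scores = doc_vecs @ q_vec

# Normalize each channel to [0, 1], then blend
def norm(x):
    return (x - x.min()) / (x.max() - x.min() + 1e-9)

hybrid = 0.5 * norm(kw_scores) + 0.5 * norm(emb_scores)
print(docs[int(hybrid.argmax())])  # best match under the blended score
```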

Images are tough right now, but with the Vision API around the corner, maybe even the images can be processed, turned into text, and exposed in the search as well.

Thanks, that's what I thought.
Plenty of documentation on text. How does it work in training? Or when you ask it to give you an image of something? Is it an external system? Metadata on images?

Maybe it's best to get an LLM to describe the image, then store that text for image search?

Yes, precisely. That's basically what the Vision component of the API would do: you give it an image and ask for a text description.
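
A sketch of what that call could look like, assuming a vision-capable chat model; the model name, prompt, and file name are illustrative, since the Vision API hadn't shipped at the time of this thread:

```python
# Describe an image with a vision-capable chat model, then store the text.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("figure_3.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this figure for a search index."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
description = resp.choices[0].message.content  # index this text for search
```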

FYI, text is the “lowest common denominator” for search. If you have an audio file, you transcribe it to text as well to expose it for search.
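
For example, a sketch of the audio leg using OpenAI's Whisper transcription endpoint (the file name is illustrative):

```python
# Transcribe audio to text so it can be indexed like everything else.
from openai import OpenAI

client = OpenAI()
with open("talk.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
print(transcript.text)  # store this alongside the paper text
```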

So you will have to start with text no matter what.

If you only worked in one domain, for example only images, you could use one model that is good at embedding images for an image-only search. But once you go multi-modal, it all goes to text, mainly for ease of use and transportability.

It’s hard to compare vectors across different models. So you pick a channel (usually text), translate everything over to that channel, and make the comparison with a single model (or a set of models) all working on the same data from that single channel.
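
To make that concrete, here is a small sketch where paper text, an image description, and an audio transcript all go through one embedding model and land in the same vector space (the model name and snippets are illustrative):

```python
# One channel, one model: text, image descriptions, and transcripts
# all become directly comparable vectors.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

corpus = [
    "Section 3: thermal conductivity of thin films ...",    # paper text
    "Figure 2: scatter plot of conductivity vs thickness",  # image -> text
    "Speaker notes: we measured conductivity at 300 K",     # audio -> text
]
vecs = model.encode(corpus, normalize_embeddings=True)
q = model.encode("how does film thickness affect conductivity",
                 normalize_embeddings=True)
print(vecs @ q)  # cosine similarities, all on one comparable scale
```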

You could conceivably use one model for text, one for vision, one for audio, etc., and then fuse all the results in some manner. This adds complexity, and since your discrimination function (comparator) would be comparing scores from different domains and different detectors, it raises doubts about the fused results.
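
No fusion method is named here, but reciprocal rank fusion is one common way to merge ranked lists whose raw scores aren't comparable; a sketch with made-up document ids:

```python
# Reciprocal rank fusion (RRF): merge rankings from per-domain models
# using only ranks, so the raw scores never have to be comparable.
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """ranked_lists: each a list of doc ids, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

text_hits  = ["p12", "p03", "p45"]   # from the text model
image_hits = ["p03", "p12", "p77"]   # from the vision model
print(rrf([text_hits, image_hits]))  # fused ordering
```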

So what's the status of the Vision API? Any timelines?

It was just announced for ChatGPT. So my guess is later this year in the API at the earliest, or Q1/Q2 next year.

If this is a pressing project, you could see if there are other APIs or open-source models suitable for the task of describing images and graphs of data.
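
For instance, a sketch using BLIP via Hugging Face transformers as an open-source captioner; the checkpoint and file name are illustrative, one option among many:

```python
# Caption an image locally with BLIP as an open-source alternative.
from transformers import BlipProcessor, BlipForConditionalGeneration
from PIL import Image

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

image = Image.open("figure_3.png").convert("RGB")
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))  # text to index
```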

But no matter what, your pipeline will most likely be:

{Thing} → {Text} <=== Search ↔ User

Maybe the Google one might help (not allowed to post the link). It's a bit risky to put in a lot of work if the API is coming.

Wonder if any of these actually handles images, not just OCR?
ChatPDF.com - Chat with any PDF using the new ChatGPT API

Use www.quills.ai. It's an AI chatbot that allows users to chat with files and databases.