How can a chatbot return images and/or text from my own data in PDFs?

p.yankovainnovasys · July 25, 2023, 2:05pm

Hello,
I am trying to create an AI chatbot over all my data, which is stored in PDFs. Those PDF files are full of images and text. I inserted them into a vector database, and when I query it, it returns only the text from the PDFs, not the corresponding images or figures. How do I embed the images from the PDFs, insert them into the vector database, and then query them together with the text?
I want the chatbot's answer to contain both image and text.
Thanks a lot!

EricGT · July 25, 2023, 2:21pm

This sounds like a problem where you actually know the answer and can find it just by explaining the details to another developer; in doing so you will see what you need.

Kind of like Rubber duck debugging but at a higher level.

This is often how I solve some of my harder problems but I also do it with pen and paper recording the details and then scratching out parts and changing others, etc.

Hint:

If you are using only text to search the vector database then you need to add text related to the image into the vector database and when that is found include the image.

So where will you get text for the image?

Find associated text near the image. (Much easier said than done.)
Check the PDF for metadata attached to the image.
Use OCR on the image to extract any text it contains.
Use another AI to add tags for the image, such as banana, Paris, etc.
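
To make the hint concrete, here is a minimal sketch of the idea: index the text associated with an image, and carry the image reference along with it so that a text hit can return the image too. Everything here is an illustrative assumption, not a specific vector database API: `Chunk`, `toy_embed`, `search`, and the file path are hypothetical names, and the bag-of-words vector is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    text: str                         # text that gets embedded and searched
    image_path: Optional[str] = None  # image to return with the text, if any


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: List[Chunk], query: str, top_k: int = 1) -> List[Chunk]:
    """Rank chunks by similarity of their text to the query."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(toy_embed(c.text), q), reverse=True)
    return ranked[:top_k]


# Hypothetical data: one plain text chunk, one caption chunk tied to an image.
chunks = [
    Chunk("The pump must be primed before first use."),
    Chunk("Figure 3: wiring diagram of the pump controller",
          image_path="images/page4_fig3.png"),
]

best = search(chunks, "wiring diagram for the controller")[0]
# The chatbot can now show best.text and, when present, the image at best.image_path.
print(best.text, best.image_path)
```

With a real vector database the same pattern usually means storing the image path (or an ID) as metadata next to the embedded text, so the search itself stays text-only.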

p.yankovainnovasys · July 25, 2023, 2:24pm

Which developer? I do not know how to embed images from my own PDF files.

EricGT · July 25, 2023, 2:26pm

Not trying to be funny.

You would talk with the duck, or imaginary duck.

Don’t look at the hint yet, though it might help you answer this question. Try to answer it on your own even if it takes hours of research or a few days of thinking; you will be happier in the end if you do. I have been working with PDFs and such for years, so I know what a PITA they are when you want anything more than just the text.

p.yankovainnovasys · July 25, 2023, 2:37pm

Thanks! Can you be more detailed?

p.yankovainnovasys · July 25, 2023, 2:39pm

In my PDFs, I have text and figures that give a better understanding of the context.

EricGT · July 25, 2023, 2:52pm

Did you peek at the hint?

First you have to understand that PDFs are essentially what I consider a canned website for one topic, packaged in its own file format. If you ever take a PDF apart, you will find content streams written in a page-description language derived from PostScript, plus resources like text, images, fonts, streams, etc.

If you are extracting info from a PDF and do not understand this, then I take it you are using an API that hides those details and gives you just the text. If so, please note which one you are using.
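
As a rough illustration of images being separate objects inside the file, the sketch below scans raw PDF bytes for image XObject dictionaries, which carry a `/Subtype /Image` entry. Treat it only as a diagnostic heuristic: it assumes an unencrypted file, it ignores inline images embedded in content streams, and it can miscount if the same byte pattern happens to appear inside binary stream data. The function name and the sample bytes are made up for the example; a proper PDF library is the reliable way to actually extract images.

```python
import re

# Image XObject dictionaries contain "/Subtype /Image" (whitespace is optional).
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image")


def count_image_xobjects(pdf_bytes: bytes) -> int:
    """Heuristically count image XObjects by scanning the raw bytes."""
    return len(IMAGE_XOBJECT.findall(pdf_bytes))


# Hand-written fragment that mimics one image object; not a complete PDF.
sample = (
    b"%PDF-1.4\n"
    b"5 0 obj\n<< /Type /XObject /Subtype /Image /Width 640 /Height 480 "
    b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode /Length 0 >>\n"
    b"stream\n\nendstream\nendobj\n"
)

print(count_image_xobjects(sample))

# For a real file (path is hypothetical):
# with open("manual.pdf", "rb") as f:
#     print(count_image_xobjects(f.read()))
```

If the count is above zero but your extraction tool returns only text, the images are in the file and the tool is simply not exposing them.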

Once you are able to access the image, here are places to look for associated text:

Near the image which is visible in the rendered PDF
In metadata with the image
In the image itself which would require OCR to extract the text
Using another AI to recognize parts of the image returning a textual description, e.g. CLIP, BLIP-2, etc.
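
The first option, text near the image, can be sketched as a simple geometric match. The code assumes you already have page blocks with bounding boxes from whatever PDF parser you use; the `Block` type, the coordinates, and the `max_gap` threshold are all hypothetical and would need tuning on real documents.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


@dataclass
class Block:
    kind: str   # "text" or "image"
    bbox: BBox
    text: str = ""


def vertical_gap(a: BBox, b: BBox) -> float:
    """Vertical distance between two boxes; 0 if they overlap vertically."""
    return max(b[1] - a[3], a[1] - b[3], 0.0)


def horizontal_overlap(a: BBox, b: BBox) -> float:
    """Width of the horizontal span shared by the two boxes."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def nearest_text(image: Block, blocks: List[Block], max_gap: float = 40.0) -> Optional[Block]:
    """Pick the closest text block above or below the image, within max_gap."""
    candidates = [
        b for b in blocks
        if b.kind == "text"
        and horizontal_overlap(image.bbox, b.bbox) > 0
        and vertical_gap(image.bbox, b.bbox) <= max_gap
    ]
    return min(candidates, key=lambda b: vertical_gap(image.bbox, b.bbox), default=None)


# Made-up page layout: a paragraph, a figure, its caption, and distant text.
page = [
    Block("text", (50, 40, 550, 100), "Section 2 describes the controller."),
    Block("image", (100, 120, 500, 400)),
    Block("text", (100, 410, 500, 430), "Figure 3: wiring diagram of the pump controller"),
    Block("text", (50, 600, 550, 700), "Unrelated paragraph further down the page."),
]

image = page[1]
caption = nearest_text(image, page)
print(caption.text if caption else None)
```

Multi-column layouts, side captions, and figures that span pages will break this simple rule, which is part of why this is much easier said than done.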

Make sense? I don’t plan to write code, but maybe someone else can point you to something that exists in a paper or on GitHub.

For a good overview of the PDF file structure see:

p.yankovainnovasys · July 25, 2023, 2:57pm

Thanks a lot for your help and time!

EricGT · July 25, 2023, 3:03pm

I can tell from that response you knew something you needed when you saw it and you probably knew enough to get to what you needed, you just did not know the right keywords. I know many think PDFs are like looking at raw text files or even more complicated like RDF but they are an entirely different game. Once you know their game, which is not easy, you are able to play ball.

EricGT · July 26, 2023, 11:39am

Ran across this paper, which is a PDF, today in my morning current events review.

“The Visual Language of Fabrics” by Valentin Deschaintre, Julia Guerrero-Viu, Diego Gutierrez, Tamy Boubekeur, Belen Masia (pdf)

Noting it here for two reasons:

The PDF has images with no text inside them but with related text alongside them; think test case for your code.
The paper notes:

We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials. The dataset comprises 15,000 natural language descriptions associated to 3,000 corresponding images of fabric materials.

If you don’t see the connection, think more abstract.

While not directly related to your question I found the related browser to be quite interesting; it showcases the effectiveness of the technology.

https://valentin.deschaintre.fr/text2fabric_browser_v1.html
