Getting data from other people's images on the vision API

I am making a request to the Chat Completions API with an image of a website. I have tried both base64 encoding the image and passing a URL.

My prompt asks it to transcribe all the text in the screenshot and output it as plain text, neatly formatted.
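For context, here's roughly what my request looks like (a simplified sketch; the model name, file name, and prompt wording are placeholders for what I actually send):

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Base64-encode the full-page screenshot (I also tried passing a plain URL instead)
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Transcribe all the text in this screenshot and output it as neatly formatted plain text.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```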

I ran it on a few pages and found that for some of them the output didn't match the page content. I took one of these images as an example, and each time I make the HTTP request the transcribed text that comes back is about a completely different topic/subject. I can't find any of the words from the image I sent.

I used ChatGPT to see what it would do with the same image and prompt, and it worked fine, outputting all the text neatly.

I’ve tried both 4o models and it happens with both.

Sometimes it outputs text similar to the website screenshot but not the same. Sometimes it says “I cannot do that”, and sometimes it outputs totally different text. For example, I'll give it a screenshot of some text about accountancy and the output will be about drug rehabs.

It’s totally insane and makes no sense. I am assuming it’s either a bug or some anti-bot data poisoning attempt.

Can anyone clarify why I get this behaviour?

PS: if anyone is wondering why I screenshot websites, it's because I am converting users' sites to a vector DB, and pulling text by traditional means doesn't always work with things like pricing tables etc… but I have found ChatGPT can transcribe images of the site perfectly without losing data. It's just the API that seems to not want to work.

I think I figured it out. OpenAI resizes the image. Varying website heights seem to be the reason it sometimes works. I guess the dimensions throw it off, and whatever is fed to the model is sometimes completely illegible, so it hallucinates some crazy stuff.

The docs on image resizing say the long side should not be more than 2000px, and full-page website screenshots are way more than that.

The funny thing is that the images always work perfectly in ChatGPT, so they must have some additional image-chopping process that isn't automatic on the API.
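In case it helps anyone else, this is the kind of chopping I mean: cutting the tall screenshot into vertical tiles that stay under the documented limit before sending them. A rough Pillow sketch (the overlap value is just my guess so a line of text doesn't get cut in half at a tile boundary):

```python
from PIL import Image

MAX_LONG_SIDE = 2000   # limit mentioned in the image-resizing docs
OVERLAP = 100          # small overlap between tiles (my guess, not from the docs)

def slice_screenshot(path: str) -> list[Image.Image]:
    """Cut a tall full-page screenshot into vertical tiles no taller than the limit."""
    img = Image.open(path)
    if img.height <= MAX_LONG_SIDE:
        return [img]
    tiles = []
    top = 0
    while top < img.height:
        bottom = min(top + MAX_LONG_SIDE, img.height)
        tiles.append(img.crop((0, top, img.width, bottom)))
        if bottom == img.height:
            break
        top = bottom - OVERLAP
    return tiles
```

Each tile can then be sent as its own image (or its own request) and the transcriptions stitched back together.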

Funnily enough, it seems that if I convert PNG to JPG then the API is able to handle it and output the text.
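For anyone hitting the same thing, the conversion is a one-liner with Pillow (the quality value is just what I happened to use):

```python
from PIL import Image

# JPEG has no alpha channel, so flatten to RGB before saving
img = Image.open("screenshot.png").convert("RGB")
img.save("screenshot.jpg", "JPEG", quality=90)
```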

At least I know what it is now and can work out a fix.
