I am making a request to completions api with an image of a website. I have tried via base64 encoding and passing a URL.
My prompt asks it to transcribe all the text on the screenshot and output in plain text but neatly formatted.
I ran it on a few pages and found some pages didn’t match the output content. I took one of these images as example and each time I make the http request the transcribed text coming back is for a completely different topic/subject. I cant find any of the words in the image I have sent.
I used chatGPT to see what it would do with the same image and prompt and it works fine, outputting all the text neatly.
I’ve tried both 4o models and it happens with both.
Sometimes it output text similar to the website screenshot but not the same. Sometimes it says “I cannot do that” and sometimes it output totally different text. Like I will give a screenshot of some text about accountancy and the output text will be about drug rehabs.
It’s totally insane and makes no sense. I am assuming it’s either a bug or some anti-bot data poisoning attempt.
Can anyone clarify why I get this behaviour?
ps. if anyone is wondering why i screenshot websites it’s because I am converting users sites to vector db and pulling text by traditional means doesn’t always work with thing like pricing tables etc… but I have found chatGPT can transcibe images of the site perfectly without losing data. It’s just the API that seems to not want to work.