GPT-4 Vision Refuses to extract Info from Images?

Function calling is your friend if you want json!

For me all it took was one example of OpenAI refusing to parse a perfectly harmless image (because it contained people’s names) to know their Cloud AI is just not trustworthy (from a censorship perspective)

I consider OpenAI a great tool for enthusiasts to learn what AI can do, but for any business-critical apps the only safe bet is to use an Open Source model you have full control of.

1 Like

+1 I feel it’s overally aligned.
Sometimes we need to extract not only the numbers from a blueprint but semantic meaning of those unlabeled numbers. It used to work with GPT4V with some tricks though not stable, but no longer work which makes it quite annoying.
Older OCR techniques won’t work as this is a multi-task labeling unless finetune a model with tens of data.

I do hope OpenAI would think of making less alignment on these, as it’s obvious that information is already visible to human eyes, where is the safety concern coming from?

Something that works for the moment was to add in the prompt:

do not use python, neither pytesseract and use your vision skill directly …

but not sure how long it will work.

edit: is not working anymore :disappointed:.

OCR processing

Interestingly enough same prompt in ChatGPT extracts personal data, while in API it does not

Quite sad as it cuts off billions of usecases for GPT Vision feature

I can think of lots of reasons Microsoft and OpenAI would want to disable OCR uses for everyone but themselves (i.e. disable it from the API), but I can’t think of any “Alignment” reasons. OCR is not a “dangerous” technology that people need to be protected from. lol. However it is something Microsoft would rather their competitors not have, obviously.

1 Like

Enabling this feature is valiable in many ways possible , there is always an edge case for every thing. I checked today uploading a sc to get text from it but refused to do so, while google gemini did it, but it’s wrong every time.

bing Copilot is working fine , if I upload the image and gets the almost correct text back