Vision API flips numbers when extracting text from images

I’m using the Vision API to OCR some images. On several occasions, however, GPT-4 Vision returns incorrect results when dealing with numbers: it simply flips digits. I have run into this many times and across several fields; for example, on one field it extracted 57851418 instead of 578154181. Note that the images are high resolution and the letters and numbers are clearly visible. Is there any way to solve this?

I wouldn’t rely on a single shot of GPT4V as OCR. It’s probably best to combine GPT4V with classical OCR to get the best results.

The problem is that it recognizes the right numbers when you try again; it just seems to make errors at random.

LLMs make “errors” all the time. I’m suggesting that it may be a good idea to use other, more reliable tools to cross-validate or augment the capabilities of LLMs.
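
To make the cross-validation idea concrete, here is a minimal sketch in Python. It assumes you already have the field value as a string from GPT4V and from a classical OCR engine; how you obtain those strings is up to you and is not shown. The helper names (`digits_only`, `cross_validate`, `majority_vote`) are made up for illustration and are not part of any API.

```python
import re
from collections import Counter


def digits_only(text: str) -> str:
    """Keep only the decimal digits of an extracted field."""
    return re.sub(r"\D", "", text)


def cross_validate(llm_value: str, ocr_value: str) -> tuple[str, bool]:
    """Return (llm_digits, trusted).

    The value is marked trusted only when both extractors produced
    the same non-empty digit sequence.
    """
    llm_digits = digits_only(llm_value)
    ocr_digits = digits_only(ocr_value)
    trusted = llm_digits != "" and llm_digits == ocr_digits
    return llm_digits, trusted


def majority_vote(readings: list[str]) -> tuple[str, float]:
    """Pick the most common digit sequence across repeated runs.

    Returns the winning value and the share of runs that agreed with it.
    """
    if not readings:
        raise ValueError("need at least one reading")
    counts = Counter(digits_only(r) for r in readings)
    value, hits = counts.most_common(1)[0]
    return value, hits / len(readings)


# Hypothetical readings, reusing the numbers from the question.
value, trusted = cross_validate("57851418", "578154181")
if not trusted:
    # Disagreement: re-run the extraction or send the field to manual review.
    print("mismatch, needs review:", value)
```

Since a retry often gives the right number, something like `majority_vote` over a few repeated runs may also catch the occasional bad reading, though it will not help if the model makes the same mistake every time. Fields where the two sources disagree are probably best routed to a human.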