Worse OCR on rotated text

So I’m using the image input for reading labels off of beverage cans and bottles. What I noticed is that it performs very well if the text is in the right orientation (way better than all the OCR tools I have tested so far). But if the text is rotated, say by 90 degrees, it makes a lot of mistakes. It still manages to get a lot of it right, so it must be aware of the rotation. So why is it still worse?

While rotated versions of images are typically included in training sets, there will always be far more data containing text in the usual orientation humans take images in, which is right side up. So both AIs and people are better at reading text the right way up.

Makes sense. But it understands that the text is rotated if you ask it. Maybe, when it comes to the API, an optional function call to rotate the image could be added. The results would be so much better.
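Something like this, maybe, sketched in the function-calling style (completely hypothetical, just to illustrate the idea; no such tool exists in the API today):

```python
# Completely hypothetical sketch of what such an optional tool could
# look like in the function-calling schema; nothing like this exists
# in the API today.
rotate_image_tool = {
    "name": "rotate_image",
    "description": "Rotate the attached image before reading its text",
    "parameters": {
        "type": "object",
        "properties": {
            "degrees": {
                "type": "number",
                "description": "Clockwise rotation to apply, in degrees",
            },
        },
        "required": ["degrees"],
    },
}
```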

Well, think about what a human would do:

1. Give the human an object with text in a random orientation
2. The human rotates the object until the text is upright

The issue is knowing how much to rotate the image, and I’m not sure how you go about that. There are commercial OCR solutions for the packaging industry that can do this without LLMs, so maybe that would be a more effective way to do it.

I’ve been experimenting with easyOCR, and it gives you the rectangles from the text detection, so getting the right angle is actually no problem. But easyOCR often misreads, especially if the image isn’t preprocessed in the right way. Also, on cylindrical labels the text often can’t be captured in one image, and doing conventional OCR and stitching the results together can be quite complicated. I tried it with ChatGPT using multiple images and it does a great job out of the box.
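For reference, here is a minimal sketch of that angle extraction, assuming easyOCR’s `readtext()` output format of `(bbox, text, confidence)` tuples with four corner points per box; the file name is made up:

```python
import math
import statistics
import easyocr

reader = easyocr.Reader(['en'])
results = reader.readtext('can_label.jpg')  # hypothetical input image

angles = []
for bbox, text, conf in results:
    (x1, y1), (x2, y2) = bbox[0], bbox[1]  # top-left, top-right corners
    # Angle of each box's top edge relative to horizontal, in degrees
    angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))

# Median is more robust than the mean against a few misdetected boxes
dominant_angle = statistics.median(angles) if angles else 0.0
print(f"estimated text rotation: {dominant_angle:.1f} degrees")
```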

In that case you could use easyOCR to put the box on, then either extract that rotation angle if you can, or use OpenCV to measure it. Then rotate the image, feed it to GPT-4V, and you’re golden. You could automate that when the vision API is released.
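A rough sketch of the rotate-and-hand-off step with OpenCV; the angle value and file names are made up, and the sign convention may need flipping depending on how the angle was measured:

```python
import cv2

# Example value; in practice this comes from the easyOCR step above
dominant_angle = 12.5

img = cv2.imread('can_label.jpg')
h, w = img.shape[:2]

# Rotate around the image center by the estimated angle so the text
# ends up horizontal (positive angle = counter-clockwise in OpenCV)
M = cv2.getRotationMatrix2D((w / 2, h / 2), dominant_angle, 1.0)
rotated = cv2.warpAffine(
    img, M, (w, h),
    flags=cv2.INTER_LINEAR,
    borderMode=cv2.BORDER_REPLICATE,
)

# For rotations near 90 degrees you'd want to expand the canvas
# instead of keeping (w, h), or the corners get clipped
cv2.imwrite('can_label_upright.jpg', rotated)  # ready for GPT-4V
```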
