Worse OCR on rotated text

So I’m using the image input for reading labels off of beverage cans and bottles. What I noticed is that it performs very well if the text is in the right orientation (way better than all OCR tools I have tested so far). But if the text is rotated, like by 90 degrees, it makes a lot of mistakes. I mean, it still manages to get a lot of it right, so it must be aware of the rotation. So why is it still worse?

While rotated versions of images are typically included in training sets, there will always be more data containing text in the usual orientation humans take images in, which is right side up. So AIs, like people, are better at reading text the right way up.

1 Like

Makes sense. But it understands that the text is rotated if you ask it. Maybe, when it comes to the API, an additional optional function call to rotate the image could be added. The result would be so much better.

1 Like

Well, think about what a human would do:

Give human object with text in random orientation
Human rotates object until text is upright

The issue comes in knowing how much to rotate the image, and I’m not sure how you go about that. There are commercial OCR solutions for the packaging industry that can do this without LLMs, so maybe that would be a more effective way to do it.

I’ve been experimenting with easyOCR and it gives you the rectangles from the text detection, so getting the right angle is actually no problem. But easyOCR often misreads, especially if the image isn’t preprocessed in the right way. Also, on cylindrical labels the text often can’t be captured in one image. Doing conventional OCR and stitching the results together can be quite complicated. I tried it with ChatGPT with multiple images and it does a great job out of the box.
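For anyone curious how to get the angle from easyOCR, here is a minimal sketch. It assumes the detector returns quadrilateral boxes as four [x, y] points with the first edge following the text baseline; the function name `estimate_text_angle` is just my own.

```python
# Sketch: estimate the dominant text angle from easyOCR detection boxes.
# Assumes each result is (box, text, confidence) with box = four [x, y] points
# ordered top-left, top-right, bottom-right, bottom-left.
import math
import easyocr

def estimate_text_angle(image_path: str) -> float:
    reader = easyocr.Reader(["en"])
    results = reader.readtext(image_path)  # [(box, text, confidence), ...]
    angles = []
    for box, _text, _conf in results:
        (x1, y1), (x2, y2) = box[0], box[1]  # top edge of the detected box
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))
    # Median is more robust than the mean against a few bad detections.
    angles.sort()
    return angles[len(angles) // 2] if angles else 0.0
```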

In that case you could use easyOCR to put the box on, then either extract that rotation angle if you can or use OpenCV to measure it, rotate the image, feed it to GPT-4V and you’re golden. You could automate that when the vision API is released.
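Something like the following, as a rough sketch of that pipeline. The model name and prompt are placeholders (adjust for whatever vision model your account has access to), and the OpenCV sign convention is worth sanity-checking on a sample image.

```python
# Sketch: rotate the image upright with OpenCV, then send it to the vision model.
# The angle would come from the easyOCR step above.
import base64
import cv2
from openai import OpenAI

def rotate_upright(image_path: str, angle_degrees: float, out_path: str = "upright.jpg") -> str:
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # Positive angle rotates counter-clockwise; for 90-degree cases you may
    # also want to expand the canvas instead of keeping (w, h).
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    upright = cv2.warpAffine(img, matrix, (w, h), borderValue=(255, 255, 255))
    cv2.imwrite(out_path, upright)
    return out_path

def read_label(image_path: str) -> str:
    client = OpenAI()
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this label."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```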

1 Like

Has anybody found a good solution for dealing with rotated text?

I noticed that the latest model, GPT-4o, also struggles if the text in the image is rotated somehow.
And at the same time the model gives a random result when I ask how I should rotate the image to place the text horizontally.

1 Like

You should be able to use or find a classifier to determine if the text is rotated, correct it, and then feed it to GPT.
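One off-the-shelf option for that classifier step is Tesseract’s orientation and script detection (OSD) via pytesseract, which estimates how many degrees the page should be rotated. A rough sketch, assuming the default OSD output format and the common convention that the `Rotate:` value is a clockwise correction; OSD also needs a reasonable amount of text to be reliable.

```python
# Sketch: detect coarse orientation with Tesseract OSD, then rotate upright.
import re
import pytesseract
from PIL import Image

def deskew_with_osd(image_path: str, out_path: str = "deskewed.jpg") -> int:
    img = Image.open(image_path)
    osd = pytesseract.image_to_osd(img)  # e.g. "... Rotate: 90 ..."
    rotate = int(re.search(r"Rotate: (\d+)", osd).group(1))
    if rotate:
        # PIL rotates counter-clockwise, so negate the clockwise correction;
        # expand=True keeps the whole image for 90/180/270 rotations.
        img = img.rotate(-rotate, expand=True)
    img.save(out_path)
    return rotate
```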

Actually, I found a cool solution that I intended to implement myself.
EDIT
After reviewing everything and contemplating it, I realized that it’s not a sufficient solution. It would only work if the text is rotated by exactly 90 degrees: it doesn’t indicate which way the rotation happened, and it wouldn’t catch upside-down documents. So, nope, not gonna work. Sorry for suggesting it.

Ultimately, having a first-pass is_not_orientated Boolean check would be beneficial for the model as well.

2 Likes

Maybe pay to ask Omni whether it’s upside down or at an angle, then flip appropriately?
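A rough sketch of that idea: ask the model which way the text is rotated, then rotate before the real transcription pass. The model name, the prompt, and the assumption that it answers with exactly one of 0/90/180/270 are all mine; in practice you’d want to validate the reply.

```python
# Sketch: use the vision model itself as the orientation classifier.
import base64
from openai import OpenAI
from PIL import Image

client = OpenAI()

def ask_rotation(image_path: str) -> int:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "By how many degrees clockwise must this image be rotated "
                         "so the text reads upright? Answer with only 0, 90, 180 or 270."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return int(response.choices[0].message.content.strip())

def fix_orientation(image_path: str, out_path: str = "fixed.jpg") -> None:
    degrees = ask_rotation(image_path)
    # PIL rotates counter-clockwise, so negate the clockwise answer.
    Image.open(image_path).rotate(-degrees, expand=True).save(out_path)
```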

1 Like