Worse OCR on rotated text

So I’m using the image input for reading labels off of beverage cans and bottles. What I noticed is that it performs very well if the text is in the right orientation (way better than all OCR tools I have tested so far). But if the text is rotated, like by 90 degrees, it makes a lot of mistakes. I mean, it still manages to get a lot of it right, so it must be aware of the rotation. So why is it still worse?

While rotated versions of images are typically included in training sets, there will always be more data containing text in the usual orientation humans take images in, which is right side up. So AIs, like people, are better at reading text the right way up.

1 Like

Makes sense. But it understands that the text is rotated if you ask it. Maybe, when it comes to the API, an additional optional function call to rotate the image could be added. The result would be so much better.

1 Like

Well, think about what a human would do:

Give human object with text in random orientation
Human rotates object until text is upright

The issue comes in knowing how much to rotate the image, and I’m not sure how you go about that. There are commercial OCR solutions for the packaging industry that can do this without LLMs, so maybe that would be a more effective way to do it.

I’ve been experimenting with easyOCR and it gives you the rectangles from the text detection, so getting the right angle is actually no problem. But easyOCR often misreads, especially if the image isn’t preprocessed in the right way. Also, on cylindrical labels the text often can’t be captured in one image. Doing conventional OCR and stitching the results together can be quite complicated. I tried it with ChatGPT with multiple images and it does a great job out of the box.
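For anyone curious how to get the angle from easyOCR, here is a minimal sketch. It assumes the detector returns quadrilateral boxes as four [x, y] points with the first edge following the text baseline; the function name `estimate_text_angle` is just my own.

```python
# Sketch: estimate the dominant text angle from easyOCR detection boxes.
# Assumes each result is (box, text, confidence) with box = four [x, y] points
# ordered top-left, top-right, bottom-right, bottom-left.
import math
import easyocr

def estimate_text_angle(image_path: str) -> float:
    reader = easyocr.Reader(["en"])
    results = reader.readtext(image_path)  # [(box, text, confidence), ...]
    angles = []
    for box, _text, _conf in results:
        (x1, y1), (x2, y2) = box[0], box[1]  # top edge of the detected box
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))
    # Median is more robust than the mean against a few bad detections.
    angles.sort()
    return angles[len(angles) // 2] if angles else 0.0
```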

In that case you could use easyOCR to put the box on, then either extract that rotation angle if you can or use OpenCV to measure it, rotate the image, feed it to GPT-4V and you’re golden. You could automate that when the vision API is released.
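Something like the following, as a rough sketch of that pipeline. The model name and prompt are placeholders (adjust for whatever vision model your account has access to), and the OpenCV sign convention is worth sanity-checking on a sample image.

```python
# Sketch: rotate the image upright with OpenCV, then send it to the vision model.
# The angle would come from the easyOCR step above.
import base64
import cv2
from openai import OpenAI

def rotate_upright(image_path: str, angle_degrees: float, out_path: str = "upright.jpg") -> str:
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    # Positive angle rotates counter-clockwise; for 90-degree cases you may
    # also want to expand the canvas instead of keeping (w, h).
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_degrees, 1.0)
    upright = cv2.warpAffine(img, matrix, (w, h), borderValue=(255, 255, 255))
    cv2.imwrite(out_path, upright)
    return out_path

def read_label(image_path: str) -> str:
    client = OpenAI()
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text on this label."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```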

1 Like

Has anybody found a good solution for dealing with rotated text?

I noticed that the latest model, GPT-4o, also struggles if the text in the image is rotated somehow.
And at the same time the model gives a random result when I ask how I should rotate the image to place the text horizontally.

1 Like

You should be able to use or find a classifier to determine if the text is rotated, correct it, and then feed it to GPT.
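One off-the-shelf option for that classifier step is Tesseract’s orientation and script detection (OSD) via pytesseract, which estimates how many degrees the page should be rotated. A rough sketch, assuming the default OSD output format and the common convention that the `Rotate:` value is a clockwise correction; OSD also needs a reasonable amount of text to be reliable.

```python
# Sketch: detect coarse orientation with Tesseract OSD, then rotate upright.
import re
import pytesseract
from PIL import Image

def deskew_with_osd(image_path: str, out_path: str = "deskewed.jpg") -> int:
    img = Image.open(image_path)
    osd = pytesseract.image_to_osd(img)  # e.g. "... Rotate: 90 ..."
    rotate = int(re.search(r"Rotate: (\d+)", osd).group(1))
    if rotate:
        # PIL rotates counter-clockwise, so negate the clockwise correction;
        # expand=True keeps the whole image for 90/180/270 rotations.
        img = img.rotate(-rotate, expand=True)
    img.save(out_path)
    return rotate
```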

Actually, I found a cool solution that I intended to implement myself.
EDIT
After reviewing everything and contemplating it, I realized that it’s not a sufficient solution. It would only work if the text is rotated by exactly 90 degrees: it doesn’t indicate which way the rotation happened, and it wouldn’t catch upside-down documents. So, nope, not gonna work. Sorry for suggesting it.

Ultimately, having a first-pass is_not_orientated Boolean check would be beneficial for the model as well.

2 Likes

Maybe pay to ask Omni whether it’s upside down or at an angle, then flip appropriately?
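A rough sketch of that idea: ask the model which way the text is rotated, then rotate before the real transcription pass. The model name, the prompt, and the assumption that it answers with exactly one of 0/90/180/270 are all mine; in practice you’d want to validate the reply.

```python
# Sketch: use the vision model itself as the orientation classifier.
import base64
from openai import OpenAI
from PIL import Image

client = OpenAI()

def ask_rotation(image_path: str) -> int:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "By how many degrees clockwise must this image be rotated "
                         "so the text reads upright? Answer with only 0, 90, 180 or 270."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return int(response.choices[0].message.content.strip())

def fix_orientation(image_path: str, out_path: str = "fixed.jpg") -> None:
    degrees = ask_rotation(image_path)
    # PIL rotates counter-clockwise, so negate the clockwise answer.
    Image.open(image_path).rotate(-degrees, expand=True).save(out_path)
```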

1 Like