Prompting for OCR (question)

Potenzo · May 22, 2024, 5:56am

Naively constructing a prompt for OCR (in chatgpt, or probably also a gpt-4o API call), I might start by saying something like this:

“Please transliterate the text in this image. If you can’t read it clearly, try correcting the image before guessing the text. You might try to enhance the contrast on the image, or rotate it or apply skew or perspective correction to read it more easily. Make sure you’ve got the best read you can on the characters in the image before you try to correct your guess on the words through their semantic meaning”

However, maybe this kind of multistage technique is built in as standard, in which case such prompts will be futile and possibly even unhelpful. Does anyone know if this sort of prompt is likely to improve accuracy?

udm17 · May 22, 2024, 5:59am

Even if something like this is built in, reiterating it in the prompt will cause the model to focus heavily on these conditions, likely making sure that these instructions are followed.

It should help improve the performance of the model, if some the error were coming from due to this issue in the first place

Topic		Replies	Views
Best practices for prompt construction Prompting gpt-4 , prompt	2	963	August 23, 2024
Prompt upscaling image text prompt on chatgpt by gpt before going to dall-e? Prompting chatgpt , prompt , prompt-engineering , dalle3	0	318	December 13, 2024
What can help in effectively translating prompts: techniques and experiences ? Prompting chatgpt	2	737	March 9, 2024
How to solve the problem that GPT-API cannot read text using OCR? API	19	3465	July 10, 2024
Prompt for Image to JSON conversion Prompting gpt4 , image-reading	3	604	March 15, 2025

Prompting for OCR (question)

Related topics