Prompting for OCR (question)

Naively constructing a prompt for OCR (in chatgpt, or probably also a gpt-4o API call), I might start by saying something like this:

“Please transliterate the text in this image. If you can’t read it clearly, try correcting the image before guessing the text. You might try to enhance the contrast on the image, or rotate it or apply skew or perspective correction to read it more easily. Make sure you’ve got the best read you can on the characters in the image before you try to correct your guess on the words through their semantic meaning”

However, maybe this kind of multistage technique is built in as standard, in which case such prompts will be futile and possibly even unhelpful. Does anyone know if this sort of prompt is likely to improve accuracy?

Even if something like this is built in, reiterating it in the prompt will cause the model to focus heavily on these conditions, likely making sure that these instructions are followed.

It should help improve the performance of the model, if some the error were coming from due to this issue in the first place