Naively constructing a prompt for OCR (in chatgpt, or probably also a gpt-4o API call), I might start by saying something like this:
“Please transliterate the text in this image. If you can’t read it clearly, try correcting the image before guessing the text. You might try to enhance the contrast on the image, or rotate it or apply skew or perspective correction to read it more easily. Make sure you’ve got the best read you can on the characters in the image before you try to correct your guess on the words through their semantic meaning”
However, maybe this kind of multistage technique is built in as standard, in which case such prompts will be futile and possibly even unhelpful. Does anyone know if this sort of prompt is likely to improve accuracy?