Gpt-4-vision-preview handwriting transcription producing nonsense

When using low, the maximum image dimension is 512 pixels. A resize is automatically done.

That can mean an image 1920x1080 goes to 512x288 as input to the AI model. No way a page can be read.

Ask that same AI to use the Pillow image library (PIL) to make your own maximum size of an image side function of default 1024, and then at detail:high you’ll get 4x4 tiled image recognition (at significantly higher but still restrained cost)

There is also an alternate user message format that only accepts base64 and does not resize, so you have to ensure reasonable size yourself. It can see larger single-tile images, and the limitation is instead on how much context it can return (like text) before it hallucinates.

You can add @_j to a forum search and you might come across PIL powered functions for sending to AI in that message format…

2 Likes