Gpt-4-vision-preview handwriting transcription producing nonsense

_j · April 29, 2024, 4:17pm

When using low, the maximum image dimension is 512 pixels. A resize is automatically done.

That can mean an image 1920x1080 goes to 512x288 as input to the AI model. No way a page can be read.

Ask that same AI to use the Pillow image library (PIL) to make your own maximum size of an image side function of default 1024, and then at detail:high you’ll get 4x4 tiled image recognition (at significantly higher but still restrained cost)

There is also an alternate user message format that only accepts base64 and does not resize, so you have to ensure reasonable size yourself. It can see larger single-tile images, and the limitation is instead on how much context it can return (like text) before it hallucinates.

You can add @_j to a forum search and you might come across PIL powered functions for sending to AI in that message format…

Topic		Replies	Views
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3835	December 6, 2023
OpenAI API OCR isn't as successful as chatGPT API gpt-4 , api , ocr	10	463	May 13, 2025
Can an assistant help me with OCR? API gpt-4	7	3469	June 6, 2024
GPT-4 omni text recognition via API works worse than on chatgpt.com API gpt-4 , api	4	1198	August 13, 2024
Getting data from other peoples images on vision API Bugs gpt-4	1	82	August 17, 2024

Gpt-4-vision-preview handwriting transcription producing nonsense

Related topics