Using the Vision API: best practices

With “detail”: “low”, as the documentation informs us, the model receives the image resized down to 512px on its longest dimension, and no additional high-resolution “tiles” of image regions are overlaid on top of that.
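A minimal sketch of how the low-detail setting is requested in a Chat Completions message; the image URL and prompt text here are placeholders:

```python
import json

def vision_message(image_url: str, prompt: str) -> dict:
    """Build a user message whose image part is pinned to "detail": "low"."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # "detail": "low" caps the image cost at 85 tokens
                "image_url": {"url": image_url, "detail": "low"},
            },
        ],
    }

msg = vision_message("https://example.com/receipt.png", "Transcribe the text.")
print(json.dumps(msg, indent=2))
```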

Resize your images to that size yourself and check whether the details you care about are still legible.
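To preview what the model will actually see, you can compute the downscaled dimensions yourself; this sketch assumes the 512px longest-side figure from the documentation, and the resulting size can be fed to any image library's resize call:

```python
def low_detail_size(width: int, height: int, target: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longest side is `target` px,
    preserving aspect ratio; images already small enough are untouched."""
    longest = max(width, height)
    if longest <= target:
        return width, height  # never upscale
    scale = target / longest
    return round(width * scale), round(height * scale)

print(low_detail_size(2048, 1536))  # → (512, 384)
print(low_detail_size(400, 300))    # → (400, 300), left as-is
```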

With raw access to send an image of any size to be encoded into 85 tokens of “low”, one quickly sees that information theory holds: the amount of text the model can faithfully reproduce from the image is fewer tokens than that before the hallucinations start.