How does the image OCR actually work in gpt-4?

steveny · May 29, 2024, 2:46pm

I asked gpt-4 to process my receipt image and return in a json format. It does work pretty accurately. Moreover, it provides the View analysis which includes Python code that uses pytesseract to generate the result. However, I’ve tried to run the provided code locally, and it got much worse accuracy.

This brings to my question on how chatgpt actually work? My assumption was that the model reads the prompt and then generates the python code which is then used to get answer. This also seems to match what’s described in this official doc

However, based on my local run, this assumption doesn’t seem to be true. This really confuses me a lot. Can anyone help confirm how this works?

anon10827405 · May 29, 2024, 2:50pm

When you use the vision model it is not actually running the mentioned python code. It’s running proprietary machine learning models that we cannot access without using OpenAI’s services.

It’s explanation of code is a hallucination. It does not generate code unless you ask for it to perform the OCR in Code Interpreter

enricobovo.ml95 · January 9, 2025, 3:19pm

Hey there, I have read the answer of @anon10827405 too and I think he’s right. I have tested some images both with GPT4-mini and Pytesseract on a local environment and the results are very much different. The task was digit recognition. While GPT retrieved the values correctly, Pytesseract was not even able to spot them

Topic		Replies	Views
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	4093	December 6, 2023
How to Programmatically Extract Text from Images Using GPT-4 API gpt-4 , chatgpt , api , assistants-api	9	9447	October 14, 2024
Can an assistant help me with OCR? API gpt-4	7	4005	June 6, 2024
How to solve the problem that GPT-API cannot read text using OCR? API	19	4342	July 10, 2024
GPT4 OCR/Image Recognition API gpt-4	3	25816	December 18, 2023

How does the image OCR actually work in gpt-4?

Related topics