I asked GPT-4 to process an image of my receipt and return the result in JSON format. It works quite accurately. The "View analysis" section also shows the Python code it used, which calls pytesseract to produce the result. However, when I ran that same code locally, the accuracy was much worse.
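For reference, a local reproduction of this kind of pipeline might look roughly like the sketch below. It is not the code ChatGPT generated. The receipt layout (a label followed by a two-decimal price on each line), the function names, and the `receipt.jpg` path are all hypothetical. The OCR step is kept separate from the parsing step, so you can check whether the accuracy loss comes from Tesseract's raw text or from the parsing that follows it.

```python
import json
import re

# Hypothetical receipt layout: a label, whitespace, then a price with exactly
# two decimals at the end of the line, e.g. "Coffee 3.50" or "TOTAL $12.00".
PRICE_LINE = re.compile(r"^(?P<label>.+?)\s+\$?(?P<price>\d+\.\d{2})$")


def ocr_receipt(path: str) -> str:
    """Run Tesseract on the image at `path` and return the raw OCR text."""
    # Imported lazily so parse_receipt_text() works without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path))


def parse_receipt_text(text: str) -> dict:
    """Turn raw OCR text into a JSON-ready dict of line items and a total."""
    items = []
    total = None
    for raw in text.splitlines():
        line = raw.strip()
        match = PRICE_LINE.match(line)
        if not match:
            continue  # skip shop names, addresses, and unreadable lines
        label = match.group("label").strip()
        price = float(match.group("price"))
        if label.lower().startswith("total"):
            total = price
        else:
            items.append({"name": label, "price": price})
    return {"items": items, "total": total}


def receipt_to_json(path: str) -> str:
    """OCR the image, parse the text, and serialize the result as JSON."""
    return json.dumps(parse_receipt_text(ocr_receipt(path)), indent=2)
```

Calling `print(ocr_receipt("receipt.jpg"))` on its own shows what Tesseract actually read before any parsing. Comparing that raw text with the JSON ChatGPT returned can indicate which stage differs.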
This brings me to my question: how does ChatGPT actually work here? My assumption was that the model reads the prompt, generates the Python code, and then that code is run to get the answer. This also seems to match what’s described in this official doc.
However, based on my local run, this assumption doesn’t seem to hold, which confuses me. Can anyone confirm how this works?