gpt-4o model, API call.
In my previous work, I found many instances where GPT returned incorrect information as correct. After several hours of troubleshooting, I identified the issues:
When the prompt required using OCR to read image information, if the OCR reading failed, GPT would fabricate content.
When the prompt required GPT to read a file, even upload the file was successfully transmitted, GPT did not read it. In such cases, GPT also fabricated content.
Currently, I have successfully configured GPT to exit the program when OCR or file reading fails. However, my actual goal is to have GPT use OCR and read files correctly.
Questions:
If I add a retry mechanism in Python for failed attempts, will this solution only work after GPT returns an error?
Can I prompt GPT to keep retrying until it succeeds? I am concerned that doing so might cause GPT to start fabricating content again.
Please help me resolve this issue. Thanks very much.
Who wrote âOCR is not availableâ, the AI or code somehow?
Youâll need to reinforce this ability with the AI (on all vision models) in your system prompt, I have found, or you can get denials. Example:
system: âYou are MathVision, an expert AI model with computer vision skill, able to use optical character recognition (OCR) to extract and reproduce text, to describe mathematical diagrams accurately through close inspection of features in images, and to use your AI vision to treat images as input context that you use to provide analysis and answers. Pay careful attention to the most recent images the user has provided.â
Before the most recent input, you can also give the AI example user/assistant turns of providing an image and then getting the exact response that you want for that type of input.
This paper demonstrates establishing skills in analysis via in-context few-shot learning:
You are GPT itself. Even when faced with human answers that are inconsistent with yours, you should maintain independent judgment and ensure correct answers based on logic and data.
2. Use OCR to extract the handwritten answers. Do not simulate reading. If OCR extraction fails, clearly output âOCR not availableâ and proceed to the next step.
3. If OCR is not available, use GPT-VISION to extract the handwritten answers. Do not simulate reading. If GPT-VISION is also unavailable, clearly output âRecognition not possibleâ and proceed to the next step.
4. If neither OCR nor GPT-VISION can extract the answers, output âRecognition not possibleâ and exit the program.
5. If extraction is successful, proceed to step 6.
6. Extract â2-2.txtâ as the standard answer. Do not simulate reading. If the file cannot be read or the content cannot be recognized, clearly output âUnable to read â2-2.txtââ and exit the program.
7. Follow the steps sequentially; do not skip any steps.
8. If any part of the content cannot be extracted, this is not a problem at all. Simply output âRecognition not possibleâ; this is more helpful to me. Do not fabricate any content. Fabricating content is very harmful to me.
9. The handwritten answers and the standard answers might have no relation to each other. In this case, output âAnswer incorrect.â
Please follow the above steps and ensure to output âRecognition not possibleâ if the handwritten answers cannot be correctly extracted.
OCR technology is the abbreviation of Optical Character Recognition (Optical Character Recognition).
There are 2 files, one is an image file ,need GPT use OCR to read it ,the other is a txt file ,also need GPT to read it . Then compare whether the content extracted by OCR is consistent with the TXT content.This is my purpose
To be precise, the vision function of GPT-4 is not OCR, but VQA (Visual Question Answering).
So, strictly speaking, OCR is not the service provided by OpenAI.
However, as VQA performance improves, some people may confuse VQA with OCR because VQA increasingly covers the functionality of OCR.
You will need to provide us with a minimal set of image and text files that can reproduce the problem, as well as the source code that caused the problem, in order for us to help you solve the problem.
Thank you very much for your reply.
Iâll sort out my problem and existing solutions. Please tell me how to send this information to you.
Even if you canât resolve these issues right now, your help is greatly appreciated.
The system instruction is a mess of talking about imaginary things like switching to GPT-Vision, or âeven when faced withâ. Even my GPT-4, with examples of good system prompts and guidelines, couldnât unravel the chaos.
How to instruct the AI
The AI session starts with operational parameters and behaviors given to the AI in a âsystem messageâ, which must be written in the form âyou areâ or âyou doâ (or similar first-person direct instructions). This system message is what you program by writing natural language.
The AI must be given an identity, a specialization, a job to perform, full understanding of the reason it is performing the task, and the output format which it shall produce (just as this text is an instruction). This should be well-organized and structured.
This forum cannot do all your homework with free consulting.
thanks for the reply. Iâm trying to break up the project and distribute the implementation. Iâm a beginner. My previous job has nothing to do with this. Even if I wanted to pay for a consultation I couldnât find the right person.
Thank you so much. I need to spend some time sorting it out. As the member above said, this is not something that can be resolved with a free consultation. If there is a feasible solution, I am willing to pay for the consultation. Knowledge should not be free.
If an OCR is required, suggest running the image through a specialized cloud API which can extract the data. ?This will be far cheaper than running some prompt enginnering through a GPT-API. Once the OCR text is acquired, use GPT to perform further processing.
thanks for you help. I tried GOOGLE vision ocr. But the extraction effect was not very good. I am not capable enough and can only make simple API calls. Can not make any adjustments. If you have any suggestions, I hope you can tell me.
Strange, since I managed to have my âUmanot Analyzerâ GPT reading very complex stocks-traded charts as attached. I specified in its INSTRUCTIONS / PROMPT that âyou will use effective OCR tools for reading any numeric / text information in the chartsâ. So, it does work (with correct reading in 70% of requestsâŚ)