Today, the accuracy and quality of the GPT-4 vision model’s responses have significantly decreased, to the point where it incorrectly answers about 50 percent of the questions. As a result of this issue, we have lost 15 percent of our users today and have incurred a considerable financial loss. I don’t know why the model’s capability has suddenly dropped so drastically or why its problem-solving ability has declined so badly.
Has anyone else had such an experience today? And what is the solution to this issue?
Do you have any examples of failures? I’m aware that OpenAI has intentionally disabled text extraction from images for most business use cases, claiming that it is for privacy protection reasons, but it would be interesting to know what else they’ve decided to disable.
Thank you. I’ve created a tool to help students with their questions in learning English. Users send pictures of their questions to the bot, and the bot responds and explains to them. For this, I am using the gpt-4-vision-preview model.
Previously, the rate of incorrect responses by the bot was about 10 to 20 percent, but since yesterday, this rate has reached 50 percent. This has caused many users to encounter problems in learning the language with this bot.
Are you just using the vision model for OCR? If so, there are about a million better and cheaper solutions for that.
I would suggest you look into some of them.
Tesseract OCR: An open-source OCR engine available under the Apache 2.0 license, Tesseract is capable of recognizing over 100 languages. Its documentation and source code can be found on GitHub, and it’s maintained by the tesseract-ocr community. You can find more details, including how to use it, installation instructions, and support for various languages, on the Tesseract User Manual and GitHub repository.
docTR by Mindee: Explore docTR on Mindee’s official website, where they provide details on their OCR solutions and APIs designed for document processing.
Amazon Textract: Amazon provides comprehensive documentation and API reference for Textract on their official AWS Textract page, where you can learn about its features, pricing, and how to start using the service.
Azure Document Intelligence: For information on Azure’s document processing capabilities, including OCR and form recognizer services, visit the Azure AI Documentation.
Google Cloud Vision: Google offers detailed guides, client libraries, and API references for Cloud Vision on the Google Cloud Vision API page, which covers its OCR capabilities among other image analysis features.
And from ChatGPT, a list of some more open-source OCR solutions:
OCRopus: A suite of OCR-related tools whose development was originally sponsored by Google; early versions used Tesseract as the recognition engine. Offers advanced functionalities for layout analysis and text recognition. It’s customizable but has a steep learning curve. Discover OCRopus.
GOCR: An OCR engine that stands out for its simplicity and ease of use, supporting several languages and operating platforms. Its accuracy may be lower compared to more advanced engines. Explore GOCR.
CuneiForm: Known for its precision and support for multiple languages, CuneiForm provides flexibility in input sources and output formats, though its user interface may not be as intuitive. Find out more about CuneiForm.
EasyOCR: A Python package designed for OCR tasks that can use a CUDA-capable GPU to speed up text detection and recognition, and that also runs on CPU. It’s user-friendly and versatile in text handling. Check out EasyOCR.
Ocrad: Focuses on simplicity and speed, suitable for basic OCR tasks with an emphasis on recognizing printed text. It may not offer advanced features such as layout analysis. Learn about Ocrad.
GImageReader: Provides a user-friendly interface and supports multiple languages, suitable for basic OCR tasks, but its accuracy and performance may vary. Explore GImageReader.
Kraken: Developed to address the limitations of OCRopus, from which it was forked. It originally relied on the CLSTM neural network library and supports training new recognition models from existing data. Discover Kraken.
A9T9: A simple and free OCR application for Windows, built on Microsoft’s OCR library rather than published by Microsoft itself. It is noted for its ease of use and customizability, and is offered as spyware-free. Check out A9T9.
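If the bot's main job is to read the question out of the picture, an OCR-first pipeline could be sketched roughly like this in Python. This is only an illustration under some assumptions: it uses Tesseract through the pytesseract and Pillow packages (both would need to be installed, along with the Tesseract binary), the function names and the prompt wording are hypothetical, and the call to whatever text model answers the question is left out on purpose.

```python
from typing import Callable


def ocr_with_tesseract(image_path: str, lang: str = "eng") -> str:
    """Extract raw text from an image using Tesseract via pytesseract."""
    # Imported lazily so the rest of this sketch works without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def normalize_ocr_text(raw: str) -> str:
    """Collapse repeated spaces and drop blank lines from OCR output."""
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def build_prompt(question_text: str) -> str:
    """Wrap the extracted question in a prompt (wording is an assumption)."""
    return (
        "You are helping a student who is learning English.\n"
        "Answer the question below and explain the reasoning.\n\n"
        f"Question:\n{question_text}"
    )


def image_to_prompt(
    image_path: str,
    ocr: Callable[[str], str] = ocr_with_tesseract,
) -> str:
    """Run OCR on the image, clean the text, and build the prompt."""
    return build_prompt(normalize_ocr_text(ocr(image_path)))
```

The OCR step is passed in as a function so that any of the engines listed above could be swapped in, and so the rest of the pipeline can be checked without an OCR engine installed. Whether this beats a vision model will depend on the images: OCR alone loses diagrams, layout, and handwriting that is hard to read.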
Ok, that’s interesting. If the type and complexity of the average image have not changed, but the quality of the output has gone down, that might mean OpenAI has changed something. It’s all closed source, so we have no way of knowing what they’re really doing under the covers.
EDIT: Although I do remember that a couple of months ago people were claiming ChatGPT was “getting lazy” (from which it supposedly eventually “recovered”). I never understood how that could happen unless some tweaking was being done internally that was causing it. AFAIK a model is a “static” thing and it’s impossible for it to “become lazy”. Maybe someone can explain exactly what happened there, or if it’s still a mystery.
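One way to tell whether the output quality really dropped while the inputs stayed the same is to re-run a fixed, labeled set of images every day and compare the error rate against an earlier baseline. Below is a minimal Python sketch of that idea. Everything in it is an assumption for illustration: the function names are made up, `answer_fn` stands in for whatever calls the model, and exact string matching is a crude stand-in for real grading.

```python
from typing import Callable, Sequence, Tuple


def error_rate(
    cases: Sequence[Tuple[str, str]],
    answer_fn: Callable[[str], str],
) -> float:
    """Fraction of (image_path, expected_answer) cases answered incorrectly."""
    if not cases:
        raise ValueError("need at least one labeled case")
    wrong = sum(
        1
        for image_path, expected in cases
        # Case-insensitive exact match; a real grader would be more lenient.
        if answer_fn(image_path).strip().lower() != expected.strip().lower()
    )
    return wrong / len(cases)


def has_regressed(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """True if the current error rate exceeds the baseline by more than tolerance."""
    return current > baseline + tolerance
```

With a baseline error rate of 0.2 and a measured rate of 0.5 on the same images, `has_regressed(0.5, 0.2)` returns `True`, which would at least confirm that the change is on the model side rather than in the questions users are sending.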