I’ve noticed, and am getting increasingly frustrated by, ChatGPT failing to perform mundane automated tasks consistently. It’s like telling it to paint 1,000 cans red: the first 10 cans come out red, and then for some reason it decides to paint the 11th yellow. I am encountering this problem with, e.g., OCR (a large number of similar files) and also with coding tasks (o3-mini works a little better for these).
I am trying to optimise the prompts I write, and also to get ChatGPT to optimise the prompt itself, to try to “force” it to avoid making mistakes. Since it painted the first 10 cans red, it clearly can do the task. I am at a point where I am not sure the time I invest is worth it, and the monthly subscription fee feels a bit hefty for beta-testing, to be honest.
Is there some background reading on how to achieve consistent results for tasks such as OCR and coding through better prompting?
There are at least two general methods if you’re using the OpenAI environment online:
1. You could create a GPT to do it, or
2. You could leverage the API.
Let’s stick with 1. for the moment, because 2. would definitely need more elaboration.
Within a GPT, you have the following approaches:
a) Use a prompt that asks for a picture and extracts the data.
b) Explicitly tell it in the prompt to use its Python environment.
c) You could leverage any external API (if you’re in a custom GPT, for example).
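For approach (c), here is a minimal sketch of what one-image-per-request OCR over the OpenAI API could look like. The model name, prompt wording, and temperature are my assumptions, not settings anyone in this thread prescribed:

```python
import base64
import pathlib


def to_data_url(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL the chat API accepts."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode()}"


def ocr_image(path: str, model: str = "gpt-4o-mini") -> str:
    """One image per request: small, fresh jobs tend to drift less
    than one long conversation holding all 1,000 cans."""
    from openai import OpenAI  # deferred so the helper above stays importable

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    url = to_data_url(pathlib.Path(path).read_bytes())
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variance
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this image, verbatim."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
    )
    return resp.choices[0].message.content
```

Looping this over your files, one image per call, is the point: you reset the context every time instead of asking the model to stay consistent across one giant batch.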
I assume it’s extracting from pictures, or should the text be extracted from video?
Because then you could use EmguCV or OpenCV, or, if you’re familiar with C# or another .NET language, you could additionally use AForge.NET.
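If the source is video, a rough sketch of sampling frames with OpenCV’s Python bindings before OCR-ing them (the every-30th-frame interval is an arbitrary assumption):

```python
def sample_indices(total_frames: int, every_n: int) -> list[int]:
    """Indices of the frames to keep: every n-th frame."""
    return list(range(0, total_frames, every_n))


def extract_frames(video_path: str, out_dir: str, every_n: int = 30) -> int:
    """Dump every n-th frame as a PNG for downstream OCR; returns the count."""
    import pathlib
    import cv2  # deferred: requires opencv-python

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_indices(total, every_n))
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            cv2.imwrite(str(out / f"frame_{idx:06d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```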
And if you want to drive this even further, you could:
- use Google Vision
- use EasyOCR
- train your own custom OCR model
- search for other OCR models

and then implement whatever else you need yourself.
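Of those, EasyOCR has the smallest surface area to get started. A sketch, assuming English-only input and a confidence cutoff I picked arbitrarily:

```python
def join_fragments(results: list[tuple[list, str, float]],
                   min_conf: float = 0.4) -> str:
    """Join (bbox, text, confidence) tuples - EasyOCR's readtext output
    shape - into one string, dropping low-confidence fragments."""
    return " ".join(text for _bbox, text, conf in results if conf >= min_conf)


def ocr_file(path: str) -> str:
    import easyocr  # deferred: pip install easyocr

    reader = easyocr.Reader(["en"])  # downloads detection/recognition models on first run
    return join_fragments(reader.readtext(path))
```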
GOT (General OCR Theory) is trending…
AWS Textract… AnalyzeExpense…
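A sketch of the Textract route with boto3 (the region name is a placeholder, and the flattening below assumes you only care about summary fields such as TOTAL):

```python
def summary_fields(doc: dict) -> dict[str, str]:
    """Flatten AnalyzeExpense SummaryFields into {type: value} pairs."""
    out = {}
    for page in doc.get("ExpenseDocuments", []):
        for field in page.get("SummaryFields", []):
            ftype = field.get("Type", {}).get("Text", "")
            value = field.get("ValueDetection", {}).get("Text", "")
            if ftype:
                out[ftype] = value
    return out


def analyze_receipt(path: str) -> dict[str, str]:
    import boto3  # deferred: needs AWS credentials configured

    client = boto3.client("textract", region_name="us-east-1")  # placeholder region
    with open(path, "rb") as f:
        resp = client.analyze_expense(Document={"Bytes": f.read()})
    return summary_fields(resp)
```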
There is a lot more. I’ve tried them all… trained my own models (with a huge bunch of hybrid edge-detection preprocessing and whatnot in OpenCV)… made synthetic data… nah, it is not as easy as you think.
You can come close with some techniques… really close… but I’ll keep that to myself.
I’ve also looked at a Reddit thread on the subject, which apparently I am not allowed to link.
It now makes sense why my first experiences with OCR were very positive and I could not explain why I suddenly got bad results. OCR is another ca$h cow, I guess. OpenAI is the same as social media: the business model is time spent on the platform. Lure ’em in, and trap ’em.
I would venture some more into building dedicated tools based on open-source solutions. Training a model for OCR sounds interesting. Any training/education resources for this, by any chance? Thanks!!!
@Sjaak@hugebelts I saw an article recently claiming that Gemini Flash 2.0 works superbly well for OCR tasks.
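If you want to test that claim yourself, a minimal sketch with the google-generativeai client (the prompt is my wording, and the model id matches the “Flash 2.0” naming in the article, but double-check the currently available one):

```python
import mimetypes


def guess_mime(path: str) -> str:
    """Best-effort MIME type from the file extension, defaulting to PNG."""
    return mimetypes.guess_type(path)[0] or "image/png"


def gemini_ocr(path: str, api_key: str) -> str:
    import google.generativeai as genai  # deferred: pip install google-generativeai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-2.0-flash")
    with open(path, "rb") as f:
        blob = {"mime_type": guess_mime(path), "data": f.read()}
    resp = model.generate_content(
        [blob, "Extract all text from this image, verbatim."]
    )
    return resp.text
```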
OpenAI models will undoubtedly get much better at this. I remember my early GPT-4 Turbo Vision experience compared to now, and today it’s just so much better.