Automating mundane tasks

I’ve noticed, and am getting increasingly frustrated by, ChatGPT not performing automated mundane tasks consistently. It’s like telling it to paint 1000 cans red: the first 10 cans come out red, and then, for some reason, it decides to paint the 11th one yellow. I’m running into this problem with OCR, for example (a large number of similar files), and also with coding tasks (o3-mini works a little better for these).

I’m trying to optimise the prompts I write, and also to get ChatGPT to optimise the prompt itself, to try to “force” it to avoid making mistakes. Since it painted the first 10 cans red, it can clearly do the task. I’m at the point where I’m not sure the time I invest is worth it, and to be honest, the monthly subscription fee feels a bit hefty for what amounts to beta-testing.

Is there any background reading on how to achieve consistent results for tasks such as OCR and coding through better prompting?

Unfortunately, GPT is not really ideal for data extraction.

This is not a bug; it is a consequence of how GPT fundamentally works.

If you rely on GPT alone for data extraction, you are wasting your time.
You need to combine it with other methods.
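
One example of such an "other method", sketched under assumptions: instead of trusting the fields a model extracts from an invoice, check them with plain code. The record shape (`invoice_date`, `line_items`, `total`) and the function name `check_invoice` are hypothetical; the point is that arithmetic and format checks catch a "yellow can" the same way every time.

```python
import re
from decimal import Decimal

# Assumes the model was asked to return ISO 8601 dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_invoice(record: dict) -> list[str]:
    """Return the problems found in one extracted invoice; [] means every check passed.

    `record` is a hypothetical shape, e.g. parsed from the model's JSON answer:
    {"invoice_date": "...", "line_items": [{"amount": "..."}], "total": "..."}
    """
    problems = [f"missing field: {f}" for f in ("invoice_date", "line_items", "total")
                if f not in record]
    if problems:
        return problems

    if not DATE_PATTERN.match(record["invoice_date"]):
        problems.append(f"bad date format: {record['invoice_date']!r}")

    try:
        # Decimal avoids float rounding noise when comparing money amounts.
        items_sum = sum((Decimal(item["amount"]) for item in record["line_items"]), Decimal("0"))
        total = Decimal(record["total"])
    except (KeyError, ArithmeticError) as exc:  # decimal.InvalidOperation is an ArithmeticError
        problems.append(f"unparseable amount: {exc!r}")
        return problems

    if items_sum != total:
        problems.append(f"line items sum to {items_sum}, but total is {total}")
    return problems


good = {"invoice_date": "2024-03-01",
        "line_items": [{"amount": "10.50"}, {"amount": "4.50"}],
        "total": "15.00"}
print(check_invoice(good))                       # []
print(check_invoice(dict(good, total="16.00")))  # ['line items sum to 15.00, but total is 16.00']
```

Records that fail a check can be re-run or sent to a human instead of silently ending up in the results.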

If you find that solution, you’re gonna be rich.

Have you guys ever tried metaprompting followed by sanity testing for consistency?
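
A minimal sketch of the "sanity testing for consistency" half, assuming you already have a way to run the same (meta-)prompt several times on one document and parse each answer into a dict. The function name `consensus` and the 60% agreement threshold are arbitrary choices for illustration, not an established method from this thread.

```python
from collections import Counter


def consensus(runs: list[dict], min_agreement: float = 0.6) -> tuple[dict, dict]:
    """Merge several extraction runs over the SAME document, field by field.

    Returns (accepted, disputed): `accepted` maps field -> majority value;
    `disputed` maps field -> Counter of every value seen, for fields where no
    value reached `min_agreement` (a run that omits a field counts as a None vote).
    """
    accepted, disputed = {}, {}
    fields = sorted({key for run in runs for key in run})
    for field in fields:
        votes = Counter(run.get(field) for run in runs)
        value, count = votes.most_common(1)[0]
        if value is not None and count / len(runs) >= min_agreement:
            accepted[field] = value
        else:
            disputed[field] = votes
    return accepted, disputed


# Three runs of the same prompt on one invoice; run 3 "painted the can yellow".
runs = [{"total": "15.00", "vendor": "ACME"},
        {"total": "15.00", "vendor": "ACME"},
        {"total": "51.00", "vendor": "ACME"}]
print(consensus(runs))  # ({'total': '15.00', 'vendor': 'ACME'}, {})
```

Voting only reduces random slips; if the model makes the same mistake in every run, the vote will happily accept it, so this complements format and arithmetic checks rather than replacing them.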

Try that with an invoice from AWS :rofl::joy::sweat_smile:

I love solving problems that people say can’t be solved. I did the same with the limits of custom GPTs, in the thread where you posted. :slight_smile:

And also for the gallery: creating fairly photorealistic images, still using DALL-E 3.

And yes, granted: it’s not always as easy as we’d hope, is it? Sometimes it’s frustrating until it works.

Go for it. You’d be a hero.

Can you elaborate a bit more on what you’re trying to do, step by step?

Like:

step 1: this and that…
step 2: etc.
step 3: …

Expected outcome: <describe what you’d expect more precisely>

Unwanted outcome: <describe what’s happening so far, BUT that you don’t want>

Then it may be easier to find a solution.


Maybe you’ll find the solution on your own along the way and can explain it; if not, we can still help you further. :slight_smile:

Maybe you can try a little first. People have been making these assumptions for decades. You should read up on them. You’ll find your proposal there as well…

Data extraction from envelopes for post offices…

And template-based solutions were among the most accurate a couple of years ago…
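
To make "template-based" concrete, a minimal sketch of zonal extraction, assuming an OCR engine that returns each word with a bounding box and a form whose layout never changes. The `Word` class, the `TEMPLATE` coordinates and the function name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, in page pixels
    y: float  # top edge
    w: float  # width
    h: float  # height


# Hypothetical template for one fixed form layout: field -> (left, top, right, bottom).
TEMPLATE = {
    "invoice_number": (400, 40, 600, 70),
    "total": (400, 700, 600, 740),
}


def extract_with_template(words: list[Word], template: dict) -> dict:
    """Collect, for each field, the OCR words whose centre point lies inside its box."""
    result = {}
    for field, (left, top, right, bottom) in template.items():
        inside = [
            wd for wd in words
            if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom
        ]
        # Rough reading order: top to bottom, then left to right.
        inside.sort(key=lambda wd: (wd.y, wd.x))
        result[field] = " ".join(wd.text for wd in inside)
    return result


words = [Word("INV-", 410, 45, 40, 20), Word("0042", 455, 45, 50, 20),
         Word("Total:", 300, 705, 60, 20), Word("15.00", 450, 705, 60, 20)]
print(extract_with_template(words, TEMPLATE))  # {'invoice_number': 'INV- 0042', 'total': '15.00'}
```

Because the rules are fixed, every document with that layout is treated identically; the trade-off is that any change in layout breaks the template.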

There are at least two general approaches if you’re working in OpenAI’s online environment:

  1. Create a custom GPT for the task, or
  2. Use the API.

Let’s stick with option 1 for the moment, because option 2 would definitely need more elaboration.

So the following approaches are possible:

a) Use a prompt that asks you for a picture and extracts the data.
b) Explicitly tell it in the prompt to use its Python environment.
c) Call an external API (if we’re in a custom GPT, for example).
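
For option b), one way to get consistency (sketched here under assumptions) is to have the model write a single script and then run that same script over every file, instead of asking it to "read" each file in turn: a fixed function cannot decide to paint can 11 yellow. `extract` stands for whatever OCR or parsing call you plug in (hypothetical here), and the CSV output is just one possible choice.

```python
import csv
import io
from pathlib import Path
from typing import Callable


def process_batch(paths: list[Path], extract: Callable[[Path], dict],
                  required: tuple[str, ...]) -> tuple[str, list[str]]:
    """Run ONE fixed extraction function over every file and collect the rows as CSV.

    Files whose result lacks a required field are reported in `failed`
    instead of being written with silent gaps.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", *required])
    writer.writeheader()
    failed = []
    for path in sorted(paths):  # fixed order so reruns are directly comparable
        row = extract(path)
        if any(not row.get(field) for field in required):
            failed.append(path.name)
            continue
        writer.writerow({"file": path.name, **{field: row[field] for field in required}})
    return buffer.getvalue(), failed


# Hypothetical stand-in for a real OCR step, so the sketch runs on its own.
def fake_extract(path: Path) -> dict:
    return {} if path.stem == "blurry" else {"total": "15.00"}


csv_text, failed = process_batch([Path("b.png"), Path("a.png"), Path("blurry.png")],
                                 fake_extract, ("total",))
print(csv_text)
print(failed)  # ['blurry.png']
```

The model is then only involved once, when writing `extract`, and the failures list tells you exactly which files need a closer look.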

I assume you’re extracting from pictures, or should the text be extracted from video?

Because then you could use OpenCV or EmguCV (its .NET wrapper), and if you’re familiar with C# or another .NET language, you could additionally use AForge.NET.
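
For video, a common pattern is to sample frames (for example with OpenCV’s `cv2.VideoCapture`), OCR each sampled frame, and then merge consecutive frames that show the same text. A minimal sketch of that last step only, assuming you already have `(timestamp, text)` pairs; frame grabbing and the OCR call are left out, and `collapse_repeats` is a hypothetical name.

```python
def collapse_repeats(frames: list[tuple[float, str]]) -> list[tuple[float, float, str]]:
    """Merge consecutive sampled frames whose OCR text is identical.

    `frames` holds (timestamp_seconds, ocr_text) pairs in time order. Returns
    (start, end, text) segments; frames with no text end the current segment.
    """
    segments = []
    previous = ""
    for t, raw in frames:
        text = " ".join(raw.split())  # normalise whitespace so spacing noise doesn't split segments
        if text and text == previous:
            start, _, _ = segments[-1]
            segments[-1] = (start, t, text)  # same text as the last frame: extend the segment
        elif text:
            segments.append((t, t, text))  # new text: open a new segment
        previous = text
    return segments


frames = [(0.0, "Hello"), (0.5, "Hello "), (1.0, ""), (1.5, "Hello"), (2.0, "World")]
print(collapse_repeats(frames))
# [(0.0, 0.5, 'Hello'), (1.5, 1.5, 'Hello'), (2.0, 2.0, 'World')]
```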


And if you want to take this even further, you could:

  1. use Google Cloud Vision
  2. use EasyOCR
  3. train your own custom OCR model
  4. look for other OCR models

and then implement the rest of what you need yourself.
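
As an example of "implementing the rest yourself", a sketch of post-processing that assumes EasyOCR’s default `readtext` output: a list of `(bbox, text, confidence)` tuples, where `bbox` holds four corner points. The confidence cut-off and the line-grouping tolerance are arbitrary choices; the function works on plain lists, so it can be tried without EasyOCR installed.

```python
def to_lines(results, min_confidence=0.5, line_tolerance=10):
    """Group EasyOCR-style (bbox, text, confidence) results into text lines.

    Detections below `min_confidence` are dropped; boxes whose top edge lies within
    `line_tolerance` pixels of the current line's first box join that line.
    """
    kept = []
    for bbox, text, score in results:
        if score < min_confidence:
            continue
        xs = [p[0] for p in bbox]
        ys = [p[1] for p in bbox]
        kept.append((min(ys), min(xs), text))
    kept.sort()  # top to bottom, then left to right

    lines, current, line_top = [], [], None
    for top, left, text in kept:
        if line_top is not None and top - line_top > line_tolerance:
            lines.append(" ".join(t for _, t in sorted(current)))  # close the previous line
            current, line_top = [], None
        if line_top is None:
            line_top = top
        current.append((left, text))
    if current:
        lines.append(" ".join(t for _, t in sorted(current)))
    return lines


results = [
    ([[100, 10], [180, 10], [180, 30], [100, 30]], "World", 0.9),
    ([[10, 12], [90, 12], [90, 32], [10, 32]], "Hello", 0.95),
    ([[10, 50], [60, 50], [60, 70], [10, 70]], "Total", 0.8),
    ([[70, 52], [120, 52], [120, 72], [70, 72]], "15.00", 0.3),  # low confidence: dropped
]
print(to_lines(results))  # ['Hello World', 'Total']
```

With EasyOCR installed, the input would come from something like `easyocr.Reader(['en']).readtext('scan.png')`. Note the low-confidence "15.00" is dropped rather than guessed, which is usually what you want before any downstream checks.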

GOT (General OCR Theory) is trending…
AWS Textract… AnalyzeExpense…
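
For the AWS route, Textract’s AnalyzeExpense API returns normalised invoice and receipt fields. A minimal sketch, assuming boto3 and configured AWS credentials for the call itself; the parsing function only relies on the `ExpenseDocuments` → `SummaryFields` → `Type` / `ValueDetection` structure of the response, and the 90% confidence cut-off is an arbitrary choice.

```python
def summary_fields(response: dict, min_confidence: float = 90.0) -> dict:
    """Pull normalised summary fields (TOTAL, VENDOR_NAME, ...) out of an
    AnalyzeExpense response, keeping the most confident value per field type."""
    best = {}
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {})
            text, score = value.get("Text"), value.get("Confidence", 0.0)
            # Textract labels fields it could not normalise as OTHER; skipped here for simplicity.
            if not field_type or field_type == "OTHER" or not text or score < min_confidence:
                continue
            if field_type not in best or score > best[field_type][1]:
                best[field_type] = (text, score)
    return {key: text for key, (text, _) in best.items()}


def analyze_file(path: str) -> dict:
    """Call Textract on a local image or PDF (requires boto3 and AWS credentials)."""
    import boto3  # imported here so summary_fields can be used without boto3 installed
    client = boto3.client("textract")
    with open(path, "rb") as fh:
        response = client.analyze_expense(Document={"Bytes": fh.read()})
    return summary_fields(response)


sample = {"ExpenseDocuments": [{"SummaryFields": [
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "$15.00", "Confidence": 99.0}},
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "$1500", "Confidence": 95.0}},
    {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "ACME", "Confidence": 50.0}},
]}]}
print(summary_fields(sample))  # {'TOTAL': '$15.00'}
```

Even with a dedicated service, low-confidence fields (here the vendor name) still need a fallback or a human check.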

There is a lot more. I tried them all… trained my own models (with a huge amount of hybrid edge-detection preprocessing and whatnot in OpenCV)… made synthetic data… nah… it is not as easy as you think.

You can come close with some techniques… really close… but I’ll keep those to myself.

Thanks for this! Very helpful indeed.

I’ve also looked at a Reddit thread on the subject, which apparently I’m not allowed to link.

It now makes sense why my first experiences with OCR were very positive and then, for reasons I couldn’t explain, I suddenly started getting bad results. OCR is another ca$h cow :slight_smile: I guess OpenAI is like social media: the business model is getting you to spend time on the platform. Lure 'em in, and trap 'em :money_mouth_face:

I’d like to venture further into building dedicated tools based on open-source solutions. Training an OCR model sounds interesting. Any training or educational resources for this, by any chance? Thanks!!!

@Sjaak @hugebelts I recently saw an article claiming that Gemini 2.0 Flash works superbly for OCR tasks.

OpenAI models will undoubtedly get much better at this. I remember my early experience with GPT-4 Turbo with Vision, and today it’s just so much better.

Right now I don’t have examples I can share publicly, but for some of my use cases the older GPT-4 models work better than the latest GPT-4o.
