I’ve noticed, and am getting increasingly frustrated by, ChatGPT failing to perform mundane automated tasks consistently. It’s like telling it to paint 1,000 cans red: the first 10 cans come out red, and then for some reason it decides to paint the 11th yellow. I am encountering this problem with, e.g., OCR (a large number of similar files) and also with coding tasks (o3-mini works a little better for these).
I am trying to optimise the prompts I write, and also to get ChatGPT to optimise the prompt itself, to try to “force” it to avoid making mistakes. Since it painted the first 10 cans red, it clearly can do the task. I am at a point where I am not sure the time I invest is worth it, and the monthly subscription fee feels a bit hefty for beta-testing, to be honest.
Is there some background reading on how to achieve consistent results for tasks such as OCR and coding through better prompting?
There are at least two general methods if you’re using the OpenAI environment online:
1. You could create a GPT to do it, or
2. You could leverage the API.
Let’s stick with 1. for the moment, because 2. would definitely need more elaboration.
Within a GPT, you have the following approaches:
a) Use a prompt that asks for a picture and extracts the data.
b) Explicitly tell it in the prompt to use its Python environment.
c) You could leverage any external API (if you’re in a custom GPT, for example).
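For approach (c), here is a minimal sketch of what one-image-per-request OCR over the OpenAI API could look like. The model name, prompt wording, and temperature are my assumptions, not settings anyone in this thread prescribed:

```python
import base64
import pathlib


def to_data_url(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL the chat API accepts."""
    return f"data:{mime};base64,{base64.b64encode(raw).decode()}"


def ocr_image(path: str, model: str = "gpt-4o-mini") -> str:
    """One image per request: small, fresh jobs tend to drift less
    than one long conversation holding all 1,000 cans."""
    from openai import OpenAI  # deferred so the helper above stays importable

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    url = to_data_url(pathlib.Path(path).read_bytes())
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variance
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this image, verbatim."},
                {"type": "image_url", "image_url": {"url": url}},
            ],
        }],
    )
    return resp.choices[0].message.content
```

Looping this over your files, one image per call, is the point: you reset the context every time instead of asking the model to stay consistent across one giant batch.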
I assume it’s extracting from pictures, or should the text be extracted from video?
Because then you could use EmguCV or OpenCV, or, if you’re familiar with C# or another .NET language, you could additionally use AForge.NET.
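If the source is video, a rough sketch of sampling frames with OpenCV’s Python bindings before OCR-ing them (the every-30th-frame interval is an arbitrary assumption):

```python
def sample_indices(total_frames: int, every_n: int) -> list[int]:
    """Indices of the frames to keep: every n-th frame."""
    return list(range(0, total_frames, every_n))


def extract_frames(video_path: str, out_dir: str, every_n: int = 30) -> int:
    """Dump every n-th frame as a PNG for downstream OCR; returns the count."""
    import pathlib
    import cv2  # deferred: requires opencv-python

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_indices(total, every_n))
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in wanted:
            cv2.imwrite(str(out / f"frame_{idx:06d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```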
And if you want to drive this even further, you could:
- use Google Vision
- use EasyOCR
- train your own custom OCR model
- search for other OCR models

and then implement whatever else you need yourself.
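Of those, EasyOCR has the smallest surface area to get started. A sketch, assuming English-only input and a confidence cutoff I picked arbitrarily:

```python
def join_fragments(results: list[tuple[list, str, float]],
                   min_conf: float = 0.4) -> str:
    """Join (bbox, text, confidence) tuples - EasyOCR's readtext output
    shape - into one string, dropping low-confidence fragments."""
    return " ".join(text for _bbox, text, conf in results if conf >= min_conf)


def ocr_file(path: str) -> str:
    import easyocr  # deferred: pip install easyocr

    reader = easyocr.Reader(["en"])  # downloads detection/recognition models on first run
    return join_fragments(reader.readtext(path))
```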
GOT (General OCR Theory) is trending…
AWS Textract… AnalyzeExpense…
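A sketch of the Textract route with boto3 (the region name is a placeholder, and the flattening below assumes you only care about summary fields such as TOTAL):

```python
def summary_fields(doc: dict) -> dict[str, str]:
    """Flatten AnalyzeExpense SummaryFields into {type: value} pairs."""
    out = {}
    for page in doc.get("ExpenseDocuments", []):
        for field in page.get("SummaryFields", []):
            ftype = field.get("Type", {}).get("Text", "")
            value = field.get("ValueDetection", {}).get("Text", "")
            if ftype:
                out[ftype] = value
    return out


def analyze_receipt(path: str) -> dict[str, str]:
    import boto3  # deferred: needs AWS credentials configured

    client = boto3.client("textract", region_name="us-east-1")  # placeholder region
    with open(path, "rb") as f:
        resp = client.analyze_expense(Document={"Bytes": f.read()})
    return summary_fields(resp)
```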
There is a lot more. I’ve tried them all… trained my own models (with a huge bunch of hybrid edge-detection preprocessing and whatnot in OpenCV)… made synthetic data… nah, it is not as easy as you think.
You can come close with some techniques… really close… but I’ll keep that to myself.
I’ve also looked at a Reddit thread on the subject, which apparently I am not allowed to link.
It now makes sense why my first experiences with OCR were very positive and I could not explain why I suddenly got bad results. OCR is another ca$h cow, I guess. OpenAI is the same as social media: the business model is time spent on the platform. Lure ’em in, and trap ’em.
I would venture some more into building dedicated tools based on open-source solutions. Training a model for OCR sounds interesting. Any training/education resources for this, by any chance? Thanks!!!
@Sjaak@hugebelts I saw an article recently claiming that Gemini Flash 2.0 works superbly well for OCR tasks.
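If you want to test that claim yourself, a minimal sketch with the google-generativeai client (the prompt is my wording, and the model id matches the “Flash 2.0” naming in the article, but double-check the currently available one):

```python
import mimetypes


def guess_mime(path: str) -> str:
    """Best-effort MIME type from the file extension, defaulting to PNG."""
    return mimetypes.guess_type(path)[0] or "image/png"


def gemini_ocr(path: str, api_key: str) -> str:
    import google.generativeai as genai  # deferred: pip install google-generativeai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-2.0-flash")
    with open(path, "rb") as f:
        blob = {"mime_type": guess_mime(path), "data": f.read()}
    resp = model.generate_content(
        [blob, "Extract all text from this image, verbatim."]
    )
    return resp.text
```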
OpenAI models will undoubtedly get much better at this. I remember my early GPT-4 Turbo Vision experience compared to now, and today it’s just so much better.