Best OpenAI plan for document analysis with OCR and Power Automate?

Hi everyone,

I want to integrate ChatGPT into my application to analyze documents, and I need advice on which model to choose.

Context

  • I process scanned documents (OCR required).

  • I submit up to 2 batches of 30 documents per month, processed one by one

  • For each document, I need to:

  1. Extract its content via OCR
  2. Generate a summary
  3. Perform analyses and make recommendations based on the content
  • I use Microsoft Power Automate for automation.

Proposed Workflow
I plan to use a series of prompts to structure the processing:

  • Submission Prompt – Extract OCR content + generate a summary + identify the business sector.

  • Information Prompt – Search for relevant information online based on the business sector.

  • History Prompt – Retrieve previous analyses related to the business sector.
    Role Assignment Prompt – Assign the document to a specific team member based on their expertise.

  • Result Formatting Prompt – Structure the final output in JSON format.

Question
Which OpenAI plan would be best suited for my needs, considering my budget of $1,000 per year?
Among the available options (OpenAI o1, GPT-4o Mini, GPT-4o, etc.), I’m looking for the best balance between cost, performance, and the ability to handle OCR-extracted text and complex queries.

If anyone has experience with similar use cases, I’d love to hear your thoughts and suggestions!

Thanks :blush:

The best way is to tackle this at the source: a scanning solution that does the OCR and includes searchable text in the resulting PDF. My Fujitsu scanners can do this at 20 pages a minute with computer scanning software.

Then you can simply use code to extract that text, and send it off to AI.

ChatGPT is the web chatbot. You only get what OpenAI offers for consumers to chat with. It cannot be automated.

If you are thinking about API models, where you pay for the usage per call, there’s a solution (with file size limitations and practicalities) just introduced today. A new feature of attaching a PDF file to user message content, and then both the text extraction is performed for AI and also images of pages are given to the AI for vision, so it should have a decent understanding just from an image-based PDF.

https://platform.openai.com/docs/guides/pdf-files?api-mode=responses

1 Like

The mini models would be best suited for your needs, via API.

Especially o1 mini / o3 mini.

You want to avoid any models that are designed for “conversation” and have an extended capacity to reply in a conversational way.

Your proposed workflow seems good but it’s a little unclear if you are:

  • Extracting the OCR content before sending it to the LLM or wanting to rely on an LLM model that can do this? You’ll likely be much better off for a high-load automated workflow that does the OCR extraction first.

Then you send the extracted text to the LLM (along with your instructions) to generate the summary and identify “keys” like “business sector”.

So now you’ve got your first response.

  • Then you have to tie in to do web-searching. This can be done through API thanks to the new most recent releases or you can build your own/leverage existing web crawlers and hook them into your program.

This is where you would potentially want to switch to a different model like o1 or gpt4o, maybe. But if your query is still relatively data-based and you aren’t looking for heavy-duty assessment but rather analysis in a “data driven way”, then you could continue to stick with the “reasoning mini models” (o1/o3)

  • “history prompt” I don’t understand this aspect
  • “role assignment prompt” I don’t understand this aspect either - that’s an extremely high level of integration with a background program which is receiving responses from the LLM over API that would have to be linked to your other systems via SaaS (like paragon or whatever). Automating that is serious programming or at least systems analysis and understanding at a developer level of what would be required.

So the real question is - are you programming/developing/actually building a system? Or are you wanting to make use of an existing web-chat interface? If so, you could buy the “plus” plan and get some of it done via the various models available on the ChatGPT page. The “pro” plan on the ChatGPT page would break your bank (if I’m not mistaken it’s $200 a month?).

If you develop your own app or have someone do it for you, you could do most of what your trying to do except for the “role assignment prompt” (I’d recommend best handling that at the human level, or at least simply "providing the LLM with documentation about your team and it’s skill set, and then “asking it what team member to assign it to” but not expecting to develop some high level of cross-application integrations where something actually happens as a result of that (i.e. the human still has to send the email, make the phone call, actually “assign it” in whatever content management platform you are using, etc.).

If it was me, I’d recommend building your own app, and running o1-mini or o3-mini most of the time.

The cost there is roughly $1 per million tokens. Which is really incredible when you think about it, for $1000/year you end up being able to process 90% input and 10% output (from the API calls to the models) something like 800 million tokens a year, which is like 200 million words, or the rough equivalent of 2,000 medium length novels.

1 Like

Hi, and thanks for this detailed response!

I really appreciate the time you took to explain the different options.
I’m a programmer, and I’m the one looking to set up this system. I’m developing the application for my organization. As I mentioned, I don’t have experience with LLMs. Before going further, I’d like to gather feedback from experienced individuals to help me choose the model best suited to my constraints:

Annual Budget: $1000/year
Needs:

  • OCR / Data Extraction
  • PDF and Excel file processing
  • Content analysis and interpretation

Processing Methodology
I plan to start by extracting content from documents via OCR. Ideally, I would like to be able to attach a PDF file directly to my conversation, but I understand that this might increase costs. If there is a solution within my budget, I’d be open to it.

My processing follows a structured, multi-step process: each action depends on the result of the previous one. The results are usually in the form of tables with 5 columns and 5 to 10 rows. If there’s a more efficient way to submit these tables in my prompt, I’d also be interested.

Specific Processing Steps

  • Prompt History: This step involves injecting a table containing the list of already processed cases, categorized by sector.
  • Prompt Role Assignment: At this step, I integrate a table listing people with their positions and skills by sector. The goal is for ChatGPT to automatically suggest the most suitable person based on the relevant sector.

So, I’m looking for a powerful model capable of handling this logic while staying within my budget constraints.

Thanks again for your help!