The mini models, used via the API, would be best suited for your needs. Especially o1-mini / o3-mini.
You want to avoid models that are designed for “conversation” and tend to reply in an extended, conversational way.
Your proposed workflow seems good, but it’s a little unclear whether you are:
- Extracting the OCR content before sending it to the LLM, or relying on an LLM model that can do the OCR itself? For a high-load automated workflow, you’ll likely be much better off doing the OCR extraction first.
Then you send the extracted text to the LLM (along with your instructions) to generate the summary and identify “keys” like “business sector”.
So now you’ve got your first response.
- Then you have to tie in web searching. This can be done through the API thanks to the most recent releases, or you can build your own web crawler (or leverage an existing one) and hook it into your program.
This is where you might want to switch to a different model like o1 or GPT-4o. But if your query is still relatively data-based and you aren’t looking for heavy-duty assessment so much as analysis in a “data-driven” way, you can stick with the mini reasoning models (o1-mini/o3-mini).
- “History prompt”: I don’t understand this aspect.
- “Role assignment prompt”: I don’t understand this aspect either. It implies an extremely high level of integration: a background program receiving responses from the LLM over the API, linked to your other systems via an integration platform (Paragon or similar). Automating that is serious programming, or at the very least systems analysis and a developer-level understanding of what would be required.
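The OCR-first flow above can be sketched in a few lines. This is only a sketch under my own assumptions: I’ve picked `pytesseract` for the OCR step and the `openai` Python SDK with `o3-mini` for the model call; swap in whatever tools and model you actually use, and the JSON key names here are hypothetical.

```python
# Sketch of an OCR-first pipeline: extract text yourself, then send the
# extracted text (plus instructions) to a mini model for the summary and
# "keys" like business sector.
import json


def build_summary_prompt(extracted_text: str) -> list:
    """Build the chat messages: your instructions plus the OCR'd text."""
    system = (
        "You summarize business documents. Return JSON with keys: "
        "'summary' and 'business_sector'."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": extracted_text},
    ]


def extract_text(image_path: str) -> str:
    """OCR step, done *before* the LLM call (assumes Tesseract is installed)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))


def summarize(extracted_text: str) -> dict:
    """Send the extracted text to a mini model and parse the JSON reply."""
    from openai import OpenAI  # assumption: openai SDK 1.x
    client = OpenAI()
    resp = client.chat.completions.create(
        model="o3-mini",
        messages=build_summary_prompt(extracted_text),
    )
    return json.loads(resp.choices[0].message.content)
```

The point of the separation is that the OCR step is cheap and deterministic, so the tokens you pay for are the clean extracted text, not an image-capable model doing both jobs.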
So the real question is: are you actually programming/developing/building a system, or do you want to use an existing web-chat interface? If the latter, you could buy the “Plus” plan and get some of this done via the various models available on the ChatGPT page. The “Pro” plan would break the bank (if I’m not mistaken, it’s $200 a month).
If you develop your own app, or have someone do it for you, you could do most of what you’re trying to do except for the “role assignment prompt”. I’d recommend handling that at the human level, or at most providing the LLM with documentation about your team and its skill set and then asking it which team member to assign the item to. Don’t expect to build some high level of cross-application integration where something actually happens as a result: the human still has to send the email, make the phone call, and actually “assign it” in whatever content management platform you’re using.
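That human-in-the-loop version of “role assignment” is just prompt construction. A minimal sketch (the team data, field names, and helper name are all hypothetical):

```python
# Build a prompt that gives the model your team documentation and asks
# which member should take the item. A person still does the actual
# assignment; the model only suggests a name.
def build_assignment_prompt(team_docs: dict, document_summary: str) -> str:
    lines = ["Team members and their skill sets:"]
    for name, skills in team_docs.items():
        lines.append(f"- {name}: {', '.join(skills)}")
    lines.append("")
    lines.append(
        "Given the document summary below, reply with only the name "
        "of the best-suited team member."
    )
    lines.append(f"Summary: {document_summary}")
    return "\n".join(lines)


prompt = build_assignment_prompt(
    {"Alice": ["finance", "tax"], "Bob": ["logistics"]},
    "Quarterly tax filing for a retail client.",
)
```

You’d send `prompt` as a user message in the same API call pattern as the summarization step, then surface the suggested name to a human.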
If it were me, I’d build my own app and run o1-mini or o3-mini most of the time.
The cost there works out to roughly $1–1.50 per million tokens, blended across a 90% input / 10% output mix. Which is really incredible when you think about it: for $1,000/year you end up being able to push something like 700–800 million tokens through the API calls to the models. At roughly 0.75 English words per token, that’s on the order of 500–600 million words, or the rough equivalent of several thousand medium-length novels.
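The yearly-throughput math can be sanity-checked in a few lines. The per-million-token prices below are assumptions based on o3-mini’s published rates at the time of writing; check the current pricing page before relying on them.

```python
# Back-of-the-envelope yearly throughput for a $1,000 API budget,
# blending 90% input tokens with 10% output tokens.
input_price = 1.10   # $ per million input tokens (assumed o3-mini rate)
output_price = 4.40  # $ per million output tokens (assumed o3-mini rate)

blended = 0.9 * input_price + 0.1 * output_price  # $ per million tokens

budget = 1000.0  # $ per year
tokens_per_year = budget / blended * 1_000_000

words = tokens_per_year * 0.75  # rough rule: ~0.75 English words per token
novels = words / 100_000        # assuming ~100k words per medium novel
```

Run the numbers and you land around 700 million tokens a year, i.e. hundreds of millions of words, which is the “thousands of novels” figure above.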