Hi,
I want to know how batch processing usually occurs in actual applications and how it is implemented in the grand scheme of things. I get the point of batch processing, but I have a decent amount of confusion about how it is integrated into real-world applications and how it affects delays and such.
If anyone can share their experience with a full application that utilizes batch processing, that would really help me get the full picture.
Thank you
I am having some similar problems with batch processing of documents and data extraction. When I get a document, it needs to have a certain set of metadata.
The docs I have do not have any, and when I look for extraction tools, the best ones I find are 50-80% accurate. Any suggestions? How does OpenAI do it so well? Is it not known, or am I missing something?
Hi @programmerrdai !
To me it’s just another batch processing flow, like what we used to do with Spark or Airflow.
One architecture would be where you:
- Create an Airflow job that runs once per week
- An example of a job is parsing documents from a specific source (e.g. a GCS or S3 bucket), extracting specific information as structured output, and then writing that to BigQuery, Postgres, or some other data store, where there will be further processing downstream (maybe a separate Airflow pipeline); see the first sketch after this list
- In the job you call the Batch API and send the job, and then poll the `status` field. If it is `completed`, you close down the Airflow pipeline; if some error occurs, log the reason for the error and shut the pipeline down with a failed state (see the second sketch below)
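Here's a minimal sketch of the submit step, assuming the official OpenAI Python SDK. The model name, the metadata fields in the prompt, and the `documents` dict are placeholders; in a real pipeline this function body would live inside an Airflow task, and `documents` would come from your GCS/S3 listing:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_and_submit_batch(documents: dict[str, str]) -> str:
    """Write one /v1/chat/completions request per document to a JSONL
    file, upload it, and create a batch job. Returns the batch id."""
    with open("batch_input.jsonl", "w") as f:
        for doc_id, text in documents.items():
            request = {
                "custom_id": doc_id,  # lets you match results back to documents
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [
                        {"role": "system",
                         "content": "Extract title, author, and date as JSON."},
                        {"role": "user", "content": text},
                    ],
                    "response_format": {"type": "json_object"},
                },
            }
            f.write(json.dumps(request) + "\n")

    # Upload the JSONL and start the batch; results arrive within 24h.
    input_file = client.files.create(
        file=open("batch_input.jsonl", "rb"), purpose="batch"
    )
    batch = client.batches.create(
        input_file_id=input_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    return batch.id
```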
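And a sketch of the polling step. In Airflow you'd typically run this in a follow-up task (or use a sensor with a rescheduled poke interval instead of `time.sleep`); raising an exception is what marks the task, and hence the pipeline run, as failed:

```python
import json
import time
from openai import OpenAI

client = OpenAI()

def poll_batch(batch_id: str, interval_s: int = 60) -> None:
    """Poll the batch's status field until it reaches a terminal state,
    then fetch the results or fail with the reported error."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status == "completed":
            # Each output line is one response, keyed by custom_id.
            results = client.files.content(batch.output_file_id).text
            for line in results.splitlines():
                row = json.loads(line)
                content = row["response"]["body"]["choices"][0]["message"]["content"]
                # Write (row["custom_id"], content) to BigQuery/Postgres here.
                print(row["custom_id"], content)
            return
        if batch.status in ("failed", "expired", "cancelled"):
            # Log the reason and shut the pipeline down with a failed state.
            raise RuntimeError(f"Batch ended with status {batch.status}: {batch.errors}")
        time.sleep(interval_s)  # still validating / in_progress / finalizing
```

The nice part of this design is that the expensive LLM work happens asynchronously on OpenAI's side, so your orchestrator only pays for cheap polling, and the weekly cadence means the Batch API's longer turnaround doesn't matter.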