Batch Processing: Real-World Architectural Structure

Hi @programmerrdai !

To me it’s just another batch processing flow, like what we used to do with Spark or Airflow.

One architecture would look like this:

  • Create an Airflow job that runs once per week
  • An example job parses documents from a specific source (e.g. a GCS or S3 bucket), extracts specific information as structured output, and writes it to BigQuery, Postgres, or some other data store for further downstream processing (maybe in a separate Airflow pipeline)
  • In the job, call the Batch API to submit the work, then poll the status field. If it completes, finish the Airflow run successfully; if an error occurs, log the reason and end the run in a failed state (see the sketch below)
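
To make that last step concrete, here's a minimal sketch of such a DAG. `submit_batch`, `get_batch_status`, the bucket path, and the DAG/task names are all placeholders for whichever Batch API and document source you're using; only the Airflow scaffolding is the standard 2.x API.

```python
import logging
import time
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

log = logging.getLogger(__name__)

# --- Placeholders: swap in your provider's actual Batch API client calls. ---

def submit_batch(source_uri: str) -> str:
    """Parse documents from the bucket and submit them as one batch; return the batch id."""
    raise NotImplementedError  # hypothetical helper

def get_batch_status(batch_id: str) -> str:
    """Return the batch's status field, e.g. 'in_progress', 'completed', or 'failed'."""
    raise NotImplementedError  # hypothetical helper

def submit_and_poll(**context):
    batch_id = submit_batch(source_uri="gs://my-bucket/docs/")  # assumed bucket path
    while True:
        status = get_batch_status(batch_id)
        if status == "completed":
            log.info("Batch %s completed", batch_id)
            return batch_id  # downstream tasks load results into BigQuery/Postgres
        if status == "failed":
            # Log the reason, then fail the task, which fails the DAG run.
            log.error("Batch %s failed", batch_id)
            raise RuntimeError(f"Batch {batch_id} failed")
        time.sleep(60)  # poll once a minute

with DAG(
    dag_id="weekly_document_batch",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",  # Airflow 2.4+ spelling; older versions use schedule_interval
    catchup=False,
) as dag:
    PythonOperator(task_id="submit_and_poll_batch", python_callable=submit_and_poll)
```

For long-running batches you may prefer an Airflow sensor or a deferrable operator over a sleep loop, so a worker slot isn't blocked while the batch sits in the queue.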