Best Way to Structure Backend Architecture for OpenAI API Calls?

I’m unsure whether OpenAI API calls should be handled directly in the backend, through workers, or via async jobs.
What architecture patterns are you using to keep things scalable, secure, and responsive?

Depends on your app. My gut says:

  1. Design the precise workflow for each task (and its subtasks).
  2. See which of them can be done in parallel.
  3. See how the above can be done async.
  4. See if async jobs and pub/sub make sense.
  5. Choose your stack based on steps 1-4.
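As a rough illustration of steps 2-3, here is a minimal Python `asyncio` sketch (all task names are hypothetical, just to show the shape) where independent subtasks of one workflow run in parallel:

```python
import asyncio

# Hypothetical subtasks of one workflow; the names are illustrative only.
async def fetch_user_context(user_id: int) -> dict:
    await asyncio.sleep(0)  # stands in for a DB read
    return {"user_id": user_id, "tier": "pro"}

async def fetch_documents(user_id: int) -> list[str]:
    await asyncio.sleep(0)  # stands in for a storage/API call
    return ["doc-a", "doc-b"]

async def run_workflow(user_id: int) -> dict:
    # Step 2: these subtasks are independent, so run them in parallel.
    context, docs = await asyncio.gather(
        fetch_user_context(user_id),
        fetch_documents(user_id),
    )
    # Step 3/4: a slow LLM step would typically be enqueued as an async
    # job here (pushed to a queue) instead of being awaited inline.
    return {"context": context, "docs": docs}

result = asyncio.run(run_workflow(42))
```

The point is only the decomposition: once each subtask is explicit, deciding what runs in parallel and what becomes a background job is mostly mechanical.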

Personally, the more I work with this, the more I find that stability and results (dev time, app experience, etc.) are far better when async jobs are combined with solid database modeling.

It depends heavily on workload, latency tolerance, and data boundaries.

For the parallel/async parts of the agent flow, do you set up a traditional pub/sub mechanism? I'm looking for solutions that integrate with the OpenAI Agents SDK for this specific use case.

I’m probably not the best person to ask about agents or agent SDKs (almost never use them). I usually avoid using agents as the thing that controls execution in production systems. The reason is not that they are “bad,” but that when you build software for real users and real businesses, one of the main success criteria is predictable behavior.

Agents are convenient in the beginning because they hide complexity, but what you get in return is a system where a model decides what happens next. That can work well for exploration or prototypes, but in business software it often becomes a risk you realize too late.

What has worked better for me is designing the full flow explicitly.

I break the problem into very small, clear steps and decide which of those steps should be regular code and which ones actually benefit from an LLM. Then the application itself decides the order of execution.

In that setup, AI is just one tool inside the process, not the thing steering the whole process. This is also why direct model calls (Responses API, and earlier Completions-style usage) have been more stable for me in production: you clearly see where AI starts, where it ends, and what data goes in and out.
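A minimal sketch of that shape, assuming a made-up `summarize_order` task: the application owns the control flow, and the LLM is injected as one clearly bounded step. The model call is stubbed here so the example runs standalone; in production that callable would wrap a direct Responses API request.

```python
from typing import Callable

# Explicit pipeline: the application, not an agent, decides execution order.
# `llm` is injected so the AI boundary is obvious: text in, text out.
def summarize_order(order: dict, llm: Callable[[str], str]) -> dict:
    # Regular code: validation and data shaping, no model involved.
    if not order.get("items"):
        raise ValueError("order has no items")
    total = sum(i["price"] * i["qty"] for i in order["items"])

    # The single, clearly bounded LLM step.
    summary = llm(f"Summarize an order of {len(order['items'])} items totaling {total}.")

    # Regular code again: the app decides what happens with the result.
    return {"total": total, "summary": summary}

# Stub standing in for a real Responses API call.
fake_llm = lambda prompt: f"[model output for: {prompt}]"
result = summarize_order(
    {"items": [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}]},
    llm=fake_llm,
)
```

Because the AI step is just a function boundary, you can test the whole flow deterministically and swap the model without touching the orchestration.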

For async work and parallelism, the key idea is that most applications do not need a heavy event system from day one. If everything lives within one product boundary, a database plus background jobs is often enough. You persist state, enqueue work, run steps in parallel when needed, and aggregate results based on saved state. This gives you retries, visibility, and control without adding unnecessary moving parts.
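Here is a toy version of that pattern, assuming an invented `jobs` table and statuses: persist state, run the independent work in parallel, then aggregate from saved state. The slow step (where a worker would call the model) is stubbed as a pure function.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Illustrative schema only; a real job table would carry retries, timestamps, etc.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, input TEXT, status TEXT, result TEXT)")

def enqueue(payload: str) -> int:
    cur = db.execute("INSERT INTO jobs (input, status) VALUES (?, 'queued')", (payload,))
    db.commit()
    return cur.lastrowid

def work(payload: str) -> str:
    # The slow step a worker would run (e.g. a model call); stubbed here.
    return payload.upper()

ids = [enqueue(p) for p in ("alpha", "beta")]
queued = db.execute("SELECT id, input FROM jobs WHERE status = 'queued'").fetchall()

# Run independent jobs in parallel, then persist each result.
with ThreadPoolExecutor(max_workers=2) as pool:
    outputs = list(pool.map(work, [p for _, p in queued]))
for (job_id, _), out in zip(queued, outputs):
    db.execute("UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (out, job_id))
db.commit()

# Aggregate based on saved state: anything not 'done' is visible and retryable.
results = [r for (r,) in db.execute("SELECT result FROM jobs WHERE status = 'done' ORDER BY id")]
```

The database row is the source of truth, so a crashed worker just leaves a `queued` row behind to retry; that is most of the "retries, visibility, and control" for free.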

A full pub/sub or event streaming setup starts to make sense when the system grows beyond that boundary (so it may be worth anticipating that future need from day one without implementing it yet).

For example, when you have multiple independent services that need to react to the same events, when different teams own different consumers, when throughput is very high, or when you need durable event logs and the ability to replay history. At that point, events become a shared contract rather than an internal implementation detail.
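To make "events as a shared contract" concrete, here is a toy in-process pub/sub sketch: several independent consumers react to the same event. The topic name and handlers are made up; a real deployment would use a broker (Kafka, Redis Streams, etc.) that also persists events for replay.

```python
from collections import defaultdict
from typing import Callable

# Topic -> list of handlers. In a real system the broker owns this mapping.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    # A broker would also durably log the event here, enabling replay.
    for handler in subscribers[topic]:
        handler(event)

# Two independent consumers (possibly owned by different teams) react
# to the same event without knowing about each other.
audit_log, notifications = [], []
subscribe("order.completed", lambda e: audit_log.append(e["id"]))
subscribe("order.completed", lambda e: notifications.append(f"order {e['id']} done"))

publish("order.completed", {"id": 7})
```

The event schema (`{"id": ...}` on `order.completed`) is the contract: producers and consumers only agree on that, which is exactly what you pay for in coordination overhead before you need it.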

In the end, it really depends on what you are building. The acceptable level of unpredictability, the complexity of the flow, the need for parallel execution, and how critical stability is all matter.

If you can share more details about what you are trying to achieve, it becomes much easier to reason about what makes sense for you.