AI Systems Architecture — Mastery (Part 3 of 9)
Orchestration Patterns — Pipelines, Routers, Swarms
Once you have multiple steps or agents, how they're wired together decides cost, latency and reliability. Four patterns cover almost everything.

When work spans multiple steps or agents, the wiring — not the model — drives cost, latency, and reliability. Four patterns cover almost everything you'll build.
The four patterns
- Pipeline — fixed sequence: step A's output feeds B feeds C. Predictable, easy to debug. Use when the path is known (extract → transform → summarize).
- Router — a classifier picks the path: a cheap model triages the request to the right specialist or tool. Use when inputs vary widely (support intents, query types).
- Parallel fan-out / fan-in — split independent work across workers, then merge. Use for N files, N sources, or multi-perspective review. Wall-clock time is set by the slowest worker, not the sum of all workers.
- Evaluator-optimizer loop — a generator produces, a critic scores, repeat until good enough. Use for quality-critical output where one shot isn't reliable.
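The four patterns above can be sketched as plain functions. This is a minimal illustration rather than a definitive implementation: the names `run_pipeline`, `run_router`, `run_fan_out`, and `run_eval_loop` are hypothetical, every step is assumed to be a callable that takes and returns a string, and a thread pool stands in for whatever concurrency mechanism a real system would use for model or tool calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: each step (model call, tool call) is modeled as str -> str.
Step = Callable[[str], str]


def run_pipeline(steps: List[Step], data: str) -> str:
    """Pipeline: each step's output becomes the next step's input."""
    for step in steps:
        data = step(data)
    return data


def run_router(classify: Callable[[str], str], routes: Dict[str, Step],
               default: Step, request: str) -> str:
    """Router: a cheap classifier picks which handler sees the request."""
    handler = routes.get(classify(request), default)
    return handler(request)


def run_fan_out(worker: Step, items: List[str],
                merge: Callable[[List[str]], str]) -> str:
    """Fan-out / fan-in: run independent items concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(items))) as pool:
        results = list(pool.map(worker, items))  # map preserves input order
    return merge(results)


def run_eval_loop(generate: Callable[[str, str], str],
                  score: Callable[[str], float],
                  task: str, threshold: float, max_rounds: int) -> str:
    """Evaluator-optimizer: regenerate until the critic's score clears the bar.

    generate(task, previous_draft) returns a new draft; max_rounds caps the
    total number of generations so the loop always terminates.
    """
    draft = generate(task, "")
    for _ in range(max_rounds - 1):
        if score(draft) >= threshold:
            break
        draft = generate(task, draft)
    return draft
```

Note the `max_rounds` cap on the loop: without a bound, a critic that never approves turns a quality mechanism into an unbounded cost. In the fan-out, all items are submitted at once, which is why total time tracks the slowest worker rather than the sum.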
Choosing
Default to the simplest pattern that fits: pipeline if the path is fixed, router if it branches, parallel only for genuinely independent work, loops only when one pass isn't enough. Composing them (a router into pipelines, a fan-out with per-item loops) handles the rest.
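Composition is easiest when every pattern exposes the same shape. The sketch below assumes each pattern is built as a function from string to string, so a router's routes can themselves be pipelines. All names here (`pipeline`, `router`, `classify_intent`, and the tagging steps) are hypothetical stand-ins for real model or tool calls.

```python
from functools import reduce
from typing import Callable, Dict

# Assumption: every step and every composed pattern is str -> str.
Step = Callable[[str], str]


def pipeline(*steps: Step) -> Step:
    """Compose steps left to right into a single callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def router(classify: Callable[[str], str], routes: Dict[str, Step],
           default: Step) -> Step:
    """Return a callable that dispatches each request to the chosen route."""
    return lambda request: routes.get(classify(request), default)(request)


# Hypothetical stand-ins for model or tool calls.
def normalize(text: str) -> str:
    return " ".join(text.split())


def tag_refund(text: str) -> str:
    return "[refund] " + text


def tag_general(text: str) -> str:
    return "[general] " + text


def classify_intent(text: str) -> str:
    # A real system would likely use a cheap model here; a keyword check stands in.
    return "refund" if "refund" in text.lower() else "general"


# A router whose routes are pipelines: the composed result is itself a Step.
handle = router(
    classify_intent,
    routes={"refund": pipeline(normalize, tag_refund)},
    default=pipeline(normalize, tag_general),
)
```

Because `handle` has the same signature as any single step, it can in turn be dropped into a larger pipeline or used as the worker in a fan-out.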
Patterns move data between steps. Next: what the system remembers between them — context and memory architecture.