AI Agent Pipeline Architecture: 7 Design Patterns with Mermaid Diagrams (2026)
Most AI agent pipelines break in production because the architecture is wrong from the start. The agent that works in a demo with 3 steps fails when you need 8 agents handling conditional logic, parallel execution, and error recovery.
This guide covers 7 production-grade AI agent pipeline architecture patterns. Each pattern includes a Mermaid diagram, when to use it, when not to, and cost estimates based on real usage data from teams running multi-agent pipelines daily.
Quick reference:
| Pattern | Agents | Latency | Cost | Best For |
|---|---|---|---|---|
| Sequential | 2-5 | High (sum) | Low | Linear processes |
| Parallel | 2-10 | Low (max) | Medium | Independent subtasks |
| Conditional | 2-6 | Variable | Low-Med | Branching logic |
| Fan-out/Fan-in | 3-15 | Low-Med | Medium | Scaling to many variants |
| DAG | 3-20 | Optimized | Variable | Complex dependencies |
| Iterative (Loop) | 2-4 | Variable | Variable | Quality improvement |
| Event-driven | 3-10+ | Async | Variable | Real-time systems |
In this guide:
- Sequential pipeline
- Parallel pipeline
- Conditional pipeline
- Fan-out/fan-in pipeline
- DAG pipeline
- Iterative pipeline
- Event-driven pipeline
- Decision framework
- Implementation patterns
- FAQ
Related guides: What Is an AI Agent Pipeline · Sequential Agent Workflows · AI Agent Orchestration Guide · Multi-Agent Collaboration Patterns
1. Sequential Pipeline
The simplest pattern. Agents execute one after another. Each agent receives the accumulated context from all previous agents.
```mermaid
graph LR
  A[Input] --> B[Agent 1: Research]
  B --> C[Agent 2: Draft]
  C --> D[Agent 3: Review]
  D --> E[Output]
```
How It Works
Agent 1 processes the input and produces Output 1. Agent 2 receives the original input + Output 1, and produces Output 2. Agent 3 receives everything and produces the final output.
Context grows at each step:
Step 1: Agent 1 sees [input]
Step 2: Agent 2 sees [input, output_1]
Step 3: Agent 3 sees [input, output_1, output_2]
This context accumulation is what makes sequential pipelines powerful. Each downstream agent benefits from all upstream work.
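The accumulation logic can be sketched in a few lines of Python. This is a minimal illustration, not a framework: the `research`, `draft`, and `review` callables are stand-ins for real model calls.

```python
def run_sequential(agents, user_input):
    """Run agents in order; each sees the full context so far."""
    context = [user_input]            # step 1: agent 1 sees [input]
    for agent in agents:
        output = agent(context)       # downstream agents see all prior work
        context.append(output)
    return context[-1], context       # final output plus full history

# Mock agents standing in for real LLM calls.
research = lambda ctx: f"research({ctx[0]})"
draft    = lambda ctx: f"draft({ctx[-1]})"
review   = lambda ctx: f"review({ctx[-1]})"

final, history = run_sequential([research, draft, review], "topic")
```

In a real pipeline each callable would wrap an API call and the context list would feed into the prompt, trimmed as needed (see Context Management below).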
When to Use
- Content creation pipelines (Research → Write → Review → Publish)
- Code deployment pipelines (Lint → Test → Review → Deploy)
- Data processing (Extract → Transform → Validate → Load)
When Not to Use
- Steps are independent and could run simultaneously (use Parallel)
- Different steps are needed based on the input (use Conditional)
- Total latency exceeds your requirement (use DAG or Parallel)
Real Example: Content Production Pipeline
| Step | Agent | Model | Cost | Time |
|---|---|---|---|---|
| 1 | Research | Gemini 2.5 Pro | Free | 30s |
| 2 | Write | Claude Sonnet | $0.08 | 90s |
| 3 | Review | Claude Haiku | $0.02 | 20s |
Total cost: $0.10 per article. Total time: ~2.5 minutes.
Cost Formula
Total Cost = sum(agent_costs)
Total Latency = sum(agent_latencies)
2. Parallel Pipeline
Multiple agents execute simultaneously on the same input. Results are merged at the end.
```mermaid
graph TD
  A[Input] --> B[Agent 1: Blog Post]
  A --> C[Agent 2: Social Posts]
  A --> D[Agent 3: Email Draft]
  B --> E[Merge]
  C --> E
  D --> E
  E --> F[Output]
```
How It Works
The input is broadcast to all agents at once. Each agent processes independently. A merge step combines the results.
Key insight: The merge step matters more than you think. Simply concatenating outputs produces poor results. The merge agent needs to:
- Deduplicate overlapping information
- Ensure consistent tone across all outputs
- Apply formatting rules to each piece
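A minimal sketch of the broadcast-and-merge flow, using a thread pool since agent API calls are IO-bound. The agent callables and the string-joining merge are placeholders; a production merge would itself be an agent that deduplicates and unifies tone, as described above.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(agents, user_input, merge):
    """Broadcast the same input to every agent concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = list(pool.map(lambda a: a(user_input), agents))
    return merge(results)             # pool.map preserves agent order

# Mock agents; in practice each wraps a model call with its own prompt.
blog   = lambda x: f"blog({x})"
social = lambda x: f"social({x})"
email  = lambda x: f"email({x})"
merge  = lambda outs: " | ".join(outs)  # real merge: dedupe + tone check

result = run_parallel([blog, social, email], "brief", merge)
```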
When to Use
- Multi-format content generation (blog + social + email from one input)
- A/B testing (generate multiple variants simultaneously)
- Multi-perspective analysis (analyze from different angles)
- Batch processing (process multiple items at once)
When Not to Use
- Steps depend on each other's output (use Sequential)
- You need to minimize total API cost (parallel runs more tokens)
- The merge step is complex enough to become a bottleneck
Real Example: Multi-Format Content
| Agent | Output | Model | Cost | Time |
|---|---|---|---|---|
| Blog Writer | 1500-word article | Claude Sonnet | $0.08 | 90s |
| Social Writer | 5 platform posts | Claude Haiku | $0.01 | 15s |
| Email Writer | Newsletter draft | Claude Haiku | $0.01 | 15s |
| Merger | Format + consistency check | Claude Haiku | $0.01 | 10s |
Total cost: $0.11. Total time: ~100s (90s for the slowest parallel agent plus the 10s merge).
Cost Formula
Total Cost = sum(agent_costs) + merge_cost
Total Latency = max(agent_latencies) + merge_latency
3. Conditional Pipeline
The pipeline branches based on the output of a routing agent. Different inputs follow different paths.
```mermaid
graph TD
  A[Input] --> B[Router Agent]
  B -->|Code task| C[Code Pipeline]
  B -->|Content task| D[Content Pipeline]
  B -->|Research task| E[Research Pipeline]
  C --> F[Output]
  D --> F
  E --> F
```
How It Works
A classifier or router agent evaluates the input and selects the appropriate branch. Each branch is a separate sub-pipeline optimized for that task type.
The router can use a cheap, fast model (Haiku or Gemini Flash) since classification requires less capability than generation. This keeps routing costs under $0.002 per task.
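A sketch of the routing pattern. The keyword-based `classify` below stands in for a call to a small, cheap model; the branch pipelines are placeholder callables.

```python
def classify(task):
    """Stand-in for a cheap classifier model (e.g. Haiku or Flash)."""
    t = task.lower()
    if "debug" in t or "fix" in t:
        return "code"
    if "research" in t:
        return "research"
    return "content"

# Each branch is a separate sub-pipeline tuned for its task type.
PIPELINES = {
    "code":     lambda t: f"code_pipeline({t})",
    "content":  lambda t: f"content_pipeline({t})",
    "research": lambda t: f"research_pipeline({t})",
}

def route(task):
    branch = classify(task)
    return PIPELINES[branch](task)    # only the selected branch runs
```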
When to Use
- Multi-purpose agent systems that handle different task types
- Workflows where the next step depends on content analysis
- Systems that need different processing based on input language, domain, or urgency
- Customer-facing AI that routes to specialized backends
When Not to Use
- All tasks follow the same path (use Sequential)
- The routing decision is trivial (hardcode it instead)
- You have fewer than 3 distinct branches (overkill)
Real Example: Task Router
| Input Type | Route | Pipeline | Cost |
|---|---|---|---|
| "Write a blog post about..." | Content | Research → Write → Review | $0.10 |
| "Debug this Python function" | Code | Analyze → Fix → Test | $0.08 |
| "Research competitor pricing" | Research | Search → Extract → Synthesize | $0.05 |
| "Summarize this document" | Quick | Single agent summary | $0.01 |
Cost Formula
Total Cost = router_cost + branch_cost(selected_branch)
Total Latency = router_latency + branch_latency(selected_branch)
4. Fan-out/Fan-in Pipeline
One input fans out to many specialized agents, then fans back in to combine results. Similar to parallel, but the fan-out agents are specialists rather than generalists doing the same task.
```mermaid
graph TD
  A[Input: Product Brief] --> B[Market Analyst]
  A --> C[Competitor Analyst]
  A --> D[Financial Analyst]
  A --> E[Technical Analyst]
  B --> F[Synthesis Agent]
  C --> F
  D --> F
  E --> F
  F --> G[Comprehensive Report]
```
How It Works
The input is broadcast to N specialist agents, each with a different analytical lens. The synthesis agent receives all specialist outputs and produces an integrated analysis.
Why this beats a single generalist agent: A single agent analyzing a business from "all angles" produces shallow analysis on each dimension. Four specialist agents produce deep analysis on their dimension. The synthesis agent weaves them together.
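A sketch of fan-out/fan-in: each specialist receives the same brief through a different lens, and the labeled results feed a synthesis step. The specialists and the line-joining `synthesize` are mock stand-ins for real agent calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Each specialist is the same input seen through a different analytical lens.
SPECIALISTS = {
    "market":     lambda brief: f"market analysis of {brief}",
    "competitor": lambda brief: f"competitor analysis of {brief}",
    "financial":  lambda brief: f"financial analysis of {brief}",
}

def synthesize(labeled):
    """Stand-in for the synthesis agent that weaves sections together."""
    return "\n".join(f"[{k}] {v}" for k, v in labeled.items())

def fan_out_fan_in(brief):
    with ThreadPoolExecutor() as pool:
        futures = {k: pool.submit(fn, brief) for k, fn in SPECIALISTS.items()}
    labeled = {k: f.result() for k, f in futures.items()}  # fan-in
    return synthesize(labeled)

report = fan_out_fan_in("Acme launch")
```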
When to Use
- Multi-dimensional analysis (business, technical, legal, financial)
- A/B/n content generation (produce N variants, pick the best)
- Audience segmentation (generate content for different personas)
- Multi-source research (different agents search different databases)
When Not to Use
- Only 2 parallel paths (use simple Parallel)
- The synthesis step cannot meaningfully combine the outputs
- Token budget is tight (fan-out is expensive)
Real Example: Business Analysis Pipeline
| Specialist | Focus | Model | Tokens | Cost |
|---|---|---|---|---|
| Market Analyst | Market size, trends, TAM | Sonnet | 2K in, 1K out | $0.03 |
| Competitor Analyst | Competitors, positioning | Sonnet | 2K in, 1K out | $0.03 |
| Financial Analyst | Unit economics, projections | Sonnet | 2K in, 800 out | $0.025 |
| Technical Analyst | Architecture, feasibility | Sonnet | 2K in, 800 out | $0.025 |
| Synthesis Agent | Combine all analyses | Sonnet | 6K in, 2K out | $0.08 |
Total cost: $0.19 per comprehensive analysis. Total time: ~60s (~30s for the specialists running in parallel plus ~30s of synthesis).
Cost Formula
Total Cost = sum(specialist_costs) + synthesis_cost
Total Latency = max(specialist_latencies) + synthesis_latency
5. DAG (Directed Acyclic Graph) Pipeline
Agents form a dependency graph. An agent runs as soon as all its dependencies complete. This is the most general pattern.
```mermaid
graph TD
  A[Research] --> B[Outline]
  B --> C[Write Draft]
  B --> D[Create Diagrams]
  C --> E[Code Examples]
  D --> F[Assemble]
  C --> F
  E --> F
  F --> G[Review]
  G --> H[Final Output]
```
How It Works
Each agent has explicit dependencies. The scheduler runs agents in topological order, executing independent agents in parallel. This combines the benefits of sequential (correct ordering) and parallel (speed).
Why DAG beats simple sequential for complex pipelines: In the example above, "Create Diagrams" and "Write Draft" can run in parallel after the outline is ready. In a sequential pipeline, you'd run them one after another, adding unnecessary latency: 15 seconds in this example, and often far more in pipelines with multiple off-critical-path agents.
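A minimal wave-based DAG scheduler sketch: on each pass it runs every agent whose dependencies are complete (each wave could execute in parallel). The agents here are mocks that take a dict of their dependencies' outputs.

```python
def run_dag(agents, deps):
    """Run agents in topological waves; each agent gets its deps' outputs."""
    done = {}
    while len(done) < len(agents):
        ready = [n for n in agents
                 if n not in done and all(d in done for d in deps.get(n, []))]
        if not ready:
            raise ValueError("cycle or missing dependency in the graph")
        for n in ready:               # each wave is parallelizable
            done[n] = agents[n]({d: done[d] for d in deps.get(n, [])})
    return done

agents = {
    "research": lambda _: "findings",
    "outline":  lambda d: f"outline({d['research']})",
    "draft":    lambda d: f"draft({d['outline']})",
    "diagrams": lambda d: f"diagrams({d['outline']})",  # parallel with draft
    "assemble": lambda d: f"assemble({d['draft']}, {d['diagrams']})",
}
deps = {"outline": ["research"], "draft": ["outline"],
        "diagrams": ["outline"], "assemble": ["draft", "diagrams"]}

result = run_dag(agents, deps)
```

Production schedulers add timeouts and failure handling per node, but the topological-wave core stays this simple.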
When to Use
- Complex workflows with both sequential and parallel dependencies
- Production systems where latency optimization matters
- Pipelines with more than 5 agents
- CI/CD-style AI workflows
When Not to Use
- Simple 2-3 step pipelines (overhead of DAG management is not worth it)
- All steps are sequential (use Sequential)
- All steps are independent (use Parallel)
Real Example: Technical Blog Post Pipeline
| Agent | Depends On | Output | Time |
|---|---|---|---|
| Research | (input) | Research findings | 30s |
| Outline | Research | Structured outline | 20s |
| Write Draft | Outline | 1500-word draft | 60s |
| Create Diagrams | Outline | 3 Mermaid diagrams | 15s |
| Code Examples | Write Draft | 2 code snippets | 20s |
| Assemble | Write Draft, Diagrams, Code | Full article | 10s |
| Review | Assemble | Quality score + edits | 15s |
Sequential latency: 170s. DAG latency: ~155s, since only Create Diagrams (15s) runs off the critical path here; about 9% faster, with larger gains in pipelines that have more parallel branches.
Cost Formula
Total Cost = sum(all_agent_costs)
Total Latency = critical_path_latency
6. Iterative (Loop) Pipeline
Agents run in a loop until a quality threshold is met. The reviewer decides if another iteration is needed.
```mermaid
graph TD
  A[Input] --> B[Writer Agent]
  B --> C[Reviewer Agent]
  C -->|Score < 8| B
  C -->|Score >= 8| D[Output]
```
How It Works
A writer produces output. A reviewer evaluates it on a defined rubric. If the score is below the threshold, the reviewer's feedback is appended to the context and the writer tries again. This continues until quality meets the bar or the max iteration count is reached.
Critical: always set a max iteration limit. Without it, a pathological case can loop indefinitely. Typical limits: 3-5 iterations.
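The loop, including the hard iteration cap, can be sketched as follows. The `write` and `score` callables are mock stand-ins for the writer and reviewer agents; `score` returns a rubric score and feedback text that is fed back into the next draft.

```python
def iterate(write, score, task, threshold=8, max_iters=3):
    """Write → review loop: stop at the quality bar or the iteration cap."""
    feedback = ""
    for i in range(1, max_iters + 1):
        draft = write(task, feedback)       # feedback shapes the next draft
        s, feedback = score(draft)
        if s >= threshold:
            return draft, i                 # passed the quality bar
    return draft, max_iters                 # cap hit: ship best effort

# Mock reviewer that improves its score each round (5 → 7 → 9).
scores = iter([(5, "tighten intro"), (7, "add examples"), (9, "")])
write = lambda task, fb: f"draft({task}, fb={fb!r})"
score = lambda draft: next(scores)

out, iters = iterate(write, score, "post")
```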
When to Use
- Quality-critical outputs (client deliverables, published content, production code)
- Tasks where first-pass quality is unpredictable
- Systems with measurable quality metrics (test coverage, readability scores)
When Not to Use
- One-shot quality is sufficient (most simple tasks)
- Cost sensitivity is high (every extra iteration adds another full writer + reviewer cost)
- Latency requirements are strict (loops add unpredictable time)
Real Example: Code Quality Pipeline
| Iteration | Review Score | Cost | Cumulative |
|---|---|---|---|
| 1 | 5/10 | $0.10 | $0.10 |
| 2 | 7/10 | $0.10 | $0.20 |
| 3 | 9/10 (pass) | $0.10 | $0.30 |
Average iterations to pass: 2.3. Average cost: $0.23 per output.
Cost Formula
Total Cost = iteration_cost * actual_iterations
Max Cost = iteration_cost * max_iterations
Total Latency = iteration_latency * actual_iterations
7. Event-driven Pipeline
Agents react to events rather than following a fixed flow. An orchestrator agent decides which agents to invoke based on the current state.
```mermaid
graph TD
  A[Event: New Document] --> B[Orchestrator]
  B -->|Needs Research| C[Research Agent]
  B -->|Needs Translation| D[Translation Agent]
  B -->|Ready for Review| E[Review Agent]
  C --> B
  D --> B
  E -->|Approved| F[Output]
  E -->|Changes Needed| B
```
How It Works
An orchestrator agent maintains state and decides which agent to invoke next based on the current context. This is the most flexible pattern but also the most complex to debug.
Event-driven vs. conditional: In a conditional pipeline, the routing logic is fixed at design time. In an event-driven pipeline, the orchestrator makes routing decisions at runtime based on accumulated state. This means the same input can follow different paths on different runs.
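A sketch of the orchestrator loop: a state dict plus a `decide` function that picks the next agent at runtime. Here `decide` is rule-based for clarity; in practice it would usually be a model call, which is exactly why the same input can take different paths on different runs. Note the hard step cap, the event-driven analogue of the iteration limit above.

```python
def decide(state):
    """Stand-in for the orchestrator's runtime routing decision."""
    if "type" not in state:
        return "classifier"
    if "extracted" not in state:
        return "extractor"
    if not state.get("valid"):
        return "validator"
    return None                        # nothing left to do

# Mock agents; each returns an updated copy of the shared state.
AGENTS = {
    "classifier": lambda s: {**s, "type": "invoice"},
    "extractor":  lambda s: {**s, "extracted": True},
    "validator":  lambda s: {**s, "valid": True},
}

def orchestrate(event, max_steps=10):
    state, trace = dict(event), []
    for _ in range(max_steps):         # hard cap: orchestrators can loop
        nxt = decide(state)
        if nxt is None:
            break
        trace.append(nxt)              # the trace is your debugging lifeline
        state = AGENTS[nxt](state)
    return state, trace

state, trace = orchestrate({"doc": "invoice.pdf"})
```

Logging `trace` on every run is what makes this pattern debuggable at all; without it you cannot reconstruct why a given input took the path it did.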
When to Use
- Long-running workflows with human-in-the-loop steps
- Systems that need to adapt to unexpected intermediate results
- Multi-stage approval processes
- Customer support automation with escalation logic
When Not to Use
- Fixed, predictable workflows (overkill, use simpler patterns)
- You need deterministic outputs for the same input
- Debugging and observability are critical (event-driven is hardest to debug)
Real Example: Document Processing Pipeline
| Event | Orchestrator Decision | Agent Invoked |
|---|---|---|
| Document uploaded | Classify document type | Classifier |
| Type: Contract | Extract key terms | Legal Extractor |
| Type: Invoice | Extract amounts + dates | Finance Extractor |
| Extraction done | Validate completeness | Validator |
| Validation failed | Re-extract missing fields | Extractor |
| Validation passed | Generate summary | Summarizer |
Cost Formula
Total Cost = orchestrator_cost * decisions + sum(invoked_agent_costs)
Total Latency = variable (depends on decisions made)
Decision Framework: Choosing the Right Pattern
Use this decision tree to select the right architecture:
```mermaid
graph TD
  A[Start: What type of workflow?] --> B{Steps depend on each other?}
  B -->|No| C{Same task, many items?}
  B -->|Yes| D{Linear dependencies?}
  C -->|Yes| E[Parallel]
  C -->|No| F{Need specialist perspectives?}
  F -->|Yes| G[Fan-out/Fan-in]
  F -->|No| E
  D -->|Yes| H{Need quality loop?}
  D -->|No| I{Complex dependencies?}
  H -->|Yes| J[Iterative]
  H -->|No| K{Different paths based on input?}
  K -->|Yes| L[Conditional]
  K -->|No| M[Sequential]
  I -->|Yes| N[DAG]
  I -->|No| L
```
Decision Table
| Your Situation | Pattern | Why |
|---|---|---|
| Simple linear process (A→B→C) | Sequential | Simplest to build and debug |
| Same input, multiple independent outputs | Parallel | Fastest total time |
| Different workflows for different inputs | Conditional | Flexible routing |
| Multiple specialists analyzing same input | Fan-out/Fan-in | Deeper analysis per dimension |
| Complex graph of dependencies | DAG | Optimal scheduling |
| Quality must meet a threshold | Iterative | Convergence guarantee |
| Adaptive, state-dependent routing | Event-driven | Maximum flexibility |
Implementation Patterns
Error Handling
Every pipeline pattern needs error handling. The minimum viable approach:
- Timeout per agent: Set a maximum execution time (typically 120s). Kill the agent if it exceeds this.
- Retry with backoff: For transient failures (API rate limits, network errors), retry up to 3 times with exponential backoff.
- Fallback models: If the primary model fails, fall back to a cheaper model. Better to get lower-quality output than no output.
- Partial completion: If one parallel branch fails, return the successful branches with a warning rather than failing the entire pipeline.
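The retry-and-fallback items above can be combined into one small wrapper. This is a sketch, assuming `call` is a zero-argument closure over your agent invocation that raises on transient failures; `fallback` would wrap the cheaper model.

```python
import time

def with_retries(call, fallback=None, retries=3, base_delay=1.0):
    """Retry a flaky agent call with exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt < retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    if fallback is not None:
        return fallback()              # degraded output beats no output
    raise RuntimeError("all retries and fallback exhausted")
```

A per-agent timeout still belongs at the transport layer (most API clients take a `timeout` argument), since a wrapper like this cannot interrupt a call that never returns.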
Context Management
As pipelines get longer, context windows fill up. Three strategies:
- Summarization: Between stages, summarize the previous output to reduce token count. Cost: $0.005 per summary. Benefit: 50-70% token reduction.
- Selective context: Only pass the outputs that the current agent needs, not the full history.
- Sliding window: Keep only the last N outputs in context. Simple but risks losing important early context.
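As a concrete example, the sliding-window strategy is a one-liner that always preserves the original input while trimming intermediate outputs (the mitigation for losing early context):

```python
def windowed_context(history, n=2):
    """Keep the original input plus only the last n intermediate outputs."""
    return [history[0]] + history[1:][-n:]

history = ["input", "out1", "out2", "out3", "out4"]
trimmed = windowed_context(history, n=2)
```

Summarization and selective context follow the same shape: a function between stages that maps the full history to the slice the next agent actually needs.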
Observability
For any pipeline in production, log:
- Input and output of each agent (for debugging)
- Token usage per agent (for cost tracking)
- Latency per agent (for performance optimization)
- Final output quality score (for regression detection)
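A lightweight way to capture the first three of these is a wrapper around each agent. Sketch only: the word-count `tokens` callable is a placeholder; swap in your tokenizer or the usage numbers your API returns.

```python
import time

def instrument(name, agent, log, tokens=lambda s: len(s.split())):
    """Wrap an agent so every call records latency and token counts."""
    def wrapped(inp):
        start = time.monotonic()
        out = agent(inp)
        log.append({
            "agent": name,
            "latency_s": round(time.monotonic() - start, 3),
            "tokens_in": tokens(inp),      # placeholder token estimate
            "tokens_out": tokens(out),
        })
        return out
    return wrapped

log = []
echo = instrument("echo", lambda s: s.upper(), log)
result = echo("hello world")
```

Because the wrapper returns a drop-in replacement for the agent, it slots into any of the seven patterns above without changing pipeline code.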
FAQ
What is an AI agent pipeline architecture?
An AI agent pipeline architecture is the structural design of how multiple AI agents are connected and coordinated to complete a multi-step task. It defines the flow of data between agents, the execution order, and how errors are handled. Common patterns include sequential, parallel, conditional, fan-out/fan-in, DAG, iterative, and event-driven.
Which AI agent pipeline pattern should I use?
Use sequential for simple linear workflows, parallel when steps are independent, conditional when the path depends on the input, fan-out/fan-in for multi-specialist analysis, DAG for complex dependency graphs, iterative when quality thresholds must be met, and event-driven for adaptive runtime routing.
How much does an AI agent pipeline cost to run?
Most multi-agent pipelines cost $0.05 to $0.30 per run using BYOK (Bring Your Own Key) pricing. Sequential pipelines with 3 agents cost ~$0.10. Parallel pipelines cost slightly more due to concurrent API calls. Iterative pipelines average $0.20-$0.30 because they loop until quality passes.
Can I combine multiple pipeline patterns?
Yes. Production pipelines often combine patterns. A common combination is a conditional router that sends tasks to different DAG sub-pipelines. Another is a fan-out that fans into an iterative quality loop. The key is to keep the architecture diagram documented so the team can reason about it.
How do DAG pipelines differ from sequential pipelines?
A DAG pipeline respects dependencies between agents while running independent agents in parallel. A sequential pipeline runs every agent one after another. For a pipeline with 6 agents where 3 pairs can run in parallel, a DAG is 30-40% faster than sequential with identical output quality.
What is the fan-out/fan-in pattern in AI agent pipelines?
Fan-out/fan-in broadcasts the same input to multiple specialist agents who analyze it from different perspectives, then a synthesis agent combines all specialist outputs into one integrated result. It is used for multi-dimensional analysis, A/B/n content generation, and audience segmentation.
How do I handle errors in multi-agent pipelines?
Implement timeout limits per agent (120s), retry transient failures up to 3 times with exponential backoff, configure fallback models for primary model failures, and design for partial completion where one failed branch does not block the entire pipeline.
Build your first multi-agent pipeline in 5 minutes. Ivern AI lets you create agent squads with sequential, parallel, and conditional pipelines using a simple web interface. Bring your own API keys, pay only for the tokens you use. Get started free →