Multi-Agent Collaboration in Practice: A Guide to 4 Architecture Patterns
Three in the morning, and my single-agent system crashed. Not a literal crash, but the kind where it output three whole pages of nonsense when all I asked it to do was “check the code style.” I stared at the screen, thinking: how is this thing even more long-winded than that chatty colleague of mine?
This isn’t an isolated case. Over the past year, the number of agent-related papers surged from 820 to over 2,500, with everyone asking the same question: Is a single agent really enough?
Anthropic’s research hit me like a splash of cold water: multi-agent systems outperform single-agent ones by 90.2%. That’s nearly double the performance. I suddenly realized that my “one mega-agent does it all” mentality was just as absurd as “one person doing an entire team’s work.”
In this article, I want to talk about multi-agent collaboration systems—from the decision logic behind four core architecture patterns, to the pitfalls I’ve encountered in production, to runnable code implementations. If you’re also torn between “should I use Subagents or Skills” or “how do I manage state between agents,” this might save you a lot of headaches.
Why Multi-Agent Systems
Let’s be honest, the biggest problem with single agents isn’t “can they work,” but “how long can they keep working.”
Have you ever encountered this situation: an agent that used to write perfectly good code suddenly starts outputting all kinds of messy stuff? Or when you clearly only asked about A, it insists on dragging in B, then C, and finally even brings out Z? This isn’t because the agent is “dumb”—it’s because its context window got blown out.
Single agents have three fatal flaws.
Context limits. No matter how powerful an agent is, 200k tokens is still just 200k tokens. You ask it to simultaneously handle code review, security analysis, and performance optimization, and it’s doing well if it remembers half of it. I’ve seen an agent that was discussing Python in the first 20 turns of a conversation, then on turn 21 started outputting JavaScript—it completely forgot what it was supposed to be doing.
Scattered capabilities. The more skills you pack into a single agent, the more it becomes a “jack of all trades, master of none”—knowing a little about everything, but excelling at nothing. Ask it to do a code review, it might write you a unit test; ask it to write documentation, it might refactor your code as a bonus. Direction? Nonexistent.
Debugging difficulty. When a single agent fails, you have no idea which part went wrong. Prompt too long? Tool call failed? Context pollution? Troubleshooting is like finding a needle in a haystack.
Multi-agent systems are essentially the “microservices architecture” of the AI world. Each agent does one thing, and does it well. They collaborate through clear message passing, rather than cramming everything into one “super agent.”
Google, LangChain, and Anthropic are all pushing this approach. An O’Reilly report shows that agent-related papers in 2025 have grown from 820 at the beginning of the year to over 2,500—a threefold increase. Why? Because everyone has realized: the era of one agent ruling them all is over.
Four Core Architecture Patterns
LangChain and Google have summarized several mainstream multi-agent architectures. After running a few projects, I found that each has its own applicable scenarios. Choose wrong, and you’re either “using a cannon to swat a fly” or “drinking soup with chopsticks.”
Subagents - Central Orchestration Pattern
This is the most intuitive pattern: one “main agent” acts as the commander, leading a group of “sub-agents.” Sub-agents are essentially tools of the main agent, which decides when to call whom.
User request → Main Agent (coordinator) → Dispatch to Sub-agent A/B/C → Aggregate results → Return to user
When to use? Your task involves multiple independent domains. For example, a customer service system: one sub-agent handles order queries, one handles refunds, one handles complaints. Each domain has its own knowledge base and tools, while the main agent just handles “routing.”
Code example (LangGraph):
```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

# Define sub-agents (query_order etc. are your own domain tools)
order_agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[query_order, update_order],
    prompt="You are an order specialist; only handle order-related questions.",
)
refund_agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[check_refund_policy, process_refund],
    prompt="You are a refund specialist; only handle refund-related questions.",
)

# Wrap each sub-agent as a plain tool: a compiled agent can't go in the
# tools list directly, but a tool function that invokes it can
@tool
def ask_order_specialist(question: str) -> str:
    """Forward an order-related question to the order specialist."""
    result = order_agent.invoke({"messages": [("user", question)]})
    return result["messages"][-1].content

@tool
def ask_refund_specialist(question: str) -> str:
    """Forward a refund-related question to the refund specialist."""
    result = refund_agent.invoke({"messages": [("user", question)]})
    return result["messages"][-1].content

# Main agent holds the wrapped sub-agents as tools
main_agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[ask_order_specialist, ask_refund_specialist],
    prompt="You are the customer service manager; dispatch each question to the appropriate specialist.",
)
```
Advantages: Clean context isolation, each sub-agent only sees what it needs to see. High parallel execution efficiency.
Disadvantages: Each sub-agent is a separate LLM call, high token consumption. If sub-agents need to share state, it requires extra coordination.
Skills - On-Demand Loading Pattern
One agent, multiple “personas.” Skills are essentially dynamically loaded prompt templates. The agent switches “identities” based on the task, but remains a single agent.
User request → Single Agent → Load "code review" Skill → Execute → Load "doc writer" Skill → Execute
When to use? Your task requires “single-threaded” processing, but different stages need different expertise. For example, a coding assistant: use “developer” mode when writing code, use “technical writer” mode when writing docs.
Code example:
The Skills directory:

```
skills/
├── code_review.md       # code review prompt
├── doc_writer.md        # documentation prompt
└── security_audit.md    # security audit prompt
```

Loading and switching at runtime:

```python
from langgraph.prebuilt import create_react_agent

# Dynamically load a skill's prompt from disk
def load_skill(skill_name: str) -> str:
    with open(f"skills/{skill_name}.md") as f:
        return f.read()

# Usage: swap the prompt to switch "personas" at runtime
agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[...],  # your domain tools
    prompt=load_skill("code_review"),
)
```
Advantages: Lightweight, no additional agent coordination overhead. Lower token consumption than Subagents.
Disadvantages: Context accumulates. If you switch Skills 10 times, the content from the previous 9 Skills is still in the context, piling up and getting messier.
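One way to keep that pile-up in check, sketched below with a hypothetical `SkillRunner` (none of these names are a LangChain API), is to archive the outgoing skill's context on every switch and carry forward only a one-line summary:

```python
class SkillRunner:
    """Keeps the model's working context to one skill at a time."""

    def __init__(self):
        self.history = []         # archived transcript, for audit only
        self.active_context = []  # what the model actually sees

    def switch_skill(self, skill_prompt: str, carry_over: str = ""):
        # Archive the outgoing skill's context instead of accumulating it
        self.history.extend(self.active_context)
        # Fresh context: the new prompt plus an optional one-line summary
        self.active_context = [skill_prompt]
        if carry_over:
            self.active_context.append(f"Summary of previous step: {carry_over}")

runner = SkillRunner()
runner.switch_skill("You are a code reviewer.")
runner.switch_skill("You are a doc writer.", carry_over="3 style issues found")
# active_context now holds 2 items; the reviewer prompt moved to history
```

The trade-off: you lose detail between skills, so decide per task what the summary must carry.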
Handoffs - State-Driven Pattern
Agents pass tasks between each other like runners in a relay: Agent A finishes its leg, “throws” the state to Agent B, and B continues from there.
User request → Agent A (collect info) → Handoff → Agent B (analyze problem) → Handoff → Agent C (provide solution)
When to use? Multi-stage conversation scenarios. For example, a technical support process: collect problem → diagnose problem → provide solution → confirm resolution. Each stage may require different expertise.
Code example:
```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

# Handoff tools: calling one signals that the next agent takes over
@tool
def handoff_to_diagnosis(issue_summary: str) -> str:
    """Hand off to the diagnosis specialist."""
    return f"Received issue: {issue_summary}, starting diagnosis..."

@tool
def handoff_to_solution(diagnosis_result: str) -> str:
    """Hand off the diagnosis result to the solution specialist."""
    return f"Based on diagnosis: {diagnosis_result}, formulating solution..."

# Agent chain: triage → diagnosis → solution
triage_agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[handoff_to_diagnosis],
    prompt="You are a triage specialist; collect the user's issue and hand off to the diagnosis expert.",
)
diagnosis_agent = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[handoff_to_solution],
    prompt="You are a diagnosis expert; analyze the root cause and hand off to the solution expert.",
)
```
Advantages: Natural conversation flow, aligns with human collaboration intuition. Each agent only focuses on the current stage.
Disadvantages: Complex state management. You need to ensure Agent A passes the right data format to Agent B, otherwise the chain breaks.
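A cheap way to keep the chain from breaking is to validate the payload at each handoff boundary. A minimal sketch using a plain dataclass (`DiagnosisPayload` and its fields are illustrative, not part of any framework):

```python
from dataclasses import dataclass

# Illustrative schema for the triage-to-diagnosis handoff
@dataclass
class DiagnosisPayload:
    issue_summary: str
    severity: str  # "low" | "medium" | "high"

def validate_handoff(raw: dict) -> DiagnosisPayload:
    # Fail loudly at the boundary rather than deep inside the next agent
    if raw.get("severity") not in {"low", "medium", "high"}:
        raise ValueError(f"bad severity: {raw.get('severity')!r}")
    return DiagnosisPayload(issue_summary=raw["issue_summary"],
                            severity=raw["severity"])

payload = validate_handoff({"issue_summary": "login timeout", "severity": "high"})
```

If an LLM produces the payload, validating before the handoff turns a silent chain break into an immediate, debuggable error.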
Router - Parallel Dispatch Pattern
A “router agent” analyzes requests, then dispatches to multiple specialized agents in parallel, and finally synthesizes the results.
User request → Router (classify) → Parallel call to Agent A/B/C → Synthesize results → Return to user
When to use? A request needs to query multiple data sources. For example, an enterprise knowledge base Q&A: Router determines question type, then queries internal docs, external APIs, and databases in parallel, finally synthesizing the answer.
Code example:
```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    internal_results: list
    external_results: list
    db_results: list
    final_answer: str

# Router node: classify the question (classification logic elided)
async def router(state: State):
    return {}

# Parallel execution nodes
async def query_internal_docs(state: State):
    return {"internal_results": [...]}  # query internal docs

async def query_external_api(state: State):
    return {"external_results": [...]}  # query external API

async def query_database(state: State):
    return {"db_results": [...]}  # query database

async def synthesize(state: State):
    # Merge all partial results (summarize is your own aggregation helper)
    all_results = state["internal_results"] + state["external_results"] + state["db_results"]
    return {"final_answer": summarize(all_results)}

# Build the parallel graph
graph = StateGraph(State)
graph.add_node("router", router)
graph.add_node("internal", query_internal_docs)
graph.add_node("external", query_external_api)
graph.add_node("database", query_database)
graph.add_node("synthesize", synthesize)

# Fan out: one edge per branch (add_edge does not accept a list of targets)
graph.add_edge(START, "router")
graph.add_edge("router", "internal")
graph.add_edge("router", "external")
graph.add_edge("router", "database")
# Fan in: the list form makes synthesize wait for all three branches
graph.add_edge(["internal", "external", "database"], "synthesize")
graph.add_edge("synthesize", END)
```
Advantages: Parallel execution, fastest speed. Stateless, each query is independent.
Disadvantages: Not suitable for multi-turn conversations. Each request is fresh; agents don’t remember what was discussed in the previous turn.
Architecture Selection Decision Framework
After all this, which one should you choose? I drew a simple decision flow:
```
What's your need?
│
├─→ Multiple independent domains need parallel processing?
│        └─→ Subagents (central orchestration)
│
├─→ Single agent with multi-stage skill switching?
│        └─→ Skills (on-demand loading)
│
├─→ Sequential workflow, one baton to the next?
│        └─→ Handoffs (state-driven)
│
└─→ Multi-source queries need synthesis?
         └─→ Router (parallel dispatch)
```
The flow chart alone might not be intuitive enough, so I compiled a comparison table:
| Pattern | Distributed Dev | Parallelization | Multi-hop Dialogue | Direct User Interaction | Token Consumption |
|---|---|---|---|---|---|
| Subagents | High | High | High | Low | High |
| Skills | High | Medium | High | High | Low |
| Handoffs | None | None | High | High | Medium |
| Router | Medium | High | None | Medium | High |
How to read this table?
- Distributed Development: Is your team developing different modules separately? If so, both Subagents and Skills work well—each member can own a sub-agent or skill.
- Parallelization: Do you care about speed? Router and Subagents can run multiple agents in parallel for maximum efficiency.
- Multi-hop Dialogue: Does the user need multi-turn interaction? Handoffs and Skills naturally support conversation flows.
- Direct User Interaction: Do users talk directly to the specialists? Skills and Handoffs support this well; with Subagents and Router, users mostly talk to the orchestrator rather than the specialists.
- Token Consumption: If cost-sensitive, Skills is the most economical; Router and Subagents are the most expensive.
My experience: Start simple. First use Skills or Handoffs to validate an MVP, then upgrade to Subagents or Router when you hit bottlenecks. Don’t jump straight into distributed architecture—the pain of over-engineering is real.
Production Implementation Essentials
From demo to production, there’s a massive gap. I’ve stepped in every one of these pitfalls.
State Management
The biggest problem with shared state across multiple agents is “race conditions”—two agents writing to the same variable simultaneously, and who knows who overwrites whom?
The fix is to give each agent an exclusive output key in the shared state: an agent only ever writes to its own key, so writes can never collide.
```python
from langgraph.graph import MessagesState

class GraphState(MessagesState):
    security_result: str = ""  # security agent's exclusive key
    style_result: str = ""     # style agent's exclusive key
    perf_result: str = ""      # performance agent's exclusive key

# Security agent only ever writes security_result
async def security_agent(state: GraphState):
    result = await analyze_security(state["messages"])
    return {"security_result": result}  # write this one key only

# Style agent only ever writes style_result
async def style_agent(state: GraphState):
    result = await analyze_style(state["messages"])
    return {"style_result": result}
```
This way, whether parallel or sequential, each agent only touches its own slice of the pie—no interference.
Another common issue is “context pollution.” Agent A’s output gets read by Agent B, but B doesn’t need that information at all. My solution: add a relevant_keys field in the state, so each agent only reads the keys it needs.
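A minimal sketch of that idea: before each agent runs, project the shared state down to the keys the agent declared (`filtered_view` and `relevant_keys` are my own names, not a LangGraph feature):

```python
# Project the shared state down to an agent's declared keys
def filtered_view(state: dict, relevant_keys: list) -> dict:
    return {k: v for k, v in state.items() if k in relevant_keys}

state = {
    "messages": ["review this function"],
    "security_result": "2 issues found",  # written by the security agent
    "style_result": "",
}
# The style agent declares only the keys it actually needs
view = filtered_view(state, ["messages", "style_result"])
# security_result never enters the style agent's context
```

In a LangGraph node you would apply the projection at the top of the node function, so the pollution is stopped at read time rather than write time.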
Performance Optimization
Token consumption in multi-agent systems is a bottomless pit. Here are some token-saving tips:
1. Subagents save 67% tokens vs Skills (multi-domain scenarios)
LangChain’s test data: if a task involves 3 independent domains, Subagents’ token consumption is one-third of Skills. Why? Because Subagents have context isolation—each sub-agent only sees content from its own domain. With Skills, all skills’ contexts pile up together, getting bigger and bigger.
2. Stateful mode saves 40-50% redundant calls
If your task has a lot of repeated queries (like asking the same question 10 times), use stateful Handoffs mode, and the agent can remember previous answers. LangChain’s data: stateful saves nearly half the LLM calls compared to stateless.
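The mechanism is easy to sketch without any framework: memoize answers by question, so repeats never reach the LLM. `CachedAgent` below is illustrative, with a lambda standing in for a real model call:

```python
import hashlib

class CachedAgent:
    """Wraps an LLM call with an answer cache keyed on the question."""

    def __init__(self, llm_call):
        self.llm_call = llm_call
        self.cache = {}
        self.llm_calls = 0  # how many times we actually hit the model

    def ask(self, question: str) -> str:
        key = hashlib.sha256(question.encode()).hexdigest()
        if key not in self.cache:
            self.llm_calls += 1
            self.cache[key] = self.llm_call(question)
        return self.cache[key]

agent = CachedAgent(lambda q: f"answer to: {q}")
for _ in range(10):
    agent.ask("What is the refund policy?")
# agent.llm_calls is 1: nine of the ten asks were cache hits
```

Exact-match caching only pays off when questions repeat verbatim; for paraphrases you would need semantic caching, which is a bigger investment.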
3. Limit reflection mode iterations
Many people like giving agents “reflection” ability—letting them check their own output, find issues, and regenerate. This is great, but it’s easy to get into infinite loops. I usually limit max_iterations=2 or 3, forcing exit after that.
In LangGraph, the built-in guard for this is the `recursion_limit`, passed at invoke time rather than at construction (it counts graph steps, so give it some headroom over your intended iteration cap):

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

graph = create_react_agent(
    model="claude-3-5-sonnet-20241022",
    tools=[...],  # your domain tools
    checkpointer=MemorySaver(),
)

# Cap the agent loop: execution stops with an error past this many steps
result = graph.invoke(
    {"messages": [("user", "review this code")]},
    config={"recursion_limit": 10},
)
```
Common Pitfalls
Infinite loops: Agent calls itself, which calls itself, which calls itself… endlessly. Solution: set max_iterations and clear exit conditions.
```python
def should_continue(state):
    # Hard cap on iterations
    if state["iteration_count"] >= 3:
        return "end"
    # Explicit completion signal from the agent
    if "done" in state["messages"][-1].content:
        return "end"
    return "continue"
```
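The same guard works as a plain loop, independent of any framework; this sketch (with an illustrative `step_fn` standing in for a real agent step) shows the forced exit:

```python
def run_with_guard(step_fn, max_iterations=3):
    state = {"iteration_count": 0, "messages": []}
    while True:
        state = step_fn(state)
        state["iteration_count"] += 1
        # Hard cap: forced exit regardless of what the agent wants
        if state["iteration_count"] >= max_iterations:
            break
        # Soft exit: the agent signaled it's finished
        if state["messages"] and "done" in state["messages"][-1]:
            break
    return state

# An agent step that never says "done" still exits after 3 iterations
result = run_with_guard(lambda s: {**s, "messages": s["messages"] + ["thinking..."]})
```

Both conditions matter: the cap alone wastes iterations on tasks that finish early, and the signal alone loops forever on tasks that never finish.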
Context bloat: Agent gets increasingly “dumb,” outputs get shorter and shorter. Usually means context is stuffed with too much stuff. Solution: use Blackboard pattern (shared blackboard), keep only necessary context, clean up periodically.
Coordination tax: As agent count increases, communication overhead grows roughly quadratically, since every pair of agents is a potential channel. I tested a system: going from 3 agents to 10, response time went from 2 seconds to 15 seconds. Solution: merge agents with similar responsibilities, keep agent count under 5.
Complete Implementation Example
Enough theory, let’s get practical. I built a code review multi-agent system using the Router + ParallelAgent pattern.
Architecture: Router determines code language and type → parallel calls to security audit, style check, performance analysis agents → synthesizes results into a report.
```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_anthropic import ChatAnthropic

# Shared state: each agent writes only its own key
class CodeReviewState(TypedDict):
    code: str
    language: str
    security_issues: list
    style_issues: list
    perf_issues: list
    final_report: str

# Initialize LLM
llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")

# Router: determine language
async def route_code(state: CodeReviewState) -> dict:
    code = state["code"]
    # Simple heuristic; swap in an LLM classifier for production
    if "def " in code or "import " in code:
        language = "python"
    elif "function" in code or "const " in code:
        language = "javascript"
    else:
        language = "unknown"
    return {"language": language}

# Security audit agent
async def security_audit(state: CodeReviewState) -> dict:
    prompt = f"""You are a security audit expert. Check the following code for security issues:
- SQL injection risks
- XSS vulnerabilities
- Sensitive data leakage
- Insecure dependencies

Code:
{state['code']}

Output issues as a JSON list, each containing: line, severity, description.
"""
    response = await llm.ainvoke(prompt)
    # Parse response.content into a list of issues (parsing elided)
    return {"security_issues": []}

# Style check agent
async def style_check(state: CodeReviewState) -> dict:
    prompt = f"""You are a code style expert. Check the following {state['language']} code for style issues:
- Naming conventions
- Code formatting
- Comment completeness

Code:
{state['code']}

Output issues as a JSON list.
"""
    response = await llm.ainvoke(prompt)
    return {"style_issues": []}

# Performance analysis agent
async def perf_analysis(state: CodeReviewState) -> dict:
    prompt = f"""You are a performance analysis expert. Check the following code for performance issues:
- High time complexity
- Unnecessary loops
- Memory leak risks

Code:
{state['code']}

Output issues as a JSON list.
"""
    response = await llm.ainvoke(prompt)
    return {"perf_issues": []}

def format_issues(issues: list) -> str:
    # Render each issue as a bullet; placeholder when the list is empty
    if not issues:
        return "- None found"
    return "\n".join(f"- {issue}" for issue in issues)

# Generate report
async def generate_report(state: CodeReviewState) -> dict:
    security = state.get("security_issues", [])
    style = state.get("style_issues", [])
    perf = state.get("perf_issues", [])
    total_issues = len(security) + len(style) + len(perf)
    report = f"""# Code Review Report

## Overview
- Language: {state['language']}
- Total Issues: {total_issues}

## Security Issues ({len(security)})
{format_issues(security)}

## Style Issues ({len(style)})
{format_issues(style)}

## Performance Issues ({len(perf)})
{format_issues(perf)}

## Recommendations
Based on the above analysis, prioritize fixing security issues...
"""
    return {"final_report": report}

# Build graph
graph = StateGraph(CodeReviewState)
graph.add_node("router", route_code)
graph.add_node("security", security_audit)
graph.add_node("style", style_check)
graph.add_node("perf", perf_analysis)
graph.add_node("report", generate_report)

# Flow: router → three checks in parallel → report
graph.set_entry_point("router")
graph.add_edge("router", "security")
graph.add_edge("router", "style")
graph.add_edge("router", "perf")
# List form = join: report waits for all three branches to finish
graph.add_edge(["security", "style", "perf"], "report")
graph.add_edge("report", END)

# Compile
app = graph.compile()

# Usage
async def review_code(code: str):
    result = await app.ainvoke({"code": code})
    return result["final_report"]
```
Once running, for a 100-line code snippet, three agents execute in parallel and produce results in about 3-5 seconds. If run sequentially, it would take at least 10 seconds.
Of course, this is just a basic version. In production, you’d also need to add: caching (don’t re-review the same code), incremental review (only look at changed parts), human feedback (let users mark false positives). But all these extensions build on top of this architecture.
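The caching extension, for instance, is a one-function wrapper, sketched here under the assumption that an unchanged snippet deserves an unchanged report (`review_fn` stands in for the review entry point above):

```python
import hashlib

def make_cached_reviewer(review_fn):
    cache = {}

    def review(code: str) -> str:
        # Key the cache on a hash of the source: unchanged code, cached report
        key = hashlib.sha256(code.encode()).hexdigest()
        if key not in cache:
            cache[key] = review_fn(code)
        return cache[key]

    return review

# Usage: wrap the real entry point (an async variant works the same way)
cheap_review = make_cached_reviewer(lambda code: f"report for {len(code)} chars")
first = cheap_review("def f(): pass")
second = cheap_review("def f(): pass")  # cache hit, review_fn not called again
```

Hashing the whole snippet is the bluntest key; incremental review would instead hash per function or per diff hunk.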
Build a Multi-Agent Collaboration System
From scratch, the build comes down to five steps:

1. Choose an architecture pattern: select the pattern that matches your task characteristics.
2. Define the state structure: use a TypedDict to define the shared state for multiple agents.
3. Create agent nodes: write an independent node function for each agent.
4. Build the execution graph: wire the flow together with LangGraph's StateGraph.
5. Add state management: give each agent an exclusive output key to avoid race conditions.
Conclusion
After all this, it really comes down to three points:
Underlying logic: Pattern selection is more important than framework selection. LangGraph, AutoGen, CrewAI are all great tools, but if you use Router to solve a problem that needs Handoffs, no framework can save you.
Middle-level strategy: Start simple, upgrade gradually. First validate an MVP with Skills or Handoffs, then consider Subagents or Router when you hit bottlenecks. Over-engineering is the biggest pitfall—I’ve been there, don’t go there.
Top-level implementation: In production, focus on state management, performance, and cost. Nail token consumption, infinite loops, and context pollution, and your multi-agent system will run stably.
Next action: Open the LangGraph docs, pick a pattern, and implement the simplest multi-agent system in 50 lines of code. Don’t overthink it—just get it running first.
12 min read · Published on: Mar 25, 2026 · Modified on: Mar 25, 2026