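Build a Claude Agent Team in 10 Steps: From 1 Agent to 20 Working in Parallel

Codez (@0xCodez) pulled together the official documentation on multi-agent orchestration, the cookbook, and the real-world setups at Netflix and Spiral into a complete walkthrough, from a single agent to 20 working in parallel.

Until April 2026, this took months of infrastructure engineering. Now it takes a YAML config and 10 steps.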

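Why a single agent hits a wall

You built an agent. It worked. So you gave it more.

Add research. Add report writing. Add data analysis. Add a review step. Each addition makes the system prompt longer, the tool list bigger, and the context window more crowded.

Then one day you notice the agent is slower, more confused, and worse at the things it used to do well.

This is not a model problem. It is an architecture problem.

A single agent has one context window, and every capability you bolt on competes for the same limited attention. Past a certain level of complexity, a generalist juggling ten jobs performs worse than ten specialists doing one job each.

The fix is a team. Not a bigger prompt: a division of labor.

Three facts define the architecture, straight from the docs: a roster holds up to 20 unique agents, each runs in its own isolated context window, and all of them share one filesystem. Isolated thinking, shared workspace: that is what lets a team work in parallel without chaos.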

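Decide and design

01. Confirm you actually need a team

Don't reach for multi-agent just because it sounds impressive. It costs more tokens and adds coordination overhead. Reach for it only when one of these three things is true:

Parallelization: the work splits into independent subtasks, such as separate log files or separate code modules. A single agent does them sequentially; a team does them at the same time.
Specialization: different problems need different expertise (a security reviewer, a documentation writer, a pricing modeler), and a generalist context-switching between them degrades all of them.
Escalation: most of the work is simple, but some subtasks turn out to be unexpectedly hard. A team routes the hard ones to a more capable model instead of paying for it at every step.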
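
The three criteria above can be written down as a small checklist. This is only a sketch: the Workload fields and the "more than one" thresholds are assumptions made for illustration, not anything defined by the docs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a task, used only for this sketch."""
    independent_subtasks: int      # pieces that could run at the same time
    distinct_expertise_areas: int  # e.g. security, docs, pricing
    has_rare_hard_subtasks: bool   # a few steps need a stronger model


def reasons_for_team(w: Workload) -> list:
    """Return which of the three criteria this workload meets."""
    reasons = []
    if w.independent_subtasks > 1:
        reasons.append("parallelization")
    if w.distinct_expertise_areas > 1:
        reasons.append("specialization")
    if w.has_rare_hard_subtasks:
        reasons.append("escalation")
    return reasons


def needs_team(w: Workload) -> bool:
    """A team is justified only if at least one criterion holds."""
    return bool(reasons_for_team(w))
```

If reasons_for_team returns an empty list, a single agent is probably the cheaper choice.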

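02. Map the roles before writing any code

Multi-agent design is organizational design. Before any code, sketch the team on paper: one coordinator and a list of specialists, each with a clear job.

Anchor on a real pattern. In the incident-response example Anthropic documents, a lead agent runs the investigation while subagents fan out across deployment history, error logs, metrics, and support tickets, all at the same time.

Four specialists, one coordinator, and work that a single agent would do sequentially happens in parallel.

Name each role, give it a one-sentence job, and note its model and tools. If two roles overlap, merge them. Fewer, sharper specialists beat many fuzzy ones.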
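
The paper sketch can also live in a few lines of Python, which makes the "merge overlapping roles" rule checkable. Everything here is hypothetical: the Role fields, the tier labels, the tool names, and the idea of detecting overlap through a shared "covers" set are assumptions for this sketch, loosely modeled on the incident-response team described above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Role:
    name: str
    job: str              # the one-sentence job
    model_tier: str       # a label for this sketch, not a real model ID
    tools: frozenset      # hypothetical tool names
    covers: frozenset     # responsibilities this role owns


def overlapping_roles(roles):
    """Return (role_a, role_b, shared responsibilities) for each overlapping pair."""
    overlaps = []
    for a, b in combinations(roles, 2):
        shared = a.covers & b.covers
        if shared:
            overlaps.append((a.name, b.name, sorted(shared)))
    return overlaps


TEAM = [
    Role("lead", "Run the investigation and combine findings.", "strong",
         frozenset({"delegate"}), frozenset({"coordination"})),
    Role("deploys", "Check recent deployment history.", "fast",
         frozenset({"file_read"}), frozenset({"deployment_history"})),
    Role("errors", "Scan the error logs.", "fast",
         frozenset({"file_read"}), frozenset({"error_logs"})),
    Role("metrics", "Look for anomalies in metrics.", "fast",
         frozenset({"file_read"}), frozenset({"metrics"})),
    Role("tickets", "Summarize related support tickets.", "fast",
         frozenset({"file_read"}), frozenset({"support_tickets"})),
]
```

Any pair that overlapping_roles reports is a candidate for merging before a single agent is created.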

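03. Pick a model for each role: this is where the savings are

The move most people miss: every agent on the team can run a different model. You are not locked into one.

The production pattern at Spiral by Every shows this: they use Haiku as the coordinator and Opus for the writing subagents.

The coordinator only routes and sequences, and a fast, cheap model does that well. The expensive heavy lifting happens only in the specialists that need it.

Match the model to the job. Mixing tiers by role is the single biggest lever on cost and speed.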
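
A rough way to see the lever is to compare one run priced with a mixed assignment against the same run with every role on the top tier. The cost ratios and token counts below are placeholders invented for this sketch, not real prices or measurements; substitute your own numbers.

```python
# Placeholder relative cost per token for each tier. These are NOT real prices.
RELATIVE_COST_PER_TOKEN = {"haiku": 1.0, "opus": 10.0}


def run_cost(tier_by_role, tokens_by_role):
    """Sum tokens times relative cost over all roles in one run."""
    return sum(
        tokens_by_role[role] * RELATIVE_COST_PER_TOKEN[tier]
        for role, tier in tier_by_role.items()
    )


# Hypothetical token usage for one run of a small writing team.
TOKENS = {"coordinator": 20_000, "writer_a": 50_000, "writer_b": 50_000}

# Mixed tiers, following the Spiral pattern described above.
MIXED = {"coordinator": "haiku", "writer_a": "opus", "writer_b": "opus"}

# Baseline: every role on the most expensive tier.
ALL_OPUS = {role: "opus" for role in TOKENS}

mixed_cost = run_cost(MIXED, TOKENS)
baseline_cost = run_cost(ALL_OPUS, TOKENS)
```

How much the mixed assignment saves depends entirely on how many tokens the coordinator consumes relative to the specialists, so measure that share before assuming a large saving.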

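Build the team

04. Set up Managed Agents

Every multi-agent request runs on Claude Managed Agents and requires the beta header managed-agents-2026-04-01. The SDK sets it automatically.

Why Managed Agents rather than your own setup? Because once a team needs to run remotely, scale to many users, share a filesystem, and persist state, you are facing an infrastructure problem: sessions, memory, security, sandboxing.

Managed Agents handles all of that, so you only design the team. Install the Anthropic SDK, set an API key from the Console, and you are ready.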

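05. Create each specialist as a standalone agent

Build the specialists first, bottom-up. Each is a standalone agent with its own model, prompt, and scoped toolset. Create them and keep the agent IDs; the coordinator will reference them.

The key discipline: scope each specialist's tools tightly. In the cookbook's sales-proposal example, the researcher gets web search, the librarian gets only file reads, and the pricing modeler sees only the rules file and the seat count.

Each agent touches exactly what its job requires and nothing more, which keeps it focused and keeps the whole system auditable.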
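
One way to keep that discipline honest is a scope audit that compares what each specialist was granted against what its job needs. The tool names below are hypothetical labels chosen for this sketch (they are not identifiers from the cookbook or the API), and the audit runs on plain dictionaries rather than on real agent definitions.

```python
# What each job needs, following the sales-proposal roles described above.
# Tool names are hypothetical labels for this sketch.
NEEDED = {
    "researcher": {"web_search"},
    "librarian": {"file_read"},
    "pricing_modeler": {"read_rules_file", "read_seat_count"},
}


def excess_tools(granted, needed):
    """Map each agent to the tools it was granted beyond what its job needs."""
    report = {}
    for agent, tools in granted.items():
        extra = tools - needed.get(agent, set())
        if extra:
            report[agent] = sorted(extra)
    return report


# A loosely scoped setup: the librarian can also search the web and write files.
GRANTED = {
    "researcher": {"web_search"},
    "librarian": {"file_read", "file_write", "web_search"},
    "pricing_modeler": {"read_rules_file", "read_seat_count"},
}

AUDIT = excess_tools(GRANTED, NEEDED)
```

An empty report means every agent is scoped to exactly its job; anything else is a grant to remove or a need to write down.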

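06. Create the coordinator and declare the roster

Now the lead agent. Mark it as the coordinator by setting the multiagent field and listing the subagent IDs it can delegate to. The config is deliberately terse: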

name: Engineering Lead
model: claude-opus-4-7
system: >
  You coordinate engineering work. Delegate code review
  to the reviewer agent and test writing to the test agent.
tools:
  - type: agent_toolset_20260401
multiagent:
  type: coordinator
  agents:
    - type: agent
      id: $REVIEWER_AGENT_ID
    - type: agent
      id: $TEST_WRITER_AGENT_ID

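The agent_toolset_20260401 tool gives the coordinator the ability to delegate. The roster holds at most 20 entries.

You can pin a specific agent version, reference the latest, or use {"type": "self"} to let the coordinator spawn copies of itself for recursive parallelization.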
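
If the roster is generated from code rather than written by hand, it can help to validate it before sending anything. The sketch below builds a plain dictionary that mirrors the YAML fields shown above and checks the 20-entry limit. It does not call the SDK; build_coordinator_config is a hypothetical helper, and counting a {"type": "self"} entry toward the limit is an assumption.

```python
ROSTER_LIMIT = 20  # the roster limit stated above


def build_coordinator_config(name, model, system, subagent_ids, include_self=False):
    """Return a dict mirroring the YAML coordinator config; no API call is made."""
    if len(set(subagent_ids)) != len(subagent_ids):
        raise ValueError("roster contains duplicate agent IDs")
    roster = [{"type": "agent", "id": agent_id} for agent_id in subagent_ids]
    if include_self:
        # Assumption: the self entry counts toward the roster limit.
        roster.append({"type": "self"})
    if not roster:
        raise ValueError("a coordinator needs at least one roster entry")
    if len(roster) > ROSTER_LIMIT:
        raise ValueError(f"roster has {len(roster)} entries, limit is {ROSTER_LIMIT}")
    return {
        "name": name,
        "model": model,
        "system": system,
        "tools": [{"type": "agent_toolset_20260401"}],
        "multiagent": {"type": "coordinator", "agents": roster},
    }


# Placeholder IDs; real ones come from creating the specialists in step 05.
CONFIG = build_coordinator_config(
    name="Engineering Lead",
    model="claude-opus-4-7",
    system="You coordinate engineering work.",
    subagent_ids=["REVIEWER_AGENT_ID", "TEST_WRITER_AGENT_ID"],
)
```

Failing early on an oversized or duplicated roster is cheaper than discovering it from a rejected request.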

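07. Write the coordinator's prompt as a manager, not a doer

This is where teams succeed or fail. The coordinator's system prompt should not try to do the work; it should describe how to delegate the work.

A good coordinator prompt says: these are your specialists, this is what each one does, this is how to decide who gets what, and this is how to combine their outputs. It reasons about sequencing and synthesis, not domain details; those live in the specialists.

If you find yourself writing domain instructions into the coordinator, that content belongs in a subagent instead.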
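
The four parts of a good coordinator prompt can be assembled from the role map, which keeps the prompt about delegation by construction. This is a minimal sketch: coordinator_prompt and its wording are illustrative assumptions, and the example roles follow the Engineering Lead config above.

```python
def coordinator_prompt(specialists, routing_rules, synthesis_rule):
    """Build a manager-style system prompt: who exists, who gets what, how to combine."""
    lines = [
        "You coordinate a team. Delegate the work; do not do it yourself.",
        "",
        "Your specialists:",
    ]
    for name, job in specialists.items():
        lines.append(f"- {name}: {job}")
    lines += ["", "How to decide who gets what:"]
    for rule in routing_rules:
        lines.append(f"- {rule}")
    lines += ["", "How to combine their outputs:", synthesis_rule]
    return "\n".join(lines)


PROMPT = coordinator_prompt(
    specialists={
        "reviewer": "Reviews code changes.",
        "test writer": "Writes tests for changed code.",
    },
    routing_rules=[
        "Send every diff to the reviewer.",
        "Send changed functions without tests to the test writer.",
    ],
    synthesis_rule="Merge the review notes and the new tests into one summary.",
)
```

Note what is absent: no review checklist and no testing conventions. Those belong in the reviewer's and test writer's own prompts.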

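Run, observe, improve

08. Understand how the team communicates

When you run the coordinator, the mechanics are concrete and worth knowing:

Each subagent the coordinator delegates to spawns its own session thread, a context-isolated event stream with its own history. The coordinator reports in the primary thread; new threads appear as the run proceeds.

Crucially, threads are persistent. The coordinator can send a follow-up to an agent it called earlier, and that agent retains everything from the previous turns.

One hard constraint to design around: the coordinator can delegate only one level deep. Any depth greater than 1 is ignored. Specialists cannot run their own sub-teams. This is deliberate: it keeps the system predictable and traceable.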
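
A toy model can make these mechanics easier to hold in mind. The class below is not how the service is implemented; ToySession, its fields, and its method names are invented for this sketch. It only imitates the three behaviors described above: one thread per subagent, history that persists across follow-ups, and delegation deeper than one level being ignored.

```python
class ToySession:
    """Toy imitation of the thread behavior described above; not the real service."""

    MAX_DEPTH = 1  # coordinators delegate only one level deep

    def __init__(self):
        self.primary = []   # the coordinator's own thread
        self.threads = {}   # subagent name -> that thread's message history

    def delegate(self, agent, message, depth=1):
        """Send a message to a subagent thread, creating it on first use."""
        if depth > self.MAX_DEPTH:
            return None  # deeper delegation is ignored: no thread, no message
        thread = self.threads.setdefault(agent, [])
        thread.append(message)
        self.primary.append(f"delegated to {agent}")
        return thread


session = ToySession()
session.delegate("error_logs", "Find errors after the 14:00 deploy.")
# A follow-up lands in the same thread, so the earlier turn is still there.
history = session.delegate("error_logs", "Narrow that to the checkout service.")
# A specialist trying to start its own sub-team has no effect.
ignored = session.delegate("sub_specialist", "Do part of my job.", depth=2)
```

The design consequence is that any task needing a sub-team has to be flattened: the coordinator delegates the pieces itself.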

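09. Watch the whole thing in the Claude Console

The difference between a production multi-agent system and an experimental one is observability.

Every run produces a full trace in the Claude Console: which agent did what, in what order, and why. You can see every delegation decision, inspect each subagent's reasoning, and follow the sequence end to end.

When a result is wrong, the trace tells you which specialist failed and whether the problem was the delegation or the specialist itself. Don't run a team blind: read the trace.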
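
The "delegation or specialist" question can be phrased as a small triage routine. The event format below is entirely hypothetical: the Console is read in the browser, and this sketch assumes you have copied the relevant events into simple dictionaries by hand. It illustrates the reasoning, not a real export format.

```python
def diagnose(trace, intended_owner):
    """Classify each problem as a delegation error or a specialist error.

    trace: hypothetical events, either
        {"type": "delegation", "task": ..., "to": ...} or
        {"type": "result", "task": ..., "from": ..., "ok": bool}
    intended_owner: task -> the specialist that should have received it
    """
    findings = []
    routed_to = {}
    for event in trace:
        task = event["task"]
        if event["type"] == "delegation":
            routed_to[task] = event["to"]
            if intended_owner.get(task) != event["to"]:
                findings.append((task, "delegation", event["to"]))
        elif event["type"] == "result" and not event["ok"]:
            # Blame the specialist only when the task was routed correctly.
            if routed_to.get(task) == intended_owner.get(task):
                findings.append((task, "specialist", event["from"]))
    return findings


TRACE = [
    {"type": "delegation", "task": "check deploys", "to": "deploys"},
    {"type": "delegation", "task": "scan logs", "to": "metrics"},
    {"type": "result", "task": "check deploys", "from": "deploys", "ok": False},
    {"type": "result", "task": "scan logs", "from": "metrics", "ok": False},
]
OWNERS = {"check deploys": "deploys", "scan logs": "errors"}

FINDINGS = diagnose(TRACE, OWNERS)
```

A delegation finding points at the coordinator's prompt; a specialist finding points at that subagent's prompt, model, or tools.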

10. Scale to 20 and add shared memory

Once the small team works, scale it. Add specialists up to the 20-agent roster limit and let the coordinator fan out across all of them in parallel.
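
The fan-out shape itself is easy to picture with asyncio. In Managed Agents the coordinator does the fanning out for you, so this is only a local illustration of the pattern: run_specialist is a stand-in that sleeps instead of calling a model, and the names are hypothetical.

```python
import asyncio

ROSTER_LIMIT = 20  # the roster limit stated above


async def run_specialist(name, task):
    """Stand-in for one subagent working in its own isolated context."""
    await asyncio.sleep(0.01)  # placeholder for real work
    return f"{name} finished: {task}"


async def fan_out(assignments):
    """Run every specialist concurrently and collect results by name."""
    if len(assignments) > ROSTER_LIMIT:
        raise ValueError(f"{len(assignments)} specialists exceeds the limit of {ROSTER_LIMIT}")
    results = await asyncio.gather(
        *(run_specialist(name, task) for name, task in assignments.items())
    )
    # gather preserves input order, so results line up with the names.
    return dict(zip(assignments, results))


def run_team(assignments):
    """Synchronous entry point for scripts."""
    return asyncio.run(fan_out(assignments))
```

For example, run_team({f"log_reader_{i}": f"build log {i}" for i in range(20)}) runs all 20 stand-ins concurrently rather than one after another.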

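Then close the loop with shared memory. When many subagents work in the same domain, the Dreaming feature can aggregate what they have collectively learned and publish shared insights to a team-wide memory store, something no single agent session could produce alone.

The team doesn't just work in parallel; it gets smarter over time as a unit.

This is what Netflix's platform team runs in production: multi-agent orchestration processes logs from hundreds of simultaneous builds, with parallel subagents surfacing recurring issues across thousands of applications, work that would be hopelessly sequential in a single-agent setup.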

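Mistakes that break agent teams

Building a team when one agent would do. Multi-agent costs more and coordinates more slowly. If the work doesn't parallelize, specialize, or escalate, you added complexity for nothing.
The coordinator doing the work itself. If the lead agent holds domain instructions instead of delegation logic, you have built a bloated single agent wearing a team costume.
Loose tool scoping. When every specialist can touch everything, focus collapses and the trace becomes unreadable. Scope each agent to exactly its job.
Fighting the depth-1 limit. Coordinators delegate one level deep. Designing a hidden hierarchy of sub-coordinators wastes time, because the extra depth is ignored.
Running blind. The Console trace exists so you can see which agent did what. Skip it and you cannot debug a system with this many moving parts.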

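Conclusion

Most people will keep stuffing more capability into one agent, watch it slow down and degrade, and conclude that agents just aren't ready.

Those who build teams will have something different: a coordinator that delegates, specialists working in parallel in their own contexts, a shared filesystem where they collaborate, and a memory store that makes the whole team sharper over time.

Pick a task that splits into parallel pieces. Map three specialists and one coordinator. Build that small team first. That alone will change how much your agents can handle.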