Merged
121 changes: 121 additions & 0 deletions apps/marketing/content/blog/agent-orchestration-not-another-agent.mdx
@@ -0,0 +1,121 @@
---
title: "You Don't Need Another AI Coding Agent — You Need an Orchestrator"
description: "The AI coding tools landscape is flooded with agents. The real bottleneck isn't agent quality — it's managing multiple agents at scale. Here's why orchestration is the missing layer."
author: satya
date: 2026-02-18
category: Product
relatedSlugs:
- parallel-coding-agents-guide
- roadmap-to-100-agents
- working-with-worktrees-in-superset
---

Every month, a new AI coding agent launches. Claude Code. Codex. Aider. OpenCode. Cursor's agent mode. Windsurf's Cascade. Devin. The list keeps growing. Each promises to be smarter, faster, more capable than the last.

Here's the thing: the agents are already good enough. Claude Code can refactor a module. Codex can write a test suite. Aider can iterate on a bugfix. The quality difference between top-tier agents is shrinking. What's not getting better is the workflow around them.

The real bottleneck isn't agent quality. It's agent quantity.

---

## The Single-Agent Ceiling

Most developers use AI coding agents the same way:

1. Open terminal
2. Describe a task
3. Watch the agent work
4. Review the output
5. Give feedback or merge
6. Repeat

This is sequential. One task at a time. The agent might take 5 minutes. Your review might take 10. That's one task per 15 minutes, four tasks per hour. For an individual developer, that's a solid productivity boost.

But your codebase has 50 parallelizable tasks right now. Writing tests for untested modules. Refactoring deprecated patterns. Updating stale documentation. Migrating config formats. Fixing lint warnings. Each task is independent. None requires the output of another.

Running them one at a time means 50 tasks at 15 minutes each: over 12 hours. Running 5 in parallel: under 3 hours. Running 10 in parallel: under 2 hours, provided your review keeps pace. The math is straightforward. The orchestration is the hard part.
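
The arithmetic above can be checked with a short shell function. This is a minimal sketch: `batch_minutes` is a hypothetical helper, and it assumes every task takes the same time and that review never becomes the limiting step, which is optimistic.

```bash
# Hypothetical helper: wall-clock minutes to finish TASKS tasks,
# each taking MINUTES_PER_TASK, with PARALLEL agents running at once.
# Assumes equal-length tasks and no extra review delay.
batch_minutes() {
  tasks="$1"; minutes_per_task="$2"; parallel="$3"
  # Round up: a partially filled final batch still takes a full cycle.
  batches=$(( (tasks + parallel - 1) / parallel ))
  echo $(( batches * minutes_per_task ))
}

batch_minutes 50 15 1    # prints 750 (12.5 hours)
batch_minutes 50 15 5    # prints 150 (2.5 hours)
batch_minutes 50 15 10   # prints 75  (1.25 hours)
```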

---

## Why Orchestration Is Hard

### Isolation

Two agents in the same directory destroy each other's work. You need filesystem-level isolation — each agent in its own working directory with its own branch. Git worktrees solve this elegantly, but setting them up manually for every task is tedious.

### Session Management

Agents crash. Terminals close. Laptops go to sleep. If your agent's session dies, you lose the context and have to restart. For a single agent, this is annoying. For 10 agents, it's unmanageable.

### Review

The review workflow matters more with parallel agents, not less. You need to see what each agent changed, verify it against the task description, and decide whether to merge or iterate. Without a unified view across all tasks, you're tab-switching between terminals.

### Task Allocation

Not every task suits every agent. In our experience, Claude Code tends to handle complex refactors better than Codex, Codex is faster for well-defined tasks, and Aider excels at iterative changes. An orchestrator lets you match agents to tasks instead of using one agent for everything.

---

## Orchestration vs More Agents

The AI coding industry is betting heavily on better agents. Smarter models, larger context windows, better tool use. These improvements are real and valuable.

But consider: a 20% improvement in agent quality (smarter code, fewer bugs) improves throughput by roughly 20% at best. Running 5 agents in parallel instead of 1 can improve throughput by up to 5x, as long as you can review the output. The orchestration layer has more leverage than the agent layer.

This isn't an argument against better agents. It's an argument that the orchestration layer deserves equal attention. The best coding agent in the world, run one at a time, is slower than a good agent run ten at a time.

---

## What Good Orchestration Looks Like

### Automatic Isolation

Creating a task should automatically create a Git worktree and branch. No manual setup, no remembering to checkout a new branch, no worrying about file conflicts. The orchestrator handles this transparently.

### Session Persistence

Agent sessions should survive crashes, app restarts, and laptop sleep cycles. A background daemon that owns the sessions independently of the UI solves this — the same pattern that tmux uses for terminal multiplexing, applied to agent orchestration.
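
As a rough illustration of the tmux pattern mentioned above, and not a description of how any particular orchestrator implements its daemon, you can already detach an agent from your terminal with tmux itself. The sketch assumes tmux is installed; `session_name_for` and `start_agent_session` are hypothetical helper names.

```bash
# Hypothetical helper: derive a tmux-safe session name from a task name.
# tmux treats '.' and ':' specially in target names, so anything outside
# [A-Za-z0-9_-] is replaced with an underscore.
session_name_for() {
  printf 'agent-%s\n' "$(printf '%s' "$1" | tr -c 'A-Za-z0-9_-' '_')"
}

# Hypothetical helper: start an agent command in a detached tmux session
# rooted at the given directory, unless a session for the task already exists.
start_agent_session() {
  task="$1"; dir="$2"; shift 2
  name="$(session_name_for "$task")"
  if tmux has-session -t "$name" 2>/dev/null; then
    echo "session $name already running"
  else
    tmux new-session -d -s "$name" -c "$dir" "$@"
  fi
}
```

A call such as `start_agent_session tests ../project-tests claude "Write unit tests for the auth module"` would leave the agent running after the terminal closes, and `tmux attach -t agent-tests` reconnects to it. A plain tmux session does not survive a reboot, so a real daemon needs more than this.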

### Agent Agnosticism

Lock-in to a single agent is a strategic mistake. The AI landscape moves fast. Today's best agent might be tomorrow's second-best. An orchestrator should run any CLI-based agent — Claude Code, Codex, Aider, OpenCode, or whatever ships next week.

### Unified Review

All active tasks, their status, and their diffs should be visible in one place. When a task completes, the review workflow should be fast: see the diff, open in your editor if needed, merge or give feedback. Seconds per review, not minutes.

### Editor Integration

Developers have strong editor preferences. The orchestrator shouldn't force an editor choice. It should integrate with whatever you use — VS Code, Cursor, JetBrains, Xcode, Neovim — and let you open any worktree in your preferred environment.

---

## Building This at Superset

We built [Superset](https://superset.sh) because we hit this ceiling ourselves. We were running Claude Code for everything — and it was great for individual tasks. But scaling to 5-7 agents manually was operational overhead that distracted from the actual work.

The architecture is intentionally simple:

- **Git worktrees** for isolation (no containers, no VMs)
- **Persistent daemon** for session management (Unix domain sockets, survives crashes)
- **Any CLI agent** as a first-class citizen (no SDK integrations to maintain)
- **Built-in diff viewer** for fast review
- **Editor integration** for deep inspection (VS Code, Cursor, JetBrains, Xcode)

The orchestrator doesn't make agents smarter. It makes using agents at scale practical. That's the missing layer in most developers' AI workflows — not a better agent, but a better way to run the agents they already have.

---

## The Compound Effect

Running parallel agents has a compound effect on productivity:

1. **More tasks completed per day** — 5x agents means 5x throughput (minus review overhead)
2. **Faster iteration** — while one agent iterates on feedback, others are working on new tasks
3. **Better agent matching** — use the right agent for each task instead of one-size-fits-all
4. **Reduced context switching** — tasks run to completion in isolation instead of being stashed and resumed

The developers we work with who've adopted parallel agent workflows don't go back to single-agent work. The throughput difference is too large. The question shifts from "which agent should I use?" to "how many agents can I effectively manage?"

That's the right question. And the answer is: as many as your orchestrator supports and your review speed allows.
210 changes: 210 additions & 0 deletions apps/marketing/content/blog/parallel-coding-agents-guide.mdx
@@ -0,0 +1,210 @@
---
title: "The Complete Guide to Running Parallel AI Coding Agents"
description: "How to run multiple AI coding agents in parallel without conflicts. Covers isolation strategies, orchestration patterns, and practical workflows for scaling from 1 to 10+ agents."
author: avi
date: 2026-02-18
category: Engineering
relatedSlugs:
- working-with-worktrees-in-superset
- roadmap-to-100-agents
- git-worktrees-history-deep-dive
Comment on lines +7 to +10

⚠️ Potential issue | 🟡 Minor

Same missing relatedSlugs concern as the companion article.

roadmap-to-100-agents and git-worktrees-history-deep-dive are referenced but not included in this PR. Please see the verification script in the companion file review above to confirm whether these posts already exist.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/marketing/content/blog/parallel-coding-agents-guide.mdx` around lines 7
- 10, The frontmatter key relatedSlugs contains references to posts that are not
included in this PR — specifically the entries "roadmap-to-100-agents" and
"git-worktrees-history-deep-dive" — so either remove those missing slugs from
relatedSlugs or add the corresponding posts to this PR; locate the relatedSlugs
array in the article (the relatedSlugs block) and either delete the two missing
slug strings or replace them with valid existing post slugs (or add the missing
post files to the branch) so the list only contains verifiable posts.

---

Running one AI coding agent is straightforward. Running ten at once introduces problems that most developers haven't encountered before: file conflicts, branch collisions, resource contention, and a review bottleneck that grows linearly with agent count. This guide covers the patterns that work and the mistakes to avoid.

---

## Why Parallel Agents?

A single coding agent handles one task at a time. You prompt, it works, you review, you iterate. At best, you're doing one unit of agent work per cycle.

But most codebases have dozens of parallelizable tasks at any given time:

- Writing tests for module A doesn't conflict with refactoring module B
- Updating API docs doesn't conflict with fixing a database query
- Adding input validation doesn't conflict with migrating a config format

These tasks are independent. They can — and should — run simultaneously. The bottleneck isn't the agent or the model. It's the human orchestrating one task at a time.

---

## The Isolation Problem

Running two agents in the same directory creates immediate problems:

```
Agent 1: Opens src/auth/login.ts, starts refactoring
Agent 2: Opens src/auth/login.ts, starts adding validation
Agent 1: Writes its version of login.ts
Agent 2: Writes its version of login.ts (overwrites Agent 1)
Agent 1: Commits — but the file is Agent 2's version
```

This isn't a theoretical risk. It happens every time two agents touch overlapping files. Even if they're editing different files, they share one Git index (staging area), so one agent's commit can include changes the other agent staged (or any tracked change, if the agent commits with `git commit -a`).

### Solution: One Worktree Per Agent

Git worktrees solve this at the filesystem level:

```
~/project/ # Your main working tree
~/project-agent-tests/ # Agent 1: writing tests (own branch, own files)
~/project-agent-refactor/ # Agent 2: refactoring auth (own branch, own files)
~/project-agent-docs/ # Agent 3: updating docs (own branch, own files)
```

Each worktree has:
- Its own copy of the working directory
- Its own branch
- Its own staging area (git index)

They share the git object store, so creating a worktree takes seconds and duplicates only the checked-out files, not the repository history.
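
To see this split for yourself, the following sketch prints what Git reports as a worktree's private metadata directory and the directory it shares with the others. `describe_worktree` is a hypothetical helper name; the Git subcommands and flags are standard.

```bash
# Hypothetical helper: print where a worktree keeps its private state
# (HEAD, index) versus the repository data it shares with other worktrees.
describe_worktree() {
  dir="$1"
  echo "branch:  $(git -C "$dir" rev-parse --abbrev-ref HEAD)"
  echo "private: $(git -C "$dir" rev-parse --absolute-git-dir)"
  echo "shared:  $(git -C "$dir" rev-parse --git-common-dir)"
}
```

In a linked worktree the private path sits under `.git/worktrees/<name>` inside the main repository and holds that worktree's `HEAD` and `index`. `git worktree list` shows every worktree attached to the repository.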

---

## Orchestration Patterns

### Pattern 1: Manual Worktrees

The simplest approach — create worktrees yourself and run agents manually:

```bash
git worktree add ../project-tests -b agent/tests
cd ../project-tests
claude "Write unit tests for the auth module"
```

This works for 2-3 agents but breaks down at scale. You're managing worktrees, branches, and terminals manually.

### Pattern 2: Scripted Orchestration

A shell script that automates worktree creation and agent launching:

```bash
#!/bin/bash
TASK_NAME="$1"
AGENT_CMD="$2"
BRANCH="agent/$TASK_NAME"
WORKTREE_DIR="../$(basename "$PWD")-$TASK_NAME"

git worktree add "$WORKTREE_DIR" -b "$BRANCH"
cd "$WORKTREE_DIR"
eval "$AGENT_CMD"
```
Comment on lines +83 to +93

⚠️ Potential issue | 🟡 Minor

Example script uses eval — document the risk or use a safer form.

eval "$AGENT_CMD" on line 92 is a classic shell injection vector. Even in illustrative code, readers commonly cargo-cult shell snippets directly into CI scripts or wrapper tools. A reader who feeds untrusted input into AGENT_CMD would get arbitrary command execution. Consider replacing eval with direct command invocation or adding an explicit warning:

📝 Suggested safer alternative
-eval "$AGENT_CMD"
+# Run the agent command directly (avoid eval for untrusted input)
+$AGENT_CMD

Or add a comment if eval is intentionally retained for compound commands:

+# WARNING: eval executes arbitrary shell code — ensure AGENT_CMD is trusted
 eval "$AGENT_CMD"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/marketing/content/blog/parallel-coding-agents-guide.mdx` around lines 83
- 93, The script uses eval "$AGENT_CMD" which is a shell-injection risk; either
remove eval and invoke the command safely (e.g., parse AGENT_CMD into an array
and exec it using "${CMD[@]}" or accept command+args separately so you can run
them without eval) or, if you must accept compound commands, add an explicit
comment warning about injection and sanitize/validate TASK_NAME and AGENT_CMD
before use; update references around TASK_NAME, AGENT_CMD, BRANCH, WORKTREE_DIR
and replace the eval usage accordingly.


Better, but still no unified view of all tasks, no session persistence, and no diff review workflow.
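
One way to address the `eval` concern is to pass the agent command as separate arguments instead of a single string. The following is a minimal sketch, not a drop-in tool: `branch_for_task` and `launch_agent` are hypothetical names, and the restriction on task names is an assumption chosen to keep branch and directory names predictable.

```bash
# Hypothetical helper: validate a task name and print its branch name.
# Only letters, digits, '_' and '-' are accepted (an assumption, to keep
# branch and directory names predictable).
branch_for_task() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) echo "invalid task name: $1" >&2; return 1 ;;
  esac
  echo "agent/$1"
}

# Hypothetical helper: create a worktree for the task, then run the agent
# command inside it. The command and its arguments arrive as separate
# words, so the shell never re-parses them.
launch_agent() {
  task="$1"; shift
  branch="$(branch_for_task "$task")" || return 1
  dir="../$(basename "$PWD")-$task"
  git worktree add "$dir" -b "$branch" || return 1
  ( cd "$dir" && "$@" )
}
```

For example, `launch_agent tests claude "Write unit tests for the auth module"` runs the agent in `../<repo>-tests` on branch `agent/tests`. The trade-off is that compound commands (pipes, `&&`) no longer work as a single argument; wrap them in a script if you need them.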

### Pattern 3: Dedicated Orchestrator

Tools like [Superset](https://superset.sh) handle orchestration end-to-end:

1. **Task creation** — describe the task, pick an agent
2. **Automatic isolation** — worktree and branch created automatically
3. **Session management** — persistent daemon keeps sessions alive across crashes
4. **Diff review** — built-in diff viewer for reviewing agent output
5. **Editor integration** — open any worktree in VS Code, Cursor, JetBrains, or Xcode

This is the approach that scales to 5-10+ concurrent agents without operational overhead.

---

## Choosing the Right Agent Per Task

Different agents have different strengths. A parallel workflow lets you match agents to tasks:

### Claude Code
- Complex multi-file refactors
- Architectural changes (new patterns, module restructuring)
- Debugging subtle issues that require deep codebase understanding
- Tasks requiring MCP tool integration

### Codex CLI
- Well-scoped, clearly defined tasks
- Full Auto mode for autonomous execution
- Tasks where OpenAI models (o3, o4-mini) perform well
- Cost-sensitive tasks using cheaper models

### OpenCode
- Tasks where model flexibility matters (75+ providers)
- Cost optimization across different providers
- Teams running local models via Ollama
- Tasks that benefit from LSP integration

### Aider
- Iterative pair programming tasks
- Small, focused changes with tight feedback loops
- Tasks requiring frequent back-and-forth

The key insight: you don't have to choose one agent for everything. Run Claude Code on the complex refactor and Codex on the test generation simultaneously.

---

## The Review Bottleneck

Parallel agents create a new problem: review throughput. If you run 10 agents and each produces a diff every 15 minutes, that is up to 40 diffs per hour, far more than one person can review carefully.

### Strategies for Fast Review

**Prioritize by risk:** A test addition is low-risk and can be reviewed quickly. A database migration is high-risk and needs careful review. Triage diffs by impact.

**Review diffs, not files:** Don't re-read the entire file. Focus on what changed. If the diff is scoped to what you asked for and the tests pass, a quick review is usually sufficient.

**Let agents verify their own work:** Before reviewing, check if the agent ran tests. If tests pass and the diff is clean, your review is a sanity check rather than a first-pass audit.

**Use structured task descriptions:** "Add input validation to the signup form: email must be valid, password must be 8+ characters, display inline errors" produces a reviewable diff. "Improve the signup form" produces something unpredictable.

**Batch similar tasks:** Review all test additions together, then all refactors. Context switching between different types of changes is the biggest time sink in review.

---

## Resource Management

### CPU and Memory

Each agent consumes local resources for:
- The agent process itself (terminal, file I/O)
- Language servers (TypeScript, Go, Python) if the agent uses them
- Build processes if the agent runs builds
- Test suites if the agent runs tests

On a modern laptop, 5-7 concurrent agents are comfortable. Beyond that, you may want to stagger agent launches or limit concurrent builds.
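
A simple way to cap concurrency is `xargs -P`, which both GNU and BSD `xargs` provide. The sketch below is one assumption about how you might wire it up; `run_limited` is a hypothetical helper, and `echo` stands in for a real launcher.

```bash
# Hypothetical helper: read one task name per line from stdin and run the
# given command once per task, with at most MAX_PARALLEL running at a time.
run_limited() {
  max_parallel="$1"; shift
  xargs -n 1 -P "$max_parallel" "$@"
}

# Example: three tasks, at most two running concurrently.
# Output order is not guaranteed because the commands run in parallel.
printf '%s\n' tests refactor docs | run_limited 2 echo "would launch agent for"
```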

### API Rate Limits

Each agent makes API calls to its model provider. Running 10 Claude Code instances hits Anthropic's rate limits faster than one. Monitor your provider's rate limit headers and throttle if needed.
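
If an agent run fails because the provider is throttling you, retrying with exponential backoff is a common remedy. The sketch below is illustrative only: `backoff_seconds` and `retry_with_backoff` are hypothetical helpers, and it assumes the agent exits with a non-zero status when it is rate limited, which you should verify for your agent.

```bash
# Hypothetical helper: delay before retry number ATTEMPT (0-based),
# doubling from BASE seconds and capped at CAP seconds.
backoff_seconds() {
  attempt="$1"; base="${2:-2}"; cap="${3:-60}"
  delay="$base"
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  [ "$delay" -gt "$cap" ] && delay="$cap"
  echo "$delay"
}

# Hypothetical helper: run a command up to MAX_TRIES times, sleeping
# between failed attempts.
retry_with_backoff() {
  max_tries="$1"; shift
  try=0
  until "$@"; do
    try=$(( try + 1 ))
    [ "$try" -ge "$max_tries" ] && return 1
    sleep "$(backoff_seconds $(( try - 1 )))"
  done
}
```

With the defaults, the delays run 2, 4, 8, 16, 32 and then stay at 60 seconds.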

### Disk Space

Each worktree is a full checkout of your working directory. If a checkout is 1GB, 10 worktrees use ~10GB. The git object store is shared, so history isn't duplicated. Clean up completed worktrees promptly.
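
Cleanup can be scripted with standard Git commands. This is a sketch that assumes the task's branch has already been merged; `cleanup_worktree` is a hypothetical helper name.

```bash
# Hypothetical helper: remove a finished task's worktree and its branch.
# Run it from the main working tree. `git branch -d` refuses to delete an
# unmerged branch, which acts as a safety net.
cleanup_worktree() {
  dir="$1"; branch="$2"
  git worktree remove "$dir" || return 1   # fails if the worktree has uncommitted changes
  git branch -d "$branch"
  git worktree prune                        # drop metadata for worktrees deleted by hand
}
```

For example, `cleanup_worktree ../project-tests agent/tests` removes the directory and the branch once the work has landed.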

---

## Common Mistakes

### Running Too Many Agents at Once

Start with 2-3 until you're comfortable with the review workflow. Scaling to 10 before you can review at speed creates a backlog that slows everything down.

### Vague Task Descriptions

"Fix the bugs" will produce unpredictable results. "Fix the null pointer in UserService.getProfile when user.avatar is null — add a null check and return a default avatar URL" gives the agent exactly what it needs.

### Ignoring Branch Conflicts

If two agents modify the same file on different branches, you'll hit merge conflicts when merging. Plan your task allocation to minimize overlap. If tasks must touch the same files, run them sequentially.
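
Before merging, you can check whether two agent branches touched the same files. The sketch assumes both branches forked from a common base branch; `overlapping_files` is a hypothetical helper.

```bash
# Hypothetical helper: list files changed on BOTH branches since each
# diverged from BASE. An empty result means no file-level overlap.
overlapping_files() {
  base="$1"; branch_a="$2"; branch_b="$3"
  {
    git diff --name-only "$base...$branch_a"
    git diff --name-only "$base...$branch_b"
  } | sort | uniq -d
}
```

Running `overlapping_files main agent/tests agent/refactor` prints the shared paths, one per line. An empty result rules out textual conflicts in the same file, but not semantic ones, such as one branch renaming a function that the other branch calls.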

### Not Running Tests

If the agent doesn't run tests, you're reviewing blind. Either include "run tests and fix any failures" in your prompt, or verify tests pass before reviewing the diff.
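
A small gate can make this check explicit. The sketch is an assumption about workflow, not a feature of any tool: `tests_pass_in` is a hypothetical helper, and the test command is whatever your project uses.

```bash
# Hypothetical helper: run a test command inside a worktree and report
# whether the diff is ready for review. Returns non-zero if the tests fail.
tests_pass_in() {
  dir="$1"; shift
  if ( cd "$dir" && "$@" ) >/dev/null 2>&1; then
    echo "tests pass in $dir: ready for review"
  else
    echo "tests FAILED in $dir: send back to the agent"
    return 1
  fi
}
```

For example, `tests_pass_in ../project-tests npm test` runs the suite in that worktree and leaves your own working directory unchanged.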

---

## Getting Started

1. **Pick an orchestrator** — [Superset](https://superset.sh) handles worktrees, sessions, and review
2. **Pick your agents** — Start with Claude Code or Codex, expand later
3. **Start with 2-3 tasks** — Parallelizable, non-overlapping tasks
4. **Review quickly** — Focus on diffs, not full file reads
5. **Scale gradually** — Add more agents as your review speed improves

The goal isn't to run the most agents possible. It's to maximize useful throughput — tasks completed per hour that meet your quality bar. Parallel agents get you there faster than sequential work, but only if the orchestration and review workflow supports it.