150 changes: 150 additions & 0 deletions apps/marketing/content/blog/roadmap-to-100-agents.mdx
@@ -0,0 +1,150 @@
---
title: "Our plan for running 100 Parallel Coding Agents"
description: "An attempt to crystallize our plans for 2026"
author: satya
date: 2026-02-02
category: Product
---

![Superset — managing coding agents in parallel](/blog/roadmap-to-100-agents/cover.png)

Right now at Superset, we're able to reliably manage 5-7 coding agents in parallel - whether that's Claude Code, Codex,
etc. - at a time. Our goal is to be able to manage 100 coding agents in parallel each by the end of 2026.

Most people believe that the path from seven to 100 agents is better models, faster inference, and smarter agents. It's
not. Agent compute is already cheap enough, you can run hundreds of agents a month all for less than the cost of one
engineer.

What's stopping us is every agent needs a human to review its code, give feedback, and decide what to work on next.
Scale the agents all you want - it's the humans that don't scale.
Comment on lines +11 to +19

⚠️ Potential issue | 🟡 Minor

Tighten two grammar points in the intro.

✏️ Suggested wording tweaks
-Right now at Superset, we're able to reliably manage 5-7 coding agents in parallel - whether that's Claude Code, Codex,
-etc. - at a time. Our goal is to be able to manage 100 coding agents in parallel each by the end of 2026.
+Right now at Superset, we're able to reliably manage 5-7 coding agents in parallel - whether that's Claude Code, Codex,
+etc. - at a time. Our goal is to be able to manage 100 coding agents in parallel by the end of 2026.
-Scale the agents all you want - it's the humans that don't scale.
+Scale the agents all you want - it's the humans who don't scale.
🧰 Tools
🪛 LanguageTool

[style] ~18-~18: Try using a synonym here to strengthen your writing.
Context: ...agent needs a human to review its code, give feedback, and decide what to work on ne...

(GIVE_PROVIDE)


[style] ~19-~19: Consider using “who” when you are referring to people instead of objects.
Context: ...e agents all you want - it's the humans that don't scale. # Mapping out the problem...

(THAT_WHO)

🤖 Prompt for AI Agents
In `@apps/marketing/content/blog/roadmap-to-100-agents.mdx` around lines 11 - 19,
Tighten two grammar points in the intro: replace the sentence "Right now at
Superset, we're able to reliably manage 5-7 coding agents in parallel - whether
that's Claude Code, Codex, etc. - at a time." with a cleaner version like "Right
now at Superset, we can reliably manage 5–7 coding agents in parallel — whether
Claude Code, Codex, or similar." and change "Our goal is to be able to manage
100 coding agents in parallel each by the end of 2026." to "Our goal is to
manage 100 coding agents in parallel by the end of 2026." to remove redundancy
and tighten phrasing.


# Mapping out the problem

You can imagine the agent loop as a pipeline, and the goal is to improve throughput:

![The agent pipeline — most steps require a human](/blog/roadmap-to-100-agents/pipeline-diagram.svg)

There's a clear bottleneck that emerges when you look at this. A human is involved in almost every step, and each of
these steps has a steep context-switching cost - you have to open that agent's code, spin up dev servers, click through
the UI to verify their work, give feedback and more. Right now, most of our agents spend more time waiting for us to
review their work than they spend doing it.

At 100 agents, this model completely breaks. You can't review 100 diffs a day. You can't context-switch between 100
streams of work.

The fix is straightforward: pull the human out of steps where they're not needed, and make the remaining steps faster.
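
To make the throughput argument concrete, here is a back-of-the-envelope sketch. The function name and the two hand-offs per agent per day are assumptions for illustration only; the 15 minutes is the catch-up time described in the next section.

```python
def review_hours_per_day(agents: int, reviews_per_agent: float, minutes_per_review: float) -> float:
    """Human hours needed per day to review every agent hand-off."""
    return agents * reviews_per_agent * minutes_per_review / 60


# Illustrative inputs only: 2 hand-offs per agent per day is an assumption.
print(review_hours_per_day(7, 2, 15))    # 3.5
print(review_hours_per_day(100, 2, 15))  # 50.0
```

Under these assumed numbers, seven agents already take a large share of a working day, and 100 agents would need more review hours than a day contains. Only two levers exist: fewer hand-offs per agent, or fewer minutes per hand-off.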

# How we'll improve it

## Have agents work harder before reaching out to you

If you've worked with a coding agent, you've had the experience: the agent comes back with something half-baked, you
spend 15 minutes catching up to what it did, spin up a dev server, click around, then feed it the same feedback you've
given a dozen agents before. Most of the time you spend reviewing isn't making decisions — it's catching problems that
should have been caught before the work reached you.

The fix is to add layers between the agent and you. The agent's work should be vetted thoroughly before it is ever
presented to you.

![Agent work passes through review layers before reaching you](/blog/roadmap-to-100-agents/quality-gates.svg)

### Adversarial agents

[Block published a paper recently](https://block.xyz/documents/adversarial-cooperation-in-code-synthesis.pdf) that
highlights how useful having agents work together can be. The general idea is to send two agents on a task: one to
implement it, and the other to push the implementer to write tests, review its work, and do due
diligence before picking a solution.

A similar pattern can be used to reduce interruptions for you: you could have a dedicated bouncer agent that sits
between the coding agents and you, holding back an agent's work until the bouncer is sure the agent is either done
with its work or is sufficiently stuck. Your review becomes a final sign-off, not a first pass.
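
A minimal sketch of the bouncer's decision, assuming a rule-based stand-in for what would really be an agent. `AgentReport`, its fields, and the `stuck_after` threshold are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class AgentReport:
    """Hypothetical status a coding agent hands to the bouncer."""
    tests_passed: bool
    self_review_done: bool
    failed_attempts: int


def should_surface(report: AgentReport, stuck_after: int = 3) -> bool:
    """Surface work to the human only when it is done or sufficiently stuck."""
    done = report.tests_passed and report.self_review_done
    stuck = report.failed_attempts >= stuck_after
    return done or stuck
```

Everything that is neither done nor stuck goes back to the agent, so the human only sees the two cases that actually need a person.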

### Stacking review agents and automated testing

Since you don't care how long an agent takes when you're running dozens of them, there's no downside to stacking checks.
Run five different review agents, each looking for different classes of issues, with a final agent consolidating the
feedback. Each layer increases the odds that problems are found and resolved before you ever see the code.
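
The stacking idea can be sketched as a list of independent reviewers plus a consolidation step. The two reviewers below are toy string checks standing in for review agents, and every name here is an assumption made for illustration.

```python
from typing import Callable

Reviewer = Callable[[str], list[str]]  # takes a diff, returns findings


def check_todos(diff: str) -> list[str]:
    """Toy reviewer: flags leftover TODO markers."""
    return ["leftover TODO"] if "TODO" in diff else []


def check_debug_prints(diff: str) -> list[str]:
    """Toy reviewer: flags debug print calls."""
    return ["debug print left in"] if "print(" in diff else []


def consolidate(diff: str, reviewers: list[Reviewer]) -> list[str]:
    """Run every reviewer and merge findings, dropping duplicates in order."""
    findings: list[str] = []
    for review in reviewers:
        for finding in review(diff):
            if finding not in findings:
                findings.append(finding)
    return findings
```

Adding a sixth or seventh reviewer is one more entry in the list; the human still receives a single consolidated set of findings.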

The same logic applies to testing. Giving agents access to the browser through tools like
[BrowserUse](https://browser-use.com/) or [Maestro](https://maestro.mobile.dev/) lets them verify their own work
visually — catching UI regressions, layout issues, and interaction bugs that are invisible in code review alone.

### Long-running agents

Most agent workflows today are one-shot: you give a task, the agent works, it comes back. But agents should be able to
run longer loops — trying an approach, hitting an issue, adjusting, and iterating until they're confident or genuinely
stuck. [Ralph loops](https://ghuntley.com/loop/) are a popular pattern for this: treat the agent's work as clay on a
wheel, refining iteratively, rather than laying bricks in a line.

The result is fewer interruptions and higher-quality output when the agent does surface. An agent that's been iterating
for an hour and is confident in its solution is far easier to review than one that gave up after its first attempt.
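
As a rough sketch of such a loop, assuming the agent can report a confidence score after each attempt (the `attempt` callback, the 0.9 bar, and the round limit are all hypothetical, and "stuck" is simplified to running out of rounds):

```python
from typing import Callable


def iterate_until_done(
    attempt: Callable[[int], float],
    confident_at: float = 0.9,
    max_rounds: int = 10,
) -> tuple[str, int]:
    """Retry until confidence clears the bar, or report being stuck."""
    for round_number in range(1, max_rounds + 1):
        confidence = attempt(round_number)
        if confidence >= confident_at:
            return ("confident", round_number)
    return ("stuck", max_rounds)
```

Either outcome is a legitimate reason to surface to a human; what the loop removes is every interruption in between.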

## Make it fast to review agents' work

Most developer tools today are human-driven — you open a diff, you spin up a dev server, you navigate to the right page.
Agents plug into these tools, but the human is still doing the legwork. We want to shift the paradigm towards
agent-driven UIs - interfaces that agents orchestrate for the human's benefit, where each review takes seconds, not
minutes.

### Investing in agent-driven UIs

When you review an agent's work today, you're dropped into a diff with no context. You have to reconstruct what the
agent was trying to do, spin up an environment to test it, and navigate to the right pages to verify. That's the agent
dumping its work on your desk.

In an agent-driven UI, the agent prepares your review for you. It writes a summary of what changed and why, spins up a
preview environment, navigates you to the specific pages or flows it wants you to look at, and surfaces the test results
that matter. When you open a completed task, you should be looking at a prepared briefing, not raw output.

### Make existing tools better

PR reviews, CI dashboards, IDEs — these are all built for a world where humans drive the interactions. In an agent-first
world, the tools need to meet you differently. Agents should be annotating their own PRs before you open them, the way
[Devin's review](https://app.devin.ai/review) adds context to diffs ahead of time. CI results should be summarized and
triaged by an agent, not presented as a raw log for you to parse. The tools we use every day were designed for human
authors — adapting them for human reviewers of agent work is a different design problem.

### Reducing friction to zero

Every interaction between you and an agent should be as lightweight as possible. You should be able to click yes or no
for straightforward changes. Agents should prep multiple-choice questions — "I found three approaches to this, which do
you prefer?" — so you're choosing instead of typing. When an agent does need written feedback, supporting agents can
prefill a draft response based on the context, so you're editing instead of writing from scratch. Quick actions like
"create PR" or "deploy to staging" should also be easy to reach.

The goal isn't just faster review — it's making the interaction so lightweight that you can do it from your phone
between meetings.

## Have agents be more proactive

![Events trigger agents automatically](/blog/roadmap-to-100-agents/proactive-agents.svg)

Everything above assumes you're the one deciding what agents work on. But at 100 agents, planning is itself a
bottleneck. You can't spec out 100 tasks a day — that requires understanding the codebase, the product priorities, and
the nuances of each task.

### Reusable workflows

The building blocks for this are already emerging. [OpenAI's Codex skills](https://developers.openai.com/codex/skills/)
let you package repeatable workflows — deploy procedures, migration steps, test patterns — as reusable bundles that
agents can invoke on their own when the situation matches. Instead of writing the same instructions every time, you
encode them once and the agent recognizes when to apply them.

### Event-driven triggers

[Devin's workflows](https://devin.ai/) take this further with event-driven triggers. A build fails, and a Devin instance
spins up to investigate. A Linear ticket is created, and an agent starts working on it automatically. Teams create
playbooks for recurring tasks — setting up changelogs, running code migrations, adding test coverage — that agents
execute on a schedule or in response to events without anyone initiating them.
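
The trigger pattern itself is small. The sketch below is a generic event registry, not Devin's API; the event names, payload keys, and handler functions are assumptions, and each handler stands in for spinning up an agent.

```python
from typing import Callable

Handler = Callable[[dict], str]
_handlers: dict[str, list[Handler]] = {}


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler (a stand-in for starting an agent) for an event type."""
    def register(handler: Handler) -> Handler:
        _handlers.setdefault(event_type, []).append(handler)
        return handler
    return register


@on("build_failed")
def investigate_build(event: dict) -> str:
    return f"agent investigating build {event['build_id']}"


@on("ticket_created")
def start_ticket(event: dict) -> str:
    return f"agent working on ticket {event['ticket_id']}"


def dispatch(event_type: str, event: dict) -> list[str]:
    """Run every handler registered for the event; unknown events do nothing."""
    return [handler(event) for handler in _handlers.get(event_type, [])]
```

For example, `dispatch("build_failed", {"build_id": 42})` runs only the build handler, and an event type with no registered handler returns an empty list.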

### Beyond code

Even outside of code, this pattern is taking hold. [Circleback](https://circleback.ai/) listens to your meetings and
doesn't just take notes — it extracts action items, creates Linear tickets for feature requests mentioned in product
demos, and updates your CRM after sales calls. The meeting ends and the downstream work is already in motion.

We don't have all of this figured out yet. Some of it is live, some is on our roadmap, and some is still taking shape.
But the throughput framing gives us a clear test for every feature we build: does this reduce the time a human spends
per agent interaction?

If you're running agents at scale and hitting these walls, we'd love to compare notes, reach out to us at founders@superset.sh

⚠️ Potential issue | 🟡 Minor

Missing period and consider mailto link for email.

The closing sentence is missing terminal punctuation and the email could be a clickable link for better UX.

✏️ Suggested fix
-If you're running agents at scale and hitting these walls, we'd love to compare notes, reach out to us at founders@superset.sh
+If you're running agents at scale and hitting these walls, we'd love to compare notes. Reach out to us at [founders@superset.sh](mailto:founders@superset.sh).
🤖 Prompt for AI Agents
In `@apps/marketing/content/blog/roadmap-to-100-agents.mdx` at line 150, The
closing sentence "If you're running agents at scale and hitting these walls,
we'd love to compare notes, reach out to us at founders@superset.sh" is missing
terminal punctuation and should make the email a clickable mailto link; update
that sentence to end with a period and replace the plain email with a mailto
link (e.g., [founders@superset.sh](mailto:founders@superset.sh)) so it becomes
properly punctuated and the email is clickable in the MDX.

63 changes: 63 additions & 0 deletions apps/marketing/public/blog/roadmap-to-100-agents/quality-gates.svg