The Agent Org Chart Was Wrong, But So Was I
Six months ago, AI strategy slides all looked the same. Take your org chart. Replace “analyst” with “AI agent.” Replace “developer” with “AI agent.” Draw the same boxes and reporting lines. Synthetic workers slotting into human-shaped holes.
I pushed back hard on this. LLMs are two-sigma technologies - brilliant and unreliable in unpredictable ways. They can’t maintain consistent personas across conversations. They forget context. They hallucinate confidently. The idea of an “AI orchestrator” coordinating with an “AI architect” who briefs an “AI analyst” assumed a coherence that the models simply don’t have. To me, the org chart metaphor was just magical thinking dressed up as strategy.
I still think my reasoning was right, but I was wrong about what it implied. Agents couldn’t maintain coherent human-like roles, so they found a totally different path to capability.
The Swarms Have Arrived
In January and February 2026, three projects landed in quick succession. Cursor ran 2,000 AI agents for a week to build a web browser from scratch: 30,000 commits, millions of lines of code, no human touching the keyboard. Anthropic had 16 parallel Claude instances produce a C compiler that boots Linux. Steve Yegge launched Gas Town, an orchestration system where dozens of agents self-organize like a dystopian city-state.
These projects point to something very different from agents playing human roles.
Anthropic’s entire orchestration layer was a loop: spawn an agent, let it work until it stops, spawn another. No memory between sessions. No consistent identity. Each fresh instance oriented itself by reading what previous agents had left behind in the repo. Rather than a coordinating team, it was a rolling wave of Memento-like amnesiacs reading each other’s notes.
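The whole thing fits in a few lines. Here’s a minimal sketch of that loop - the `agent-cli` command, the NOTES.md convention, and the `make test` target are my placeholders, not Anthropic’s actual harness:

```python
import subprocess

MAX_SESSIONS = 1_000

def spawn_agent(repo_dir: str) -> None:
    # Each session is a fresh process with no memory of its predecessors.
    # Orientation happens entirely by reading what earlier agents left behind.
    subprocess.run(
        ["agent-cli", "--cwd", repo_dir, "--prompt",
         "Read NOTES.md, pick up the next unfinished task, "
         "and update NOTES.md before you stop."],
        check=False,  # crashed or stalled agents are expected, not fatal
    )

def tests_pass(repo_dir: str) -> bool:
    return subprocess.run(["make", "-C", repo_dir, "test"]).returncode == 0

# The entire orchestration layer: spawn, let it work until it stops, repeat.
for _ in range(MAX_SESSIONS):
    spawn_agent("compiler-repo")
    if tests_pass("compiler-repo"):
        break
```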
Cursor’s system tried human-style coordination first, and it failed. File locking made agents timid. Optimistic concurrency made them risk-averse. They eventually landed on a hierarchy of Planners, Workers, and Judges, but even that looked nothing like a human team. The system *tolerated a stable error rate* to maintain throughput: a large portion of the output went straight into the bin at each step. That was fine. More agents, more attempts, more progress.
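In caricature, the shape is something like the sketch below. The Planner/Worker/Judge names follow Cursor’s reported roles; the toy bodies and the 40% failure rate are mine:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def plan(goal: str, n: int) -> list[str]:
    # Planner: split the goal into independent tasks (stubbed here).
    return [f"{goal} / subtask {i}" for i in range(n)]

def attempt(task: str) -> str:
    # Worker: produce a candidate change. Workers are cheap and fallible;
    # a large fraction of candidates will simply be garbage.
    return task + (" [ok]" if random.random() > 0.4 else " [broken]")

def judge(candidate: str) -> bool:
    # Judge: verify the candidate against tests or a spec.
    return candidate.endswith("[ok]")

def swarm_step(goal: str, n_workers: int = 100) -> list[str]:
    tasks = plan(goal, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        candidates = list(pool.map(attempt, tasks))
    # Tolerate a stable error rate: keep what passes, bin the rest,
    # and rely on volume for forward progress.
    return [c for c in candidates if judge(c)]

print(len(swarm_step("render the toolbar")), "of 100 candidates kept")
```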
Gas Town is stranger still. Agents crash, stall, corrupt data. The entire architecture assumes failure. “If there is work on your hook, YOU MUST RUN IT” is the core rule - a principle that only makes sense if you expect the previous worker to have died mid-task. There’s even a necromantic command called “Séance” that lets agents query their predecessors’ sessions.
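A sketch of what that rule implies in code - the hook file layout and function names here are invented for illustration, not Gas Town’s actual API:

```python
import json
import pathlib

def resume(task: dict) -> None:
    print("resuming abandoned task:", task["id"])

def pull_new_task() -> None:
    print("hook empty; pulling fresh work")

def wake_up(hook: pathlib.Path) -> None:
    # "If there is work on your hook, YOU MUST RUN IT": assume the previous
    # worker died mid-task, so pending work is resumed before anything new.
    if hook.exists():
        resume(json.loads(hook.read_text()))
    else:
        pull_new_task()

wake_up(pathlib.Path("hooks/worker-7.json"))
```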
Brute Force as Strategy
The pattern across all three projects is the same: brute force plus verification beats complex coordination.
Anthropic used an existing compiler as an “oracle” - compile most files with the known-good compiler, use the experimental one for a random subset, then delta-debug when things break. Cursor used Rust’s strict type system as a verification layer and GPT’s vision capabilities to compare screenshots against working examples.
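A hedged sketch of the oracle pattern: gcc stands in for the known-good compiler, the `./agentcc`, `src/*.c`, and `./run_tests.sh` paths are placeholders, and the bisection assumes a single culprit, which is cruder than full delta debugging:

```python
import glob
import random
import subprocess

sources = sorted(glob.glob("src/*.c"))

def works(experimental: list[str]) -> bool:
    """Compile `experimental` files with the agent-built compiler and the
    rest with the known-good oracle, link, and run the test suite."""
    for src in sources:
        cc = "./agentcc" if src in experimental else "gcc"
        if subprocess.run([cc, "-c", src, "-o", src[:-2] + ".o"]).returncode:
            return False
    objs = [src[:-2] + ".o" for src in sources]
    if subprocess.run(["gcc", *objs, "-o", "prog"]).returncode:
        return False
    return subprocess.run(["./run_tests.sh"]).returncode == 0

# Gamble on a random ~10% of files; trust the oracle for everything else.
suspects = random.sample(sources, k=max(1, len(sources) // 10))

# Delta-debug on failure: bisect the suspect set until one culprit remains.
while len(suspects) > 1 and not works(suspects):
    half = suspects[: len(suspects) // 2]
    suspects = half if not works(half) else suspects[len(half):]
```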
None of this looks like how humans do work. It looks more like evolution. Generate variants, kill the failures, and keep only what survives.
The economics make this viable. Anthropic’s C compiler cost roughly $20,000 in API tokens. A human team building an equivalent might take years and millions in salaries. The agents’ code is often unwieldy, over-engineered, structurally incoherent - one developer called Cursor’s browser output “incredibly bloated” compared to mature engines. But it compiles, it runs, and it passes the tests.
The Ralph Wiggum technique, named after the Simpsons character, captures the philosophy perfectly: feed Claude’s output back into itself in a loop until it works. Fresh context each time, with no memory. Just persistence, iteration, and evidence from the file system. One engineer reportedly completed a $50,000 contract for $297 in API costs using this approach - a figure to take with a grain of salt, since it doesn’t count all the completely failed attempts.
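Stripped to its essentials, the loop looks like this. `claude -p` is Claude Code’s non-interactive print mode; PROMPT.md, the test script, and the 200-pass cap are placeholders:

```python
import pathlib
import subprocess

PROMPT = pathlib.Path("PROMPT.md").read_text()  # identical instructions every pass

for attempt in range(200):  # persistence, not intelligence
    # Fresh context each run: no conversation history is carried over.
    # The only "memory" is whatever earlier runs wrote to the file system.
    subprocess.run(["claude", "-p", PROMPT], check=False)
    if subprocess.run(["./run_tests.sh"]).returncode == 0:
        print(f"tests green after {attempt + 1} passes")
        break
```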
Prior Art All the Way Down
But what are these agents actually doing?!
They’re not so much inventing as recombining. A C compiler follows well-documented specifications. A web browser renders against published CSS and HTML standards. The agents can succeed because they’re operating in domains with extensive prior art, clear verification criteria, and decades of human solutions to learn from.
Human work is also mostly recombination; we just flatter ourselves that it isn’t. Agents simply make all the scaffolding visible: specifications in, working code out, creativity optional. That’s uncomfortable because it suggests a lot of what we call expertise is pattern-matching at scale - and we just got outscaled.
We use style guides and templates, borrow patterns from colleagues’ prior work, and (sometimes) read the documentation if there is any. The difference is that agents can do this at scale, in parallel, around the clock, for pennies per attempt. If the task has clear success criteria and abundant examples, the swarm can grind through it faster than any human team.
Living With Aliens
Andrej Karpathy described this in a way that really resonates with me: “Some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it.”
These agents aren’t AI coworkers in the way those decks imagined. They don’t think like us. They don’t plan and then execute flawlessly. They iterate, fail, read the wreckage, and iterate again. They produce code through approaches no human would choose, at a cost - measured in lines of code, not dollars - that no human team could match. Watching Gas Town work has been described as “like watching ants building a bridge”.
So the agent deck authors weren’t entirely wrong. They identified that agents would do real work. But they assumed that work would look like human work, fit into human processes, and follow human logic.
It doesn’t!
So, what to do?
The practical implications cut deep.
First, if you’re still thinking about AI as “faster employees,” you’re deluded. These systems succeed through parallelism and verification, not through coherent individual performance. You’re more likely to need to design for swarms than for substitutes.
Second, in these examples the “how” matters far less than the “what”. Agent-produced code is weird. It’s over-engineered. It takes paths no human would choose. But if it passes your tests - if it meets your specifications - does the journey matter? In some cases the answer is categorically “yes!” This demands real reflection from us as human leaders on what we, our clients, and our communities (and regulators!) really value in our work product versus what we’ve historically used as proxies for quality.
Third, verification infrastructure is now strategic. The projects that succeeded (under some definition, at least) had rigorous automated testing, type systems, oracles, and golden examples. The quality of your verification determines how much agent work you can absorb - and, dystopian as that sounds, we should start investing accordingly.
Fourth, tolerance for variance might become a new competitive advantage. Human teams produce consistent, predictable, explicable work. Agent swarms produce inconsistent, unpredictable, alien work that nonetheless accumulates toward solutions. Organizations that can accept this variance might capture some of the value that their competitors can’t.
Working for a leading consulting firm, I’ve seen an analogue of this with clients. I was often hired precisely when tolerance for variance was low or zero because of a binding constraint (financial or regulatory), while cheaper competitors got the gig when all that was needed was week-by-week forward progress.
The trajectory from “barely functional” to “it boots Linux” happened in months. We’re not at the end of this curve, and all we can do is experiment relentlessly, stay on top of what’s becoming possible, and think hard about where these alien workflows can be adapted to the problems we care about.
The agentic org chart was the wrong metaphor, but something is filling those boxes.
It just doesn’t look like us.

