Anthropic ran 16 Claude agents to build a Rust C compiler for Linux kernel builds
Anthropic researcher Nicholas Carlini used 16 Claude agents over two weeks and about 20,000 dollars in API fees to produce a roughly 100,000-line C compiler, written in Rust, that can build a Linux 6.9 kernel.

Key Takeaways
- Anthropic used 16 Claude Opus 4.6 agents for about two weeks and 20,000 dollars in API fees to generate a roughly 100,000-line C compiler written in Rust.
- The compiler reportedly built a bootable Linux 6.9 kernel on x86, ARM, and RISC-V and reached a 99 percent pass rate on GCC’s torture tests.
- Most of the “autonomy” depended on human-built automation: CI, context-safe test output, time-boxing, and GCC-as-oracle debugging.
- Limitations remain: reliance on GCC for a 16-bit step, buggy assembler/linker, and weaker codegen efficiency than GCC.
Anthropic is using multi-agent coding to test how far modern AI can go without constant pair-programming, and the latest demo is unusually concrete: a working C compiler created by a team of autonomous model instances sharing a repo.
How 16 Claude agents coordinated through Git
Anthropic researcher Nicholas Carlini described running 16 instances of Claude Opus 4.6 inside separate Docker containers, each cloning a shared Git repository, “claiming” tasks via lock files, and pushing code back when done. There was no central orchestrator; agents self-selected the next problem and even handled merge conflicts.
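The "claiming tasks via lock files" pattern is worth pausing on, because it is the entire coordination layer. A minimal sketch of the idea (hypothetical task names and file layout, not Carlini's actual tooling) relies on exclusive file creation being atomic, so two agents racing for the same task cannot both win:

```python
import os

# Hypothetical task list; in the real experiment, tasks lived in a shared Git repo.
TASKS = ["parser/switch-stmt", "codegen/armv8", "linker/relocs"]
LOCK_DIR = "locks"

def claim_next_task(agent_id):
    """Atomically claim the first unclaimed task via an exclusive lock file.

    O_CREAT | O_EXCL fails if the file already exists, so when two agents
    race for the same task, exactly one succeeds and the other moves on.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    for task in TASKS:
        lock_path = os.path.join(LOCK_DIR, task.replace("/", "__") + ".lock")
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another agent already holds this task
        with os.fdopen(fd, "w") as f:
            f.write(agent_id)  # record who claimed it, for later inspection
        return task
    return None  # nothing left to claim

first = claim_next_task("agent-01")
second = claim_next_task("agent-02")
```

In the Git setting, the lock file itself would be committed and pushed, so a push conflict plays the role of the `FileExistsError` above: whoever pushes first owns the task.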
The effort spanned roughly two weeks, nearly 2,000 Claude Code sessions, and about 20,000 dollars in API costs. The output: a Rust-based compiler around 100,000 lines long that can compile a bootable Linux 6.9 kernel across x86, ARM, and RISC-V. Anthropic also says it compiles major OSS projects including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU, and it reportedly hit a 99 percent pass rate on the GCC torture test suite.
What the experiment says about agentic automation limits
For marketers and operators watching agentic tooling, the most useful details are the constraints. Carlini notes the project benefited from unusually strong “verifiers”: mature specs, existing test suites, and a reference compiler. He also had to build scaffolding that looks a lot like automation: CI pipelines, test harnesses, and feedback loops tuned to LLM failure modes.
Examples:
- Verbose test logs overwhelmed context windows, so outputs were summarized before reaching the model.
- Agents could burn hours "doing work" without progress, so runs were time-boxed.
- When agents converged on the same kernel issue, GCC was used as an oracle: most files were compiled with GCC and only a subset with the new compiler, which parallelized debugging.
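The GCC-as-oracle trick is essentially a bisection over source files. A toy sketch (hypothetical names; it assumes a single offending file and stands in a predicate for the real kernel build) shows how swapping which subset the new compiler handles narrows the culprit in logarithmic steps:

```python
def find_culprit(files, build_ok):
    """Bisect to the file that breaks the build when compiled by the new compiler.

    `build_ok(subset)` stands in for a real build where `subset` is compiled
    with the new compiler and everything else with the reference GCC.
    Assumes exactly one culprit file, as in a classic bisection.
    """
    suspects = list(files)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # If the build fails with only `half` on the new compiler, the
        # culprit is inside `half`; otherwise it is in the remainder.
        suspects = half if not build_ok(half) else suspects[len(half):]
    return suspects[0]

# Toy stand-in: the build breaks whenever "sched.c" goes through the new compiler.
kernel_files = ["init.c", "sched.c", "mm.c", "fs.c"]
culprit = find_culprit(kernel_files, lambda subset: "sched.c" not in subset)
```

Because each probe is an independent build, several agents can test disjoint subsets at once, which is the "parallelize debugging" part of the anecdote.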
The caveats are non-trivial: it still calls out to GCC for a 16-bit x86 step, its assembler and linker are buggy, and even with optimizations enabled it produces less efficient code than GCC with optimizations off. Carlini also warned that deploying code humans haven't verified is risky, especially since coherence degrades as codebases grow.
Conclusion: the headline is a compiler, but the business lesson is process. Agent teams can ship real artifacts, yet they still need tight verification, careful observability, and human-designed guardrails to avoid confidently building the wrong thing.
