Anthropic ran 16 Claude agents to build a Rust C compiler for Linux kernel builds
Anthropic researcher Nicholas Carlini used 16 Claude agents over two weeks and about 20,000 dollars in API fees to produce a roughly 100,000-line C compiler, written in Rust, that can build a Linux 6.9 kernel.

Key Takeaways
- Anthropic used 16 Claude Opus 4.6 agents for about two weeks and 20,000 dollars in API fees to generate a roughly 100,000-line C compiler written in Rust.
- The compiler reportedly built a bootable Linux 6.9 kernel on x86, ARM, and RISC-V and reached a 99 percent pass rate on GCC’s torture tests.
- Most of the “autonomy” depended on human-built automation: CI, context-safe test output, time-boxing, and GCC-as-oracle debugging.
- Limitations remain: reliance on GCC for a 16-bit step, buggy assembler/linker, and weaker codegen efficiency than GCC.
Anthropic is using multi-agent coding to test how far modern AI can go without constant pair-programming, and the latest demo is unusually concrete: a working C compiler created by a team of autonomous model instances sharing a repo.
How 16 Claude agents coordinated through Git
Anthropic researcher Nicholas Carlini described running 16 instances of Claude Opus 4.6 inside separate Docker containers, each cloning a shared Git repository, “claiming” tasks via lock files, and pushing code back when done. There was no central orchestrator; agents self-selected the next problem and even handled merge conflicts.
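The "claiming tasks via lock files" pattern is worth pausing on, because it is the entire coordination layer. A minimal sketch of the idea (hypothetical task names and file layout, not Carlini's actual tooling) relies on exclusive file creation being atomic, so two agents racing for the same task cannot both win:

```python
import os

# Hypothetical task list; in the real experiment, tasks lived in a shared Git repo.
TASKS = ["parser/switch-stmt", "codegen/armv8", "linker/relocs"]
LOCK_DIR = "locks"

def claim_next_task(agent_id):
    """Atomically claim the first unclaimed task via an exclusive lock file.

    O_CREAT | O_EXCL fails if the file already exists, so when two agents
    race for the same task, exactly one succeeds and the other moves on.
    """
    os.makedirs(LOCK_DIR, exist_ok=True)
    for task in TASKS:
        lock_path = os.path.join(LOCK_DIR, task.replace("/", "__") + ".lock")
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # another agent already holds this task
        with os.fdopen(fd, "w") as f:
            f.write(agent_id)  # record who claimed it, for later inspection
        return task
    return None  # nothing left to claim

first = claim_next_task("agent-01")
second = claim_next_task("agent-02")
```

In the Git setting, the lock file itself would be committed and pushed, so a push conflict plays the role of the `FileExistsError` above: whoever pushes first owns the task.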
The effort spanned roughly two weeks, nearly 2,000 Claude Code sessions, and about 20,000 dollars in API costs. The output: a Rust-based compiler around 100,000 lines long that can compile a bootable Linux 6.9 kernel across x86, ARM, and RISC-V. Anthropic also says it compiles major OSS projects including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU, and it reportedly hit a 99 percent pass rate on the GCC torture test suite.
What the experiment says about agentic automation limits
For marketers and operators watching agentic tooling, the most useful details are the constraints. Carlini notes the project benefited from unusually strong “verifiers”: mature specs, existing test suites, and a reference compiler. He also had to build scaffolding that looks a lot like automation: CI pipelines, test harnesses, and feedback loops tuned to LLM failure modes.
Examples:
- Verbose test logs overwhelmed context windows, so outputs were summarized before reaching the model.
- Agents could burn hours "doing work" without progress, so runs were time-boxed.
- When agents converged on the same kernel issue, GCC was used as an oracle: most files were compiled with GCC and only a subset with the new compiler, which parallelized debugging.
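The GCC-as-oracle trick is essentially a bisection over source files. A toy sketch (hypothetical names; it assumes a single offending file and stands in a predicate for the real kernel build) shows how swapping which subset the new compiler handles narrows the culprit in logarithmic steps:

```python
def find_culprit(files, build_ok):
    """Bisect to the file that breaks the build when compiled by the new compiler.

    `build_ok(subset)` stands in for a real build where `subset` is compiled
    with the new compiler and everything else with the reference GCC.
    Assumes exactly one culprit file, as in a classic bisection.
    """
    suspects = list(files)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # If the build fails with only `half` on the new compiler, the
        # culprit is inside `half`; otherwise it is in the remainder.
        suspects = half if not build_ok(half) else suspects[len(half):]
    return suspects[0]

# Toy stand-in: the build breaks whenever "sched.c" goes through the new compiler.
kernel_files = ["init.c", "sched.c", "mm.c", "fs.c"]
culprit = find_culprit(kernel_files, lambda subset: "sched.c" not in subset)
```

Because each probe is an independent build, several agents can test disjoint subsets at once, which is the "parallelize debugging" part of the anecdote.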
The caveats are non-trivial: it still calls out to GCC for a 16-bit x86 step, its assembler and linker are buggy, and even with optimizations enabled it produces less efficient code than GCC with optimizations off. Carlini also warned that deploying code humans haven't verified is risky, especially since coherence degrades as codebases grow.
Conclusion: the headline is a compiler, but the business lesson is process. Agent teams can ship real artifacts, yet they still need tight verification, careful observability, and human-designed guardrails to avoid confidently building the wrong thing.
