Anthropic Tested What Happens When AI Agents Trade Real Goods in a Marketplace

Anthropic, the company behind the Claude AI assistant, has published the results of an unusual internal experiment called Project Deal. The company set up a real marketplace where AI agents autonomously bought, sold, and negotiated personal belongings on behalf of 69 Anthropic employees. No humans were involved during the actual trading. The agents handled everything from posting listings to haggling over prices to closing deals, all through natural language conversations on Slack.

How the Experiment Worked

Each participant sat for an intake interview in which Claude asked about the items they wanted to sell, their price expectations, what they hoped to buy, and how aggressively they wanted their agent to negotiate. Claude then used those preferences to build a custom system prompt for each person's agent. The agents were deployed across four parallel marketplace runs on Slack, where they posted listings, made offers, countered, and struck deals entirely on their own. More than 500 items were listed, and the experiment produced 186 completed transactions worth just over 4,000 dollars. Items ranged from snowboards and bicycles to lab-grown rubies and 19 ping pong balls.
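
To make the intake step concrete, here is a minimal Python sketch of how interview answers might be turned into an agent's system prompt. Anthropic has not published its prompt format, so everything here is an assumption made for illustration: the IntakePreferences record, its field names, the build_system_prompt helper, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IntakePreferences:
    """Hypothetical record of one participant's intake interview answers."""
    owner: str
    items_to_sell: Dict[str, float]  # item name -> asking price in dollars
    items_wanted: List[str]          # things the participant hopes to buy
    negotiation_style: str           # e.g. "relaxed" or "aggressive"


def build_system_prompt(prefs: IntakePreferences) -> str:
    """Assemble a plain-text system prompt from the interview answers.

    The wording is illustrative only; it is not the prompt used in Project Deal.
    """
    sell_lines = [
        f"- {name}: asking about {price:.2f} dollars"
        for name, price in prefs.items_to_sell.items()
    ]
    want_lines = [f"- {name}" for name in prefs.items_wanted]
    return "\n".join(
        [
            f"You are a marketplace agent acting on behalf of {prefs.owner}.",
            "Items to sell:",
            *sell_lines,
            "Items to look for:",
            *want_lines,
            f"Negotiation style: {prefs.negotiation_style}.",
        ]
    )


# Example with made-up preferences (not data from the experiment).
example = IntakePreferences(
    owner="Participant A",
    items_to_sell={"snowboard": 80.0},
    items_wanted=["bicycle"],
    negotiation_style="relaxed",
)
print(build_system_prompt(example))
```

In a setup like this, the only thing that differs between agents is the prompt text, which is why the experiment could separately compare the effect of the instructions and the effect of the underlying model.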

Smarter Models Got Better Deals

The most striking finding involved the gap between AI models of different capability levels. Two of the four marketplace runs used a mix of Claude Opus 4.5 and Claude Haiku 4.5 agents. Compared with Haiku agents, Opus agents completed significantly more deals, sold items for an average of 3.64 dollars more, and bought items for an average of 2.45 dollars less. In one case, the same broken bicycle sold for 38 dollars with a Haiku agent but 65 dollars with an Opus agent. Critically, prompts instructing agents to negotiate aggressively made almost no difference. What mattered was the underlying intelligence of the model, not the instructions it received.

Perhaps the most uncomfortable result was that participants with weaker agents did not notice they were getting worse deals. Fairness ratings were nearly identical between Opus and Haiku users, and nearly half of Haiku users actually ranked their experience above the Opus runs. Anthropic says this raises serious questions about the future of AI-powered commerce: if smarter agents consistently outperform weaker ones but users cannot tell the difference, the result could be a new kind of quiet economic inequality that people on the losing end never realize exists.
