
Summary
METR researcher Nikola Jurkovic tasked Anthropic's Opus 4.6 with reimplementing Slay the Spire and Balatro as command-line (CLI) games, using a simple ReAct scaffold with internet access and a budget of 60 million tokens. In a single run per game, the model produced mostly playable versions of both, with the core mechanics recognizably intact despite missing features and edge-case bugs. The researcher estimated that each task would take an experienced software engineer several months to complete.
Second Order
Organizations that benchmark AI coding capability against human developer output need to pull their timelines forward. A model producing months-equivalent engineering work in a single agentic run signals that the gap between AI-assisted and AI-autonomous software development is closing faster than most roadmaps assume. Teams that have structured AI adoption around narrow, supervised code generation will find that assumption outdated within the current product cycle.
Third Order
As agentic models routinely compress multi-month engineering tasks into single runs, the economic and organizational rationale for large software development teams erodes. The erosion comes not gradually but in discrete capability jumps that will outpace workforce transition planning. The scarce resource shifts from coding labor to expertise in task specification and evaluation: organizations that cannot clearly define and score complex outputs will be unable to leverage the capability they nominally have access to. This also accelerates a winnowing of the software consultancy and professional services market, where billable hours have historically been anchored to implementation complexity.