Class/Seminar

Joel Becker | Reconciling Impressive AI Benchmark Performance with Limited Developer Productivity Impacts


Event Details:

AI coding agents now complete multi-hour coding benchmarks with roughly 50% reliability, yet a randomized controlled trial found that experienced open-source developers took about 19% longer when allowed to use frontier AI tools than when tools were disallowed. This talk presents the evidence behind this productivity paradox in AI coding, examines the bottlenecks to effective deployment, and outlines next steps for understanding AI's productivity impacts.

Joel Becker works on AI evaluation methods at METR, such as time horizon measurement and developer productivity RCTs. Previously he worked in economics and genomics research, ran a statistics consultancy advising professional soccer teams, and was a very minorly successful play-money prediction markets trader.

Link to time horizon paper

Link to developer productivity paper
