Great results! Interestingly, this aligns fairly well with predictions from a theoretical framework I proposed two years ago, which also suggested a constant doubling time for effective horizon lengths, assuming exponential growth in compute and algorithmic progress.
When will AI systems be able to carry out long projects independently?

In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months. pic.twitter.com/KuZrClmjcc

— METR (@METR_Evals) March 19, 2025

However, aligning my framework with these empirical findings implies exceptionally rapid algorithmic progress—with effective compute doubling roughly every 1.25 months. For context, @EpochAIResearch estimates that physical training compute currently doubles every ~5.4 months.
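The thread doesn't spell out the arithmetic behind that 1.25-month figure, so here is a rough sketch of one way the numbers could fit together. It assumes (my assumption, not stated in the thread) that the framework relates horizon length H to effective compute C via a power law, H ∝ C^k, and that effective compute decomposes as physical compute times algorithmic efficiency, so their growth rates add in log space. All variable names are illustrative.

```python
# Sketch: back out quantities implied by the numbers quoted in the thread.
# Assumed model (not from the thread): H ∝ C^k, so with exponential growth
# in effective compute C, horizon length H has a constant doubling time
# T_H = T_C / k, where T_C is the effective-compute doubling time.

horizon_doubling_months = 7.0            # METR's headline estimate
effective_compute_doubling_months = 1.25  # figure quoted in the thread
physical_compute_doubling_months = 5.4    # Epoch AI's estimate, per the thread

# Implied power-law exponent k, and effective-compute doublings needed
# per doubling of horizon length.
k = effective_compute_doubling_months / horizon_doubling_months
doublings_per_horizon_doubling = 1.0 / k
print(f"implied exponent k ≈ {k:.3f}")
print(f"≈ {doublings_per_horizon_doubling:.1f} effective-compute doublings "
      "per horizon-length doubling")

# If effective compute = physical compute × algorithmic efficiency, the
# growth rates (in doublings per month) add, so the algorithmic share is
# the gap between the effective and physical rates.
algorithmic_rate = (1 / effective_compute_doubling_months
                    - 1 / physical_compute_doubling_months)
print(f"implied algorithmic progress ≈ {algorithmic_rate:.2f} doublings/month")
```

Under these assumptions, a 7-month horizon doubling alongside 1.25-month effective-compute doubling corresponds to roughly 5 to 6 doublings of effective compute per doubling of horizon length, with most of the effective-compute growth attributed to algorithmic progress rather than physical compute.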

My original framework was purely theoretical and based on pretraining scaling laws, so it's interesting to see empirical validation of this functional relationship between compute and effective horizon length, even as we enter the reasoning model paradigm.
For more context, here's a blog post about my framework: https://t.co/r4O4yHP11T
Some clarification of how I interpret the METR study: https://t.co/oGyfPcTIz0
While I appreciate this study, I'm also a bit worried its headline result is misleading—it only measures performance on a narrow set of software tasks. As of March 2025, AIs still can't handle 15-minute robotics or computer-use tasks, despite what the headline plot might suggest. https://t.co/idkZNJpL21
— Matthew Barnett (@MatthewJBar) March 19, 2025