While I appreciate this study, I'm also a bit worried that its headline result is misleading, since it only measures performance on a narrow set of software tasks. As of March 2025, AIs still can't handle 15-minute robotics or computer-use tasks, despite what the headline plot might suggest.
When will AI systems be able to carry out long projects independently?
In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months. pic.twitter.com/KuZrClmjcc
— METR (@METR_Evals) March 19, 2025
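To make the doubling claim concrete, here is a minimal sketch that assumes the task horizon grows exactly exponentially with a constant 7-month doubling time. The starting horizon and the helper names (`horizon_after`, `months_to_reach`) are hypothetical illustrations, not METR's figures or methodology.

```python
import math

# Assumption (hypothetical model): the horizon grows as h(t) = h0 * 2 ** (t / d),
# where d is a fixed doubling time in months (7, per the headline claim).
DOUBLING_MONTHS = 7.0


def horizon_after(months: float, h0_minutes: float,
                  doubling_months: float = DOUBLING_MONTHS) -> float:
    """Projected task horizon in minutes after `months`, starting from `h0_minutes`."""
    return h0_minutes * 2 ** (months / doubling_months)


def months_to_reach(target_minutes: float, h0_minutes: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the horizon reaches `target_minutes` (inverse of horizon_after)."""
    if target_minutes <= 0 or h0_minutes <= 0:
        raise ValueError("horizons must be positive")
    return doubling_months * math.log2(target_minutes / h0_minutes)


if __name__ == "__main__":
    # Hypothetical starting point: a 60-minute horizon (illustrative only).
    h0 = 60.0
    for m in (0, 7, 14, 28):
        print(f"after {m:2d} months: {horizon_after(m, h0):7.1f} min")
    # Months until a hypothetical 40-hour (2400-minute) horizon under this model.
    print(f"months to 2400 min: {months_to_reach(2400.0, h0):.1f}")
```

Under this model, every additional factor of 10 in horizon length costs the same number of months (7 × log2(10), about 23 months). The model says nothing about whether the trend holds outside the measured software tasks, which is exactly the caveat raised at the top of this post.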
My interpretation of this study is that it supports the idea of a doubling time in effective horizon length, as I discussed here: https://t.co/8YdvAQMYfI
Beyond that, however, I'd take the specific numbers in their plot with a grain of salt.
Great results! Interestingly, this aligns somewhat well with predictions from a theoretical framework I proposed two years ago, which also suggested a periodic doubling time for effective horizon lengths—assuming exponential growth in compute and algorithmic progress. https://t.co/idkZNJpL21 pic.twitter.com/gzjZdZeMzx
— Matthew Barnett (@MatthewJBar) March 19, 2025