Great results! Interestingly, this aligns somewhat well with predictions from a theoretical framework I proposed two years ago, which also suggested a periodic doubling time for effective horizon lengths—assuming exponential growth in compute and algorithmic progress.
@METR_Evals
When will AI systems be able to carry out long projects independently?
In new research, we find a kind of “Moore’s Law for AI agents”: the length of tasks that AIs can do is doubling about every 7 months. https://t.co/KuZrClmjcc
However, aligning my framework with these empirical findings implies exceptionally rapid algorithmic progress—with effective compute doubling roughly every 1.25 months. For context, @EpochAIResearch estimates that physical training compute currently doubles every ~5.4 months.
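To make the "exceptionally rapid algorithmic progress" claim concrete: under the simple assumption (mine, not stated in the thread) that effective compute is the product of physical compute and algorithmic efficiency, the doubling rates add, and we can back out the algorithmic-efficiency doubling time implied by these two figures. A minimal sketch:

```python
# Assumption: effective compute = physical compute x algorithmic efficiency,
# so doubling *rates* (doublings per month) add. This is an illustrative
# model, not the framework from the thread.

def implied_algorithmic_doubling(effective_months: float, physical_months: float) -> float:
    """Doubling time (months) of algorithmic efficiency, given the doubling
    times of effective and physical compute, under a multiplicative model."""
    rate = 1 / effective_months - 1 / physical_months  # doublings per month
    return 1 / rate

# Effective compute doubling every 1.25 months vs. physical every ~5.4 months
print(round(implied_algorithmic_doubling(1.25, 5.4), 2))  # ~1.63 months
```

That is, algorithmic efficiency alone would need to double roughly every 1.6 months to bridge the gap, under this toy decomposition.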
My original framework was purely theoretical and based on pretraining scaling laws, so it's interesting to see empirical validation of this functional relationship between compute and effective horizon length, even as we enter the reasoning model paradigm.
For more context, here's a blog post about my framework:
Some clarification of how I interpret the METR study:
@MatthewJBar
While I appreciate this study, I'm also a bit worried its headline result is misleading—it only measures performance on a narrow set of software tasks. As of March 2025, AIs still can't handle 15-minute robotics or computer-use tasks, despite what the headline plot might suggest.