@MatthewJBar



While I appreciate this study, I'm also a bit worried its headline result is misleading—it only measures performance on a narrow set of software tasks. As of March 2025, AIs still can't handle 15-minute robotics or computer-use tasks, despite what the headline plot might suggest.

My interpretation of this study is that it supports the idea of a doubling time in effective horizon length, as I discussed here: https://t.co/8YdvAQMYfI However, beyond that, I'd take the specific numbers in their plot with a grain of salt.
