Elliot Glazer
@ElliotGlazer
Dec 27 · 9 tweets

1/9 We’re announcing the development of Tier 4, a new suite of math problems that go beyond the hardest problems in FrontierMath. o3’s performance is remarkable, but there’s still a ways to go before any single AI system nears the collective genius of the math community.

2/9 For context, FrontierMath currently spans three broad tiers:
• T1 (25%) Advanced, near top-tier undergrad/IMO
• T2 (50%) Needs serious grad-level background
• T3 (25%) Research problems demanding relevant research experience
All can take hours—or days—for experts to solve.

3/9 Although o3 solved problems in all three tiers, it likely still struggles on the most formidable Tier 3 tasks—those “exceptionally hard” challenges that Tao and Gowers say can stump even top mathematicians.

4/9 Tier 4 aims to push the boundary even further. We want to assemble problems so challenging that solving them would demonstrate capabilities on par with an entire top mathematics department.

5/9 Each problem will be composed by a team of 1-3 mathematicians specialized in the same field over a 6-week period, with weekly opportunities to discuss ideas with teams in related fields. We seek broad coverage of mathematics and want all major subfields represented in Tier 4.

6/9 Process for a Tier 4 problem:

1. One week crafting a robust problem concept, which "converts" research insights into a closed-answer problem.

2. Three weeks of collaborative research, including presentations among related teams for feedback.

3. Two weeks preparing the final submission.

7/9 We’re seeking mathematicians who can craft these next-level challenges. If you have research-grade ideas that transcend T3 difficulty, please email elliot.ai with your CV and a brief note on your interests.

8/9 We’ll also hire some red-teamers, tasked with finding clever ways a model can circumvent a problem’s intended difficulty, and some reviewers to check final submissions for mathematical correctness. Contact me if you think you’re suitable for either role.

9/9 As AI keeps improving, we need benchmarks that reflect genuine mathematical depth. Tier 4 is our next (and possibly final) step in that direction.
