AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)




Academic Task Benchmarks

Research Question

How do AI models perform on academic and intellectual tasks relevant to universities?

Hypothesis

AI models are achieving expert-level performance on many academic tasks, which requires universities to reconsider assessment methods, learning objectives, and how expertise is developed.


Key Findings

1. Graduate-Level Science (GPQA Diamond)

Benchmark: PhD-level questions in physics, biology, and chemistry (Google)

2. Mathematics (AIME 2025)

Benchmark: American Invitational Mathematics Examination (Google)

3. Frontier Knowledge (Humanity’s Last Exam)

Benchmark: Questions at the edge of human knowledge (Google)

4. Undergraduate Coursework

Estimated AI Performance by Subject:
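Benchmark scores like those reported above (GPQA Diamond, AIME, Humanity's Last Exam) are typically computed as simple accuracy over a question set, often broken out per subject. A minimal sketch of that calculation, using entirely hypothetical graded results for illustration:

```python
from collections import defaultdict

def accuracy_by_subject(results):
    """Compute per-subject accuracy from (subject, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for subject, is_correct in results:
        totals[subject] += 1
        correct[subject] += int(is_correct)
    return {s: correct[s] / totals[s] for s in totals}

# Hypothetical graded responses, not real benchmark data.
sample = [
    ("physics", True), ("physics", False),
    ("biology", True), ("biology", True),
    ("chemistry", False), ("chemistry", True),
]
print(accuracy_by_subject(sample))
```

Real leaderboards add details this sketch omits (answer normalization, multiple sampling runs, confidence intervals), but the headline numbers reduce to this ratio.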


Implications for Universities

Assessment Redesign

Learning Objectives

Course Design



Explore This Research

