AI Capabilities Research: Frontier Labs Benchmarks & Trajectories (2025)

Overview

This repository contains comprehensive research on what AI systems can do — their performance on real-world tasks, improvement trajectories, and implications for higher education. The research covers frontier models from Anthropic, OpenAI, Google DeepMind, and Microsoft.

Audience: University stakeholders (administrators, faculty, curriculum designers, policy makers) adapting to rapidly changing AI capabilities.

Related Track: Looking for how people actually use AI (adoption, demographics, trends)? See AI Usage Research →

Time Period: 2024-2025 benchmark data with trajectory analysis


Why This Matters for Universities

AI capabilities are advancing faster than institutional adaptation cycles.

These developments have immediate implications for curriculum design, assessment integrity, research workflows, and workforce preparation.


Research Questions

01. Real-World Task Performance

Key Question: How well do AI models perform on authentic professional tasks across occupations?

Key Findings:

• OpenAI's GDPval evaluation (1,320 tasks across 44 occupations) finds frontier models at or approaching expert parity on professional deliverables, with Claude leading (see Key Statistics Summary below)

Significance: AI is approaching professional competence across diverse occupations


02. Capability Trajectory & Economic Impact

Key Question: How fast are AI capabilities improving, and what are the projected economic impacts?

Key Findings:

• GPT-4o → GPT-5 more than doubled GDPval performance in 14 months; Gemini 2.5 → 3 improved 50% on developer tasks in 7 months (see Capability Improvement Rates below)
• Economic projections range from McKinsey's $4.4 trillion/year in potential annual impact to PwC's $15.7 trillion global contribution by 2030

Significance: Capability improvement is accelerating, not plateauing


03. Academic Task Benchmarks

Key Question: How do models perform on academic/intellectual tasks relevant to universities?

Key Findings:

• Gemini 3 scores 93.8% on GPQA Diamond (PhD-level science) and 95-100% on AIME 2025 competition math (see Cross-Lab Benchmark Comparison below)

Significance: Graduate-level academic work is increasingly within AI capability range


04. Coding and Research Capabilities

Key Question: What can AI do in software development and research contexts?

Key Findings:

• Claude Opus 4.5 leads SWE-bench Verified at 80.9%, ahead of GPT-5 (77.9%) and Gemini 3 (76.2%); Claude improved from 74.5% to 80.9% in three months (see summary tables below)

Significance: Coding education and CS curriculum require fundamental rethinking


05. Agentic Capabilities

Key Question: What autonomous, multi-step work can AI perform?

Key Findings:

• Claude Opus 4.5 reaches 61.4% on OSWorld, a benchmark of autonomous computer use (see Cross-Lab Benchmark Comparison below)

Significance: AI is transitioning from tool to autonomous agent


06. Safety and Alignment

Key Question: How robust and safe are current AI models?

Key Findings:

Significance: Relevant for AI ethics curricula and responsible deployment policies


07. Educational Implications

Key Question: What should universities do in response to these capabilities?

Recommendations:

Significance: Proactive adaptation required; reactive policies will lag capabilities


Key Statistics Summary

Cross-Lab Benchmark Comparison (November 2025)

Benchmark            Claude Opus 4.5           GPT-5    Gemini 3   Measure
SWE-bench Verified   80.9%                     77.9%    76.2%      Real-world coding
GPQA Diamond         TBD                       TBD      93.8%      PhD-level science
AIME 2025            TBD                       TBD      95-100%    Math competition
OSWorld              61.4%                     TBD      TBD       Computer use
GDPval               Expert parity (leading)   Strong   TBD        Professional tasks
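
For readers who want to manipulate these scores programmatically, the sketch below restates the table as a small Python structure and picks the best reported score per benchmark. It is illustrative only: TBD entries become None, and AIME 2025 is taken at its 95% lower bound.

# Illustrative only: November 2025 scores transcribed from the table above.
scores = {
    "SWE-bench Verified": {"Claude Opus 4.5": 80.9, "GPT-5": 77.9, "Gemini 3": 76.2},
    "GPQA Diamond":       {"Claude Opus 4.5": None, "GPT-5": None, "Gemini 3": 93.8},
    "AIME 2025":          {"Claude Opus 4.5": None, "GPT-5": None, "Gemini 3": 95.0},
    "OSWorld":            {"Claude Opus 4.5": 61.4, "GPT-5": None, "Gemini 3": None},
}

for benchmark, results in scores.items():
    # Skip models whose score is still TBD.
    reported = {model: s for model, s in results.items() if s is not None}
    best = max(reported, key=reported.get)
    print(f"{benchmark}: {best} leads among reported scores ({reported[best]}%)")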

Capability Improvement Rates

Transition         Timeframe   Improvement     Benchmark
GPT-4o → GPT-5     14 months   >2x             GDPval
Gemini 2.5 → 3     7 months    50%             Developer tasks
Claude 4.1 → 4.5   3 months    74.5% → 80.9%   SWE-bench
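
The SWE-bench row is easier to interpret as error reduction: going from 74.5% to 80.9% solved cuts the failure rate from 25.5% to 19.1%, roughly a 25% relative reduction in three months. A minimal check:

# Relative error reduction implied by the Claude 4.1 -> 4.5 SWE-bench row.
before, after = 74.5, 80.9                      # percent of tasks solved
err_before, err_after = 100 - before, 100 - after
reduction = (err_before - err_after) / err_before
print(f"Failure rate: {err_before:.1f}% -> {err_after:.1f}% "
      f"({reduction:.0%} relative reduction)")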

Economic Projections

Source        Projection            Scope
McKinsey      $4.4 trillion/year    Potential annual impact
PwC           $15.7 trillion        Global contribution by 2030
Market size   $243.72B → $826.73B   2025 → 2030
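
The market-size row implies a compound annual growth rate of roughly 28%, as this quick calculation shows (figures taken from the table above):

# Implied CAGR for the market-size projection, 2025 -> 2030.
start, end, years = 243.72, 826.73, 5           # $B
cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")              # about 27.7% per year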

Research Structure

Each research question folder contains:

XX-research-question-name/
├── README.md      # Question overview, hypothesis, key findings
├── data.md        # Detailed data, analysis, implications
└── sources.md     # Comprehensive source list with links
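
A short script can verify that every question folder follows this layout. This is a minimal sketch, assuming it is run from the repository root and that folders carry a two-digit prefix as shown above:

# Check each XX-research-question-name/ folder for the expected files.
from pathlib import Path

REQUIRED = {"README.md", "data.md", "sources.md"}

for folder in sorted(Path(".").glob("[0-9][0-9]-*")):
    if not folder.is_dir():
        continue
    missing = REQUIRED - {f.name for f in folder.iterdir()}
    status = "ok" if not missing else "missing: " + ", ".join(sorted(missing))
    print(f"{folder.name}: {status}")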

Major Data Sources

Primary Research (2025)

  1. OpenAI GDPval (September 2025)
    • 1,320 tasks across 44 occupations
    • Open-sourced 220-task gold set
    • Public evaluation service at evals.openai.com
  2. Anthropic Model Reports & Transparency Hub
    • SWE-bench and OSWorld results
    • Safety evaluations and model cards
    • Internal engineering test comparisons
  3. Google DeepMind Gemini Reports
    • Gemini 2.5 and 3 technical reports
    • GPQA, AIME, Humanity’s Last Exam results
  4. Microsoft Research
    • ADeLe framework for AI evaluation
    • Work Trend Index productivity studies
    • Azure AI evaluation methodology

Third-Party Research

  1. METR - AI coding impact studies
  2. Stanford/BetterUp - Workplace productivity research
  3. McKinsey/PwC/IMF - Economic impact projections

Using This Research

For University Administrators

For Faculty

For Curriculum Designers

For Policy Makers


Cross-References with Usage Research

Capabilities RQ     Related Usage RQ                       Connection
RQ05 (Agentic)      Usage RQ01 (Automation)                Automation patterns reflect agentic capabilities
RQ04 (Coding)       Usage RQ02 (Platform Specialization)   Claude’s coding dominance in usage matches benchmarks
RQ02 (Trajectory)   Usage RQ06 (Use Case Evolution)        Capability growth drives use case expansion

Data Gaps & Research Needs

Missing Data

  1. Longitudinal tracking: Need multi-year capability trajectories
  2. Domain-specific benchmarks: Limited data outside tech/STEM
  3. Educational outcome studies: Impact of AI on learning outcomes
  4. Comparative safety data: Standardized cross-lab safety benchmarks

Future Research Questions

  1. How do capability improvements translate to educational impact?
  2. What is the optimal human-AI collaboration model for learning?
  3. How should assessment evolve as capabilities advance?
  4. What skills remain durably valuable?

About This Research

Compiled: December 2025
Last Updated: 2025-12-10
Research Purpose: Inform university stakeholder decisions on AI adaptation
Methodology: Synthesis of published benchmarks, technical reports, and economic analyses


Citation Recommendation

AI Capabilities Research: Frontier Labs Benchmarks & Trajectories (2025)
Research Questions: Real-World Tasks, Capability Trajectory, Academic Benchmarks,
Coding/Research, Agentic Capabilities, Safety/Alignment, Educational Implications
Data Sources: OpenAI GDPval, Anthropic Model Reports, Google DeepMind Gemini Reports,
Microsoft Research, METR, McKinsey, PwC, IMF
Compiled: December 2025

For specific findings, cite the original source (URLs provided in sources.md files).