AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)

Academic Task Benchmarks: Sources

Primary Benchmark Sources

GPQA (Graduate-Level Science)

  1. GPQA: A Graduate-Level Google-Proof Q&A Benchmark
    • Type: Academic benchmark
    • Key Data: Methodology for PhD-level science questions
  2. Gemini 2.5 Technical Report
    • URL: https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf
    • Published: March 2025
    • Key Data: GPQA Diamond scores
  3. Gemini 3 Announcement
    • URL: https://blog.google/products/gemini/gemini-3/
    • Published: November 2025
    • Key Data: 93.8% on GPQA Diamond (Gemini 3 Deep Think)

AIME (Mathematics)

  1. MAA American Invitational Mathematics Examination
    • URL: https://www.maa.org/math-competitions/aime
    • Type: Official benchmark source
  2. Gemini 3 Technical Details
    • URL: https://deepmind.google/models/gemini/
    • Key Data: 95% without tools, 100% with code execution

Humanity’s Last Exam

  1. Humanity’s Last Exam Benchmark
    • Type: Crowdsourced frontier knowledge benchmark
    • Key Data: Questions at the edge of human knowledge
  2. Gemini 2.5 Thinking Updates
    • URL: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
    • Published: March 2025
    • Key Data: 18.8% without tool use

Model Performance Reports

Google DeepMind

  1. Gemini 3 Deep Dive
    • URL: https://deepmind.google/models/gemini/pro/
    • Key Data: Academic benchmark details
  2. TechCrunch: Gemini 3 Benchmark Analysis
    • URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
    • Published: November 18, 2025

Anthropic

  1. Claude Opus 4.5 Capabilities
    • URL: https://www.anthropic.com/news/claude-opus-4-5
    • Published: November 2025
    • Key Data: Academic task performance

OpenAI

  1. GPT-5 Technical Report
    • Type: Model documentation
    • Key Data: Academic benchmark improvements

Educational Research

AI in Education

  1. Anthropic Education Report: University Students
    • URL: https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
    • Published: April 2025
    • Key Data: Student usage patterns
  2. Anthropic Education Report: Educators
    • URL: https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
    • Published: August 2025
    • Key Data: Faculty perspectives

Assessment Research

  1. Computing at School: Empowering Educators
    • URL: https://www.computingatschool.org.uk/forum-news-blogs/2025/april/empowering-educators-insights-from-anthropic-s-report-on-claude-s-role-in-higher-education/
    • Published: April 2025
  2. EdTech Innovation Hub: Faculty Use of Claude
    • URL: https://www.edtechinnovationhub.com/news/anthropic-analyzes-how-university-educators-use-claude-across-academic-tasks

Industry Analysis

  1. Data Studios: Claude 4 Performance Analysis
    • URL: https://www.datastudios.org/post/claude-4-in-2025-performance-safety-benchmarks-ecosystem-news-and-real-world-impact-for-enterpr
    • Key Data: Academic benchmark comparisons
  2. Skywork AI: Claude 4.5 Comparison
    • URL: https://skywork.ai/blog/claude-4-5-vs-other-ai-models-2025-comparison/
    • Key Data: Cross-model academic benchmarks
  3. Keepler: Google Gemini 3 Analysis
    • URL: https://keepler.io/2025/11/27/google-gemini-3-a-new-paradigm-in-frontier-ai/
    • Published: November 27, 2025

Academic Integrity Research

  1. Academic Integrity in the Age of AI
    • Type: Academic literature
    • Key Topics: Assessment redesign, detection challenges
  2. Higher Education AI Policy Frameworks
    • Type: Policy research
    • Key Topics: Institutional responses to AI capabilities

Source Categories

Category               Count
Primary benchmarks         7
Model reports              4
Educational research       4
Industry analysis          3
Academic integrity         2
Total                     20
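
The per-category counts in the table above should add up to the stated total. The short Python sketch below is one way to check that arithmetic. The names `SOURCE_COUNTS`, `total_sources`, and `category_shares` are hypothetical and only illustrate the idea; the numbers are copied from the table.

```python
# Per-category source counts, copied from the Source Categories table.
# The variable and function names here are illustrative assumptions.
SOURCE_COUNTS = {
    "Primary benchmarks": 7,
    "Model reports": 4,
    "Educational research": 4,
    "Industry analysis": 3,
    "Academic integrity": 2,
}


def total_sources(counts):
    """Return the sum of all per-category source counts."""
    return sum(counts.values())


def category_shares(counts):
    """Return each category's fraction of the overall total."""
    total = total_sources(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"Total sources: {total_sources(SOURCE_COUNTS)}")
    for name, share in category_shares(SOURCE_COUNTS).items():
        print(f"{name}: {share:.0%}")
```

If a source is added to or removed from any section, updating the matching entry in `SOURCE_COUNTS` and rerunning the script gives the new total for the table.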