AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)

Real-World Task Performance: Sources

Primary Sources

OpenAI GDPval

  1. GDPval: Measuring the Performance of Our Models on Real-World Tasks
    • URL: https://openai.com/index/gdpval/
    • Published: September 2025
    • Type: Benchmark announcement and methodology
  2. GDPval Technical Paper
    • URL: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf
    • Type: Full methodology and results
  3. GDPval arXiv Paper
    • URL: https://arxiv.org/abs/2510.04374
    • Citation: arXiv:2510.04374
    • Type: Academic publication
  4. OpenAI Evals Platform
    • URL: https://evals.openai.com/
    • Type: Public evaluation service for GDPval gold set

News Coverage

Major Analysis

  1. Axios: ChatGPT GDPval AI Study
    • URL: https://www.axios.com/2025/09/25/chatgpt-gdp-val-ai-study
    • Published: September 25, 2025
    • Key Insight: AI catching up to human work
  2. Fortune: AI Models Are Getting Very Good at Professional Tasks
    • URL: https://fortune.com/2025/09/30/ai-models-are-already-as-good-as-experts-at-half-of-tasks-a-new-openai-benchmark-gdpval-suggests/
    • Published: September 30, 2025
    • Key Insight: Models match expert quality on roughly half of tasks
  3. MarkTechPost: OpenAI Introduces GDPval
    • URL: https://www.marktechpost.com/2025/09/25/openai-introduces-gdpval-a-new-evaluation-suite-that-measures-ai-on-real-world-economically-valuable-tasks/
    • Published: September 25, 2025
    • Key Insight: Evaluation suite methodology overview

Model Announcements

Anthropic

  1. Anthropic Transparency Hub: Model Report
    • URL: https://www.anthropic.com/transparency/model-report
    • Type: Model capabilities and benchmarks
  2. Claude Opus 4.5 Announcement
    • URL: https://www.anthropic.com/news/claude-opus-4-5
    • Published: November 2025
    • Key Data: GDPval comparative performance

Google DeepMind

  1. Gemini 3 Announcement
    • URL: https://blog.google/products/gemini/gemini-3/
    • Published: November 2025
    • Type: Model capabilities overview
  2. Gemini 3 Technical Report
    • URL: https://deepmind.google/models/gemini/
    • Type: Benchmark results

Economic Context

Productivity Research

  1. Microsoft Work Trend Index: AI Revolution Insights
    • URL: https://www.microsoft.com/en-us/industry/microsoft-in-business/future-of-work/2025/04/25/leading-the-ai-revolution-insights-from-microsofts-work-trend-index/
    • Published: April 25, 2025
    • Key Data: Workplace AI productivity impacts
  2. Microsoft: Unlocking AI’s Global Potential
    • URL: https://blogs.microsoft.com/on-the-issues/2025/04/10/unlocking-ai-global-potential/
    • Published: April 10, 2025
    • Key Data: Global productivity projections

Academic Research

Benchmark Methodology

  1. Microsoft Research: Predicting and Explaining AI Model Performance
    • URL: https://www.microsoft.com/en-us/research/blog/predicting-and-explaining-ai-model-performance-a-new-approach-to-evaluation/
    • Type: Evaluation methodology research
    • Key Contribution: ADeLe framework for AI assessment

Industry Analysis

  1. VentureBeat: Claude Opus 4.5 Performance Analysis
    • URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
    • Published: November 2025
    • Key Insight: Model comparison on professional tasks
  2. TechCrunch: Gemini 3 Benchmark Scores
    • URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
    • Published: November 18, 2025
    • Key Data: Cross-benchmark performance

Source Categories

| Category | Count |
|---|---|
| Primary benchmark data | 4 |
| News coverage | 3 |
| Model announcements | 4 |
| Economic research | 2 |
| Academic methodology | 1 |
| Industry analysis | 2 |
| Total | 16 |