Real-World Task Performance: Sources

Primary Sources

GDPval: Measuring the Performance of Our Models on Real-World Tasks
- URL: https://openai.com/index/gdpval/
- Published: September 2025
- Type: Benchmark announcement and methodology
GDPval Technical Paper
- URL: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf
- Type: Full methodology and results
GDPval arXiv Paper
- URL: https://arxiv.org/abs/2510.04374
- Citation: arXiv:2510.04374
- Type: Academic publication
OpenAI Evals Platform
- URL: https://evals.openai.com/
- Type: Public evaluation service for GDPval gold set

Axios: ChatGPT GDPval AI Study
- URL: https://www.axios.com/2025/09/25/chatgpt-gdp-val-ai-study
- Published: September 25, 2025
- Key Insight: AI catching up to human work
Fortune: AI Models Are Getting Very Good at Professional Tasks
- URL: https://fortune.com/2025/09/30/ai-models-are-already-as-good-as-experts-at-half-of-tasks-a-new-openai-benchmark-gdpval-suggests/
- Published: September 30, 2025
- Key Insight: Models rated as good as experts on roughly half of tasks
MarkTechPost: OpenAI Introduces GDPval
- URL: https://www.marktechpost.com/2025/09/25/openai-introduces-gdpval-a-new-evaluation-suite-that-measures-ai-on-real-world-economically-valuable-tasks/
- Published: September 25, 2025
- Key Insight: Evaluation suite methodology overview

Anthropic Transparency Hub: Model Report
- URL: https://www.anthropic.com/transparency/model-report
- Type: Model capabilities and benchmarks
Claude Opus 4.5 Announcement
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
- Key Data: GDPval comparative performance

Gemini 3 Announcement
- URL: https://blog.google/products/gemini/gemini-3/
- Published: November 2025
- Type: Model capabilities overview
Gemini 3 Technical Report
- URL: https://deepmind.google/models/gemini/
- Type: Benchmark results

Microsoft Work Trend Index: AI Revolution Insights
- URL: https://www.microsoft.com/en-us/industry/microsoft-in-business/future-of-work/2025/04/25/leading-the-ai-revolution-insights-from-microsofts-work-trend-index/
- Published: April 25, 2025
- Key Data: Workplace AI productivity impacts
Microsoft: Unlocking AI’s Global Potential
- URL: https://blogs.microsoft.com/on-the-issues/2025/04/10/unlocking-ai-global-potential/
- Published: April 10, 2025
- Key Data: Global productivity projections

Microsoft Research: Predicting and Explaining AI Model Performance
- URL: https://www.microsoft.com/en-us/research/blog/predicting-and-explaining-ai-model-performance-a-new-approach-to-evaluation/
- Type: Evaluation methodology research
- Key Contribution: ADeLe framework for AI assessment

VentureBeat: Claude Opus 4.5 Performance Analysis
- URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
- Published: November 2025
- Key Insight: Model comparison on professional tasks
TechCrunch: Gemini 3 Benchmark Scores
- URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
- Published: November 18, 2025
- Key Data: Cross-benchmark performance