Primary Sources
OpenAI GDPval
- GDPval: Measuring the Performance of Our Models on Real-World Tasks
- URL: https://openai.com/index/gdpval/
- Published: September 2025
- Type: Benchmark announcement and methodology
- GDPval Technical Paper
- URL: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf
- Type: Full methodology and results
- GDPval arXiv Paper
- URL: https://arxiv.org/abs/2510.04374
- Citation: arXiv:2510.04374
- Type: Academic publication
- OpenAI Evals Platform
- URL: https://evals.openai.com/
- Type: Public evaluation service for GDPval gold set
News Coverage
Major Analysis
- Axios: ChatGPT GDPval AI Study
- URL: https://www.axios.com/2025/09/25/chatgpt-gdp-val-ai-study
- Published: September 25, 2025
- Key Insight: AI catching up to human work
- Fortune: AI Models Are Getting Very Good at Professional Tasks
- URL: https://fortune.com/2025/09/30/ai-models-are-already-as-good-as-experts-at-half-of-tasks-a-new-openai-benchmark-gdpval-suggests/
- Published: September 30, 2025
- Key Insight: Models rated as good as experts on roughly half of tasks
- MarkTechPost: OpenAI Introduces GDPval
- URL: https://www.marktechpost.com/2025/09/25/openai-introduces-gdpval-a-new-evaluation-suite-that-measures-ai-on-real-world-economically-valuable-tasks/
- Published: September 25, 2025
- Key Insight: Evaluation suite methodology overview
Anthropic
- Anthropic Transparency Hub: Model Report
- URL: https://www.anthropic.com/transparency/model-report
- Type: Model capabilities and benchmarks
- Claude Opus 4.5 Announcement
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
- Key Data: GDPval comparative performance
Google DeepMind
- Gemini 3 Announcement
- URL: https://blog.google/products/gemini/gemini-3/
- Published: November 2025
- Type: Model capabilities overview
- Gemini 3 Technical Report
- URL: https://deepmind.google/models/gemini/
- Type: Benchmark results
Economic Context
Productivity Research
- Microsoft Work Trend Index: AI Revolution Insights
- URL: https://www.microsoft.com/en-us/industry/microsoft-in-business/future-of-work/2025/04/25/leading-the-ai-revolution-insights-from-microsofts-work-trend-index/
- Published: April 25, 2025
- Key Data: Workplace AI productivity impacts
- Microsoft: Unlocking AI’s Global Potential
- URL: https://blogs.microsoft.com/on-the-issues/2025/04/10/unlocking-ai-global-potential/
- Published: April 10, 2025
- Key Data: Global productivity projections
Academic Research
Benchmark Methodology
- Microsoft Research: Predicting and Explaining AI Model Performance
- URL: https://www.microsoft.com/en-us/research/blog/predicting-and-explaining-ai-model-performance-a-new-approach-to-evaluation/
- Type: Evaluation methodology research
- Key Contribution: ADeLe framework for AI assessment
Industry Analysis
- VentureBeat: Claude Opus 4.5 Performance Analysis
- URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
- Published: November 2025
- Key Insight: Model comparison on professional tasks
- TechCrunch: Gemini 3 Benchmark Scores
- URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
- Published: November 18, 2025
- Key Data: Cross-benchmark performance
Source Categories
| Category | Count |
| --- | --- |
| Primary benchmark data | 4 |
| News coverage | 3 |
| Model announcements | 4 |
| Economic research | 2 |
| Academic methodology | 1 |
| Industry analysis | 2 |
| Total | 16 |