Academic Task Benchmarks: Sources

Primary Benchmark Sources

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- Type: Academic benchmark
- Key Data: Methodology for PhD-level science questions
Gemini 2.5 Technical Report
- URL: https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf
- Published: March 2025
- Key Data: GPQA Diamond scores
Gemini 3 Announcement
- URL: https://blog.google/products/gemini/gemini-3/
- Published: November 2025
- Key Data: 93.8% on GPQA Diamond

MAA American Invitational Mathematics Examination
- URL: https://www.maa.org/math-competitions/aime
- Type: Official benchmark source
Gemini 3 Technical Details
- URL: https://deepmind.google/models/gemini/
- Key Data: AIME score of 95% without tools, 100% with code execution

Humanity’s Last Exam Benchmark
- Type: Crowdsourced frontier knowledge benchmark
- Key Data: Questions at the edge of human knowledge
Gemini 2.5 Thinking Updates
- URL: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
- Published: March 2025
- Key Data: 18.8% on Humanity’s Last Exam without tool use

Gemini 3 Deep Dive
- URL: https://deepmind.google/models/gemini/pro/
- Key Data: Academic benchmark details
TechCrunch: Gemini 3 Benchmark Analysis
- URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
- Published: November 18, 2025

Claude Opus 4.5 Capabilities
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
- Key Data: Academic task performance

GPT-5 Technical Report
- Type: Model documentation
- Key Data: Academic benchmark improvements

Anthropic Education Report: University Students
- URL: https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
- Published: April 2025
- Key Data: Student usage patterns
Anthropic Education Report: Educators
- URL: https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
- Published: August 2025
- Key Data: Faculty perspectives

Computing at School: Empowering Educators
- URL: https://www.computingatschool.org.uk/forum-news-blogs/2025/april/empowering-educators-insights-from-anthropic-s-report-on-claude-s-role-in-higher-education/
- Published: April 2025
EdTech Innovation Hub: Faculty Use of Claude
- URL: https://www.edtechinnovationhub.com/news/anthropic-analyzes-how-university-educators-use-claude-across-academic-tasks

Data Studios: Claude 4 Performance Analysis
- URL: https://www.datastudios.org/post/claude-4-in-2025-performance-safety-benchmarks-ecosystem-news-and-real-world-impact-for-enterpr
- Key Data: Academic benchmark comparisons
Skywork AI: Claude 4.5 Comparison
- URL: https://skywork.ai/blog/claude-4-5-vs-other-ai-models-2025-comparison/
- Key Data: Cross-model academic benchmarks
Keepler: Google Gemini 3 Analysis
- URL: https://keepler.io/2025/11/27/google-gemini-3-a-new-paradigm-in-frontier-ai/
- Published: November 27, 2025

Academic Integrity in the Age of AI
- Type: Academic literature
- Key Topics: Assessment redesign, detection challenges
Higher Education AI Policy Frameworks
- Type: Policy research
- Key Topics: Institutional responses to AI capabilities