Academic Task Benchmarks: Sources
Primary Benchmark Sources
GPQA (Graduate-Level Science)
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- Type: Academic benchmark
- Key Data: Methodology for PhD-level science questions
- Gemini 2.5 Technical Report
- URL: https://storage.googleapis.com/deepmind-media/gemini/gemini_v2_5_report.pdf
- Published: March 2025
- Key Data: GPQA Diamond scores
- Gemini 3 Announcement
- URL: https://blog.google/products/gemini/gemini-3/
- Published: November 2025
- Key Data: 93.8% GPQA Diamond
AIME (Mathematics)
- MAA American Invitational Mathematics Examination
- URL: https://www.maa.org/math-competitions/aime
- Type: Official benchmark source
- Gemini 3 Technical Details
- URL: https://deepmind.google/models/gemini/
- Key Data: 95% raw, 100% with code execution
Humanity’s Last Exam
- Humanity’s Last Exam Benchmark
- Type: Crowdsourced frontier knowledge benchmark
- Key Data: Questions at the edge of human knowledge
- Gemini 2.5 Thinking Updates
- URL: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
- Published: March 2025
- Key Data: 18.8% without tool use
Google DeepMind
- Gemini 3 Deep Dive
- URL: https://deepmind.google/models/gemini/pro/
- Key Data: Academic benchmark details
- TechCrunch: Gemini 3 Benchmark Analysis
- URL: https://techcrunch.com/2025/11/18/google-launches-gemini-3-with-new-coding-app-and-record-benchmark-scores/
- Published: November 18, 2025
Anthropic
- Claude Opus 4.5 Capabilities
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
- Key Data: Academic task performance
OpenAI
- GPT-5 Technical Report
- Type: Model documentation
- Key Data: Academic benchmark improvements
Educational Research
AI in Education
- Anthropic Education Report: University Students
- URL: https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
- Published: April 2025
- Key Data: Student usage patterns
- Anthropic Education Report: Educators
- URL: https://www.anthropic.com/news/anthropic-education-report-how-educators-use-claude
- Published: August 2025
- Key Data: Faculty perspectives
Assessment Research
- Computing at School: Empowering Educators
- URL: https://www.computingatschool.org.uk/forum-news-blogs/2025/april/empowering-educators-insights-from-anthropic-s-report-on-claude-s-role-in-higher-education/
- Published: April 2025
- EdTech Innovation Hub: Faculty Use of Claude
- URL: https://www.edtechinnovationhub.com/news/anthropic-analyzes-how-university-educators-use-claude-across-academic-tasks
Industry Analysis
- Data Studios: Claude 4 Performance Analysis
- URL: https://www.datastudios.org/post/claude-4-in-2025-performance-safety-benchmarks-ecosystem-news-and-real-world-impact-for-enterpr
- Key Data: Academic benchmark comparisons
- Skywork AI: Claude 4.5 Comparison
- URL: https://skywork.ai/blog/claude-4-5-vs-other-ai-models-2025-comparison/
- Key Data: Cross-model academic benchmarks
- Keepler: Google Gemini 3 Analysis
- URL: https://keepler.io/2025/11/27/google-gemini-3-a-new-paradigm-in-frontier-ai/
- Published: November 27, 2025
Academic Integrity Research
- Academic Integrity in the Age of AI
- Type: Academic literature
- Key Topics: Assessment redesign, detection challenges
- Higher Education AI Policy Frameworks
- Type: Policy research
- Key Topics: Institutional responses to AI capabilities
Source Categories
| Category | Count |
| --- | --- |
| Primary benchmarks | 7 |
| Model reports | 4 |
| Educational research | 4 |
| Industry analysis | 3 |
| Academic integrity | 2 |
| Total | 20 |