Coding and Research Capabilities: Sources
SWE-bench Sources
Primary Benchmark
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Type: Academic benchmark
- Key Data: Methodology and baseline results
- SWE-bench Verified Leaderboard
- Type: Ongoing evaluation
- Key Data: Current model rankings
Model Results and Coverage
- Claude Opus 4.5 Announcement
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
  - Key Data: 80.9% on SWE-bench Verified; scored higher than any human candidate on Anthropic's take-home performance-engineering exam
- VentureBeat: Claude Opus 4.5 Analysis
- URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
- Published: November 2025
- Key Data: Human comparison details
- ByteIota: Claude Breaks SWE-bench Record
- URL: https://byteiota.com/claude-opus-4-5-breaks-80-swe-bench-first-ai-to-beat-humans/
  - Key Data: Headline claim that Opus 4.5 is the first AI to outperform humans
- InfoQ: Claude Opus 4.1 Release
- URL: https://www.infoq.com/news/2025/08/anthropic-claude-opus-4-1/
- Published: August 2025
  - Key Data: 74.5% on SWE-bench Verified
Productivity Research
METR Study
- METR: AI Coding Assistant Impact on Developers
  - Type: Randomized controlled trial (July 2025)
  - Key Finding: Experienced open-source developers took 19% longer to complete tasks when allowed to use AI tools
Microsoft Research
- New Employee Copilot Usage Study
- URL: https://www.microsoft.com/en-us/research/publication/new-employee-copilot-usage-insights-into-productivity-and-socialization/
- PDF: https://www.microsoft.com/en-us/research/wp-content/uploads/2025/04/New-employee-Copilot-usage.pdf
- Published: April 2025
- Sample: 125 Microsoft interns
- Copilot’s Earliest Users Research
- URL: https://www.microsoft.com/en-us/worklab/work-trend-index/copilots-earliest-users-teach-us-about-generative-ai-at-work
  - Key Data: 70% of users reported being more productive; users completed a set of tasks 29% faster
Stanford/BetterUp Research
- Workslop Study
- Published: September 2025
  - Key Finding: 40% of surveyed workers reported receiving low-quality AI-generated work ("workslop") in the past month
  - Key Data: Estimated $186 per employee per month in lost productivity
Developer Tools
- Index.dev: ChatGPT vs Claude for Coding
- URL: https://www.index.dev/blog/chatgpt-vs-claude-for-coding
- Key Data: Feature and capability comparison
- GoCodeo: Claude vs ChatGPT for Coding 2025
- URL: https://www.gocodeo.com/post/claude-vs-chatgpt-for-coding-which-ai-dev-assistant-performs-better-in-2025
- Key Data: Performance benchmarks
- Descope: Developer’s Guide to AI Coding Tools
- URL: https://www.descope.com/blog/post/claude-vs-chatgpt
- Key Data: Use case recommendations
- ClickUp: Claude vs ChatGPT for Coding
- URL: https://clickup.com/blog/claude-vs-chatgpt-for-coding/
- Key Data: Feature comparison
- Level Up Coding: Why Devs Switched to Claude
- URL: https://levelup.gitconnected.com/why-i-ditched-chatgpt-for-claude-ai-and-all-my-dev-friends-are-doing-it-too-e2a1cbeb138a
- Published: March 2025
- Key Data: Developer preferences
AI Research Capabilities
- Anthropic: How AI Is Transforming Work at Anthropic
- URL: https://www.anthropic.com/research/how-ai-is-transforming-work-at-anthropic
- Key Data: Internal automation patterns
- eWeek: Anthropic Economic Index - Coding Dominates
- URL: https://www.eweek.com/news/anthropic-economic-index-claude-ai-usage/
- Key Data: 36% of Claude usage is coding
- Inc: Claude’s Killer Use Case
- URL: https://www.inc.com/ben-sherry/anthropics-claude-ai-has-1-killer-use-case-according-to-new-data/91240506
- Key Data: Software development dominance
CS Education Research
- Anthropic Education Report: University Students
- URL: https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
- Published: April 2025
- Key Data: Student coding patterns
- Computing at School: AI in Education
- URL: https://www.computingatschool.org.uk/forum-news-blogs/2025/april/empowering-educators-insights-from-anthropic-s-report-on-claude-s-role-in-higher-education/
- Published: April 2025
Source Categories
| Category | Count |
|---|---|
| SWE-bench/benchmarks | 6 |
| Productivity research | 4 |
| Developer tools | 5 |
| AI research capabilities | 3 |
| CS education | 2 |
| Total | 20 |