Coding and Research Capabilities
Research Question
What can AI do in software development and research contexts?
Hypothesis
AI coding capabilities have reached a threshold where they exceed human expert performance on standard software engineering tasks, fundamentally changing CS education and research methodology.
Key Findings
1. SWE-bench Verified (Software Engineering)
Benchmark: Real-world GitHub issues requiring multi-file fixes (Anthropic)
| Model | Score | Date | Significance |
|---|---|---|---|
| Claude Opus 4.5 | 80.9% | Nov 2025 | First to exceed best human |
| GPT-5.1-Codex-Max | 77.9% | Nov 2025 | Strong competitor |
| Claude Sonnet 4.5 | 77.2% | Nov 2025 | Cost-effective option |
| Gemini 3 Pro | 76.2% | Nov 2025 | Google’s entry |
2. Human Expert Comparison
- Claude Opus 4.5 scored higher than any human candidate on Anthropic’s internal engineering test within the test's time limit, using parallel test-time compute (Anthropic)
- Without a time limit, it matched the best-ever human candidate
- First documented case of AI exceeding top human software engineer on realistic tasks
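Parallel test-time compute is generally described as drawing several candidate solutions and selecting among them. The sketch below illustrates that idea as a simple best-of-n selection. It is an assumption-laden illustration, not Anthropic's method: `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, the candidates are produced sequentially for simplicity, and a real system would replace the toy functions with model calls and a test-based or learned scorer.

```python
from typing import Callable, List, Tuple


def best_of_n(
    generate: Callable[[int], str],
    score: Callable[[str], float],
    n: int,
) -> Tuple[str, float]:
    """Draw n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Each call to generate() stands in for one independent model attempt.
    candidates: List[str] = [generate(i) for i in range(n)]
    scored = [(score(c), c) for c in candidates]
    # Keep the candidate with the highest score.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-ins so the sketch runs without any model.
def toy_generate(seed: int) -> str:
    # Attempt number `seed` yields a string of length seed + 1.
    return "x" * (seed + 1)


def toy_score(candidate: str) -> float:
    # The toy scorer prefers candidates of length 3.
    return -abs(len(candidate) - 3)


best, best_score = best_of_n(toy_generate, toy_score, n=5)
print(best, best_score)
```

The point of the sketch is that the reported score depends on both the model and the selection procedure, so results obtained with and without this kind of aggregation are not directly comparable.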
3. AI Research Tasks
AI models can now perform:
- GPU kernel development
- Reinforcement learning algorithm implementation
- ML model training and tuning
- Research paper implementation
- Experiment design assistance
4. Counterpoint: METR Study
Finding: Experienced developers were 19% slower with AI assistance (METR)
- 16 veteran open-source programmers
- 246 real-world tasks
- AI assistance increased task completion time rather than reducing it
Interpretation: Learning-curve and context-switching costs may offset gains, at least initially
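To make the 19% figure concrete, the following sketch shows how a slowdown can be expressed as a percentage change in completion time. The numbers are hypothetical and chosen only for illustration; the function name `percent_change_in_time` is an assumption, and this simple ratio is not METR's statistical method.

```python
def percent_change_in_time(baseline_minutes: float, assisted_minutes: float) -> float:
    """Percentage change in completion time relative to the unassisted baseline.

    Positive values mean the assisted condition took longer (a slowdown).
    """
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    return (assisted_minutes - baseline_minutes) / baseline_minutes * 100.0


# Hypothetical example: a task that takes 100 minutes unassisted
# and 119 minutes with AI assistance.
change = percent_change_in_time(100.0, 119.0)
print(f"Change in completion time: {change:.1f}%")
```

A positive result corresponds to slower work with assistance, and a negative result would correspond to a speedup.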
Implications for Universities
CS Curriculum
- Introductory programming courses need redesign
- Emphasis shifts from syntax to architecture and judgment
- AI collaboration as explicit skill
- Code review and AI output evaluation essential
Research Methodology
- AI-assisted literature review standard
- Experiment automation increasing
- Reproducibility aided by AI
- Attribution and integrity challenges
Assessment
- Code-writing exams largely obsolete
- Design and architecture emphasis
- Oral defense of code decisions
- Process documentation required