Coding and Research Capabilities

Research Question

What can AI do in software development and research contexts?

Hypothesis

AI coding capabilities have reached a threshold where they exceed human expert performance on standard software engineering tasks, fundamentally changing CS education and research methodology.

Key Findings

1. SWE-bench Verified (Software Engineering)

Benchmark: Real-world GitHub issues requiring multi-file fixes (Anthropic)

Model	Score	Date	Significance
Claude Opus 4.5	80.9%	Nov 2025	First to exceed best human
GPT-5.1-Codex-Max	77.9%	Nov 2025	Strong competitor
Claude Sonnet 4.5	77.2%	Nov 2025	Cost-effective option
Gemini 3 Pro	76.2%	Nov 2025	Google’s entry
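
To make the spread in the table concrete, the sketch below converts each score into an approximate number of resolved issues and a gap to the top entry. It assumes SWE-bench Verified contains 500 tasks. The names `TASK_COUNT` and `summarize` are illustrative, and the task counts are estimates derived from the percentages, not reported figures.

```python
# Scores copied from the table above (percent of tasks resolved).
scores = {
    "Claude Opus 4.5": 80.9,
    "GPT-5.1-Codex-Max": 77.9,
    "Claude Sonnet 4.5": 77.2,
    "Gemini 3 Pro": 76.2,
}

# Assumption: SWE-bench Verified contains 500 tasks.
TASK_COUNT = 500


def summarize(scores, task_count=TASK_COUNT):
    """Return (model, approx_resolved, gap_to_top_in_points) rows, best first."""
    top = max(scores.values())
    rows = []
    for model, pct in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        approx_resolved = pct / 100 * task_count  # estimate, not a reported count
        gap = round(top - pct, 1)                 # percentage points behind the top score
        rows.append((model, approx_resolved, gap))
    return rows


for model, resolved, gap in summarize(scores):
    print(f"{model:<20} ~{resolved:.1f} tasks   gap to top: {gap:.1f} pts")
```

Under that assumption, the 4.7-point gap between the first and fourth entries corresponds to roughly 23 to 24 tasks out of 500.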

2. Human Expert Comparison

Claude Opus 4.5 exceeds best human candidate on Anthropic’s internal engineering test (Anthropic)
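With parallel test-time compute (several attempts run side by side, with one selected), it matches best-ever human performance
Reported as the first documented case of AI exceeding a top human software engineer on realistic tasks; the evidence is a single vendor-run internal test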
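
Parallel test-time compute generally means sampling several independent attempts and selecting among them. The sketch below illustrates that idea as a simple best-of-n selection. `generate_attempt` and `score_attempt` are hypothetical stand-ins (a seeded random generator and a toy scorer); the source does not describe Anthropic's actual procedure, so this should not be read as their method.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate_attempt(seed):
    """Hypothetical stand-in for one model attempt at a task."""
    rng = random.Random(seed)
    # A real system would return code; here an attempt is just a toy quality value.
    return {"seed": seed, "quality": rng.random()}


def score_attempt(attempt):
    """Hypothetical scorer, e.g. fraction of tests passed. Here: the toy quality."""
    return attempt["quality"]


def best_of_n(n, base_seed=0):
    """Run n attempts in parallel and keep the highest-scoring one."""
    seeds = [base_seed + i for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(generate_attempt, seeds))
    return max(attempts, key=score_attempt)


single = best_of_n(1)
many = best_of_n(8)
print(f"1 attempt:  {score_attempt(single):.3f}")
print(f"8 attempts: {score_attempt(many):.3f}")
```

Because the eight-attempt run includes the single attempt's seed, its selected score can never be lower. That is worth keeping in mind when comparing such results with a human's single attempt.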

3. AI Research Tasks

AI models can now perform:

GPU kernel development
Reinforcement learning algorithm implementation
ML model training and tuning
Research paper implementation
Experiment design assistance

4. Counterpoint: METR Study

Finding: Experienced developers were 19% slower with AI assistance (METR)

16 veteran open-source programmers
246 real-world tasks
AI assistance increased task completion time rather than reducing it

Interpretation: Learning-curve and context-switching costs may initially offset the gains
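
As a worked example of what the 19% figure means in practice, the sketch below applies it to a hypothetical task. It assumes the finding is read as a 19% increase in completion time; the 60-minute baseline and the function names are made up for illustration.

```python
SLOWDOWN = 0.19  # 19% longer completion time with AI assistance (METR finding)


def time_with_ai(baseline_minutes, slowdown=SLOWDOWN):
    """Completion time with AI assistance, given the time without it."""
    return baseline_minutes * (1 + slowdown)


def break_even_reduction(slowdown=SLOWDOWN):
    """Fractional cut in AI-assisted time needed to get back to the baseline."""
    return 1 - 1 / (1 + slowdown)


baseline = 60  # hypothetical task length in minutes
print(f"Without AI: {baseline} min")
print(f"With AI:    {time_with_ai(baseline):.1f} min")
print(f"Reduction needed to break even: {break_even_reduction():.1%}")
```

Under these assumptions a 60-minute task takes about 71.4 minutes with AI assistance, and the assisted workflow would need to become about 16% faster just to match the unassisted baseline.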

Implications for Universities

CS Curriculum

Introductory programming courses need redesign
Emphasis shifts from syntax to architecture and judgment
AI collaboration as explicit skill
Code review and AI output evaluation essential

Research Methodology

AI-assisted literature review standard
Experiment automation increasing
Reproducibility aided by AI
Attribution and integrity challenges

Assessment

Code-writing exams largely obsolete
Design and architecture emphasis
Oral defense of code decisions
Process documentation required

Related Research Questions

RQ01: Real-World Task Performance - Professional context
RQ05: Agentic Capabilities - Autonomous work
RQ07: Educational Implications - Recommendations
