AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)




Coding and Research Capabilities

Research Question

What can AI do in software development and research contexts?

Hypothesis

AI coding capabilities have reached a threshold where they exceed human expert performance on standard software engineering tasks, fundamentally changing CS education and research methodology.


Key Findings

1. SWE-bench Verified (Software Engineering)

Benchmark: Real-world GitHub issues requiring multi-file fixes (Anthropic)

| Model | Score | Date | Significance |
|---|---|---|---|
| Claude Opus 4.5 | 80.9% | Nov 2025 | First to exceed best human |
| GPT-5.1-Codex-Max | 77.9% | Nov 2025 | Strong competitor |
| Claude Sonnet 4.5 | 77.2% | Nov 2025 | Cost-effective option |
| Gemini 3 Pro | 76.2% | Nov 2025 | Google's entry |
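To make the comparison concrete, here is a minimal sketch in Python using the scores above. The human-expert baseline value is an assumption for illustration only; the source states that Claude Opus 4.5 is the first model to exceed the best human, but does not give the human score.

```python
# SWE-bench Verified scores from the table above (Nov 2025).
scores = {
    "Claude Opus 4.5": 80.9,
    "GPT-5.1-Codex-Max": 77.9,
    "Claude Sonnet 4.5": 77.2,
    "Gemini 3 Pro": 76.2,
}

# Hypothetical human-expert baseline, chosen only to illustrate the
# "first to exceed best human" claim; not a figure from the source.
HUMAN_BASELINE = 80.0

above_human = [name for name, pct in scores.items() if pct > HUMAN_BASELINE]
print(above_human)  # only Claude Opus 4.5 under this assumed baseline
```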

2. Human Expert Comparison

3. AI Research Tasks

AI models can now perform:

4. Counterpoint: METR Study

Finding: Experienced developers 19% slower with AI assistance (METR)

Interpretation: A learning curve and context-switching overhead may initially offset the productivity gains from AI assistance.
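The magnitude of that slowdown is easy to work out: being 19% slower means each task takes 1.19× its baseline time. A quick sketch, where only the 19% figure comes from the METR study and the 60-minute task is an illustrative assumption:

```python
SLOWDOWN = 0.19  # METR finding: experienced developers were 19% slower with AI

def with_ai_minutes(baseline_minutes: float) -> float:
    """Expected task time with AI assistance, per the METR result."""
    return baseline_minutes * (1 + SLOWDOWN)

# e.g. a task estimated at 60 minutes without AI (hypothetical estimate)
print(round(with_ai_minutes(60), 1))  # 71.4 minutes
```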


Implications for Universities

CS Curriculum

Research Methodology

Assessment



Explore This Research

