AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)

View the Project on GitHub vishalsachdev/ai-impact

← AI Impact Research · AI Capabilities Research


Agentic Capabilities

Research Question

What autonomous, multi-step work can AI perform?

Hypothesis

AI is transitioning from tool (requiring human prompting at each step) to agent (capable of autonomous multi-step workflows), with significant implications for task delegation and workforce augmentation.


Key Findings

1. OSWorld (Computer Use)

Benchmark: Real desktop and web tasks on virtual machines (Anthropic)

Model Score Date
Claude Opus 4.5 61.4% Nov 2025
Other frontier models 30-50% 2025

Task Examples:

2. Autonomous Refinement

Anthropic Finding: Claude agents can autonomously refine their own outputs (Anthropic)

3. Multi-Step Task Completion

AI can now autonomously:

4. Current Limitations


Implications for Universities

Task Delegation

Teaching Agentic AI

Research Workflows



Explore This Research


← Previous: Coding & Research Next: Safety & Alignment →