Agentic Capabilities
Research Question
What autonomous, multi-step work can AI perform?
Hypothesis
AI is transitioning from a tool that requires human prompting at each step to an agent capable of autonomous multi-step workflows, with significant implications for task delegation and workforce augmentation.
Key Findings
1. OSWorld (Computer Use)
Benchmark: Real desktop and web tasks on virtual machines (Anthropic)
| Model | Score | Date |
|---|---|---|
| Claude Opus 4.5 | 61.4% | Nov 2025 |
| Other frontier models | 30-50% | 2025 |
Task Examples:
- Navigate web applications
- Fill out forms
- Manage files and folders
- Execute multi-step workflows
2. Autonomous Refinement
Anthropic Finding: Claude agents can autonomously refine their own outputs (Anthropic)
- Peak performance was reached within 4 iterations
- Other models could not match that quality even after 10 iterations
- Self-correction without human intervention
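The refinement pattern described above can be sketched as a bounded loop that revises a draft, scores each revision, and keeps the best one. This is only an illustrative sketch, not Anthropic's method: `refine`, `toy_improve`, and `toy_score` are hypothetical names, and in a real agent the `improve` and `score` callables would wrap model calls and an evaluator.

```python
from typing import Callable, Tuple


def refine(
    draft: str,
    improve: Callable[[str], str],
    score: Callable[[str], float],
    max_iters: int = 4,
) -> Tuple[str, float, int]:
    """Revise a draft up to max_iters times and keep the best-scoring version.

    Returns (best_text, best_score, iteration_at_which_best_was_found).
    """
    best, best_score, best_iter = draft, score(draft), 0
    current = draft
    for i in range(1, max_iters + 1):
        current = improve(current)  # hypothetical stand-in for a model call
        s = score(current)          # hypothetical stand-in for an evaluator
        if s > best_score:
            best, best_score, best_iter = current, s, i
    return best, best_score, best_iter


# Toy stand-ins (assumptions for illustration only).
def toy_improve(text: str) -> str:
    return text + "!"


def toy_score(text: str) -> float:
    # Quality peaks when the text is exactly 8 characters long.
    return -abs(len(text) - 8)


result = refine("draft", toy_improve, toy_score, max_iters=4)
print(result)
```

Keeping the best version rather than the last one matters because a later revision can score lower than an earlier one, as the fourth toy iteration does here.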
3. Multi-Step Task Completion
AI can now autonomously:
- Research topics across multiple sources
- Write and edit documents iteratively
- Execute code and debug based on errors
- Manage project workflows
- Coordinate multi-tool operations
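A minimal sketch of how multi-tool coordination with error-driven retries might look is shown below. It assumes a fixed plan of `(tool_name, argument)` steps and plain Python callables as tools. `run_plan`, `flaky_run`, and the tool names are hypothetical and do not correspond to any particular agent framework.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


def run_plan(
    plan: List[Tuple[str, str]],
    tools: Dict[str, Tool],
    max_retries: int = 1,
) -> List[str]:
    """Run each step in order, retrying a failed step up to max_retries times."""
    log: List[str] = []
    for name, arg in plan:
        for _attempt in range(max_retries + 1):
            try:
                out = tools[name](arg)
                log.append(f"{name}: ok: {out}")
                break  # step succeeded, move on to the next step
            except Exception as exc:  # record the error, then retry
                log.append(f"{name}: error: {exc}")
        else:
            # Every attempt failed: stop the plan rather than continue blindly.
            log.append(f"{name}: gave up; escalate to a human")
            break
    return log


# Toy tools (assumptions for illustration only).
calls = {"n": 0}


def flaky_run(code: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("NameError in generated code")
    return "tests passed"


tools: Dict[str, Tool] = {
    "search": lambda q: f"notes on {q}",
    "run_code": flaky_run,
}
log = run_plan([("search", "agent benchmarks"), ("run_code", "print(1)")], tools)
print("\n".join(log))
```

The escalation branch reflects the point that an agent should hand control back to a person when retries are exhausted instead of pressing on with later steps.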
4. Current Limitations
- Reliability degrades with task length
- Novel situations often cause failures
- Error recovery is still imperfect
- Human oversight remains necessary for high-stakes tasks
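One simple way to see why reliability degrades with task length is to assume, purely for illustration, that every step succeeds independently with the same probability. Under that assumption (which is not from the source and is only a rough model of real agents), the chance of finishing an n-step task is the per-step probability raised to the power n. The function names and the 0.9 review threshold below are hypothetical.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    return p_step ** n_steps


def needs_human_review(
    p_step: float,
    n_steps: int,
    high_stakes: bool,
    threshold: float = 0.9,
) -> bool:
    """Flag a task for oversight if it is high-stakes or likely to fail."""
    return high_stakes or chain_success(p_step, n_steps) < threshold


for n in (1, 10, 50):
    print(n, round(chain_success(0.95, n), 3))
```

Even with a 95% per-step success rate, a 10-step task completes only about 60% of the time under this model, which is why longer workflows tend to need checkpoints or human review.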
Implications for Universities
Task Delegation
- Administrative workflows increasingly automatable
- Research assistance at a higher level of autonomy
- Student support services augmented
Teaching Agentic AI
- New curriculum area: AI agent design and oversight
- Ethics of delegation and accountability
- Human-agent collaboration skills
Research Workflows
- Literature review automation
- Data collection and processing agents
- Experiment monitoring and adjustment