AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)


Real-World Task Performance: Detailed Data

GDPval Benchmark Design

Task Construction Methodology

Source Professionals:

Sectors Covered:

  1. Professional, Scientific, and Technical Services
  2. Finance and Insurance
  3. Information Technology
  4. Healthcare and Social Assistance
  5. Educational Services
  6. Manufacturing
  7. Retail Trade
  8. Administrative Services
  9. Arts, Entertainment, and Recreation

Task Types:

Evaluation Methodology

Blinded Pairwise Comparison:

Scoring:


Model Performance Data

GDPval Gold Set Results (220 tasks)

Win + Tie Rates (Expert Parity or Better):

| Model | Win+Tie Rate | Primary Strength |
| --- | --- | --- |
| Claude Opus 4.1 | ~50%+ | Aesthetics, formatting |
| GPT-5 | ~45-50% | Accuracy, domain knowledge |
| GPT-5 Thinking | ~40-45% | Balanced |
| Gemini 3 Pro | ~33-40% | Varies by task |
| Grok | ~20-33% | Varies by task |
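The win+tie rate is the share of blinded comparisons in which graders judged the model's deliverable better than or as good as the expert's. The following is a minimal sketch of that arithmetic, assuming each comparison is recorded as one of three hypothetical labels ("win", "tie", "loss") from the model's point of view. The function name and the example verdicts are illustrative and are not GDPval data.

```python
from collections import Counter


def win_tie_rate(verdicts):
    """Return the fraction of comparisons the model won or tied.

    `verdicts` is a list of "win" / "tie" / "loss" strings. These labels
    are an assumption of this sketch, not a documented GDPval format.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    return (counts["win"] + counts["tie"]) / len(verdicts)


# Made-up verdicts sized to match the 220-task gold set (illustrative only).
example = ["win"] * 80 + ["tie"] * 30 + ["loss"] * 110
rate = win_tie_rate(example)
print(f"win+tie rate: {rate:.1%}")
```

With these invented counts, 110 of 220 comparisons are wins or ties, which is what a rate of about 50% in the table would correspond to.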

Performance by Dimension:

| Dimension | Best Model | Notes |
| --- | --- | --- |
| Accuracy | GPT-5 | Domain-specific knowledge |
| Formatting | Claude Opus 4.1 | Document layout, slide design |
| Completeness | Claude Opus 4.1 | Covers all requirements |
| Prompt Following | GPT-5 | Instruction adherence |

Performance Variation by Task Type

Tasks Where AI Performs Best:

Tasks Where AI Performs Worst:


Speed and Cost Analysis

Time Comparison

| Task Type | Human Expert | AI | Ratio (Human/AI) |
| --- | --- | --- | --- |
| Report Writing | 4-8 hours | 2-5 minutes | ~100x |
| Spreadsheet Analysis | 2-4 hours | 1-3 minutes | ~80x |
| Presentation Creation | 3-6 hours | 3-8 minutes | ~60x |
| Email Drafting | 15-30 minutes | 10-30 seconds | ~60x |
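The source does not say how the ratio column was derived. One plausible reading, sketched below, divides the midpoint of the human time range by the midpoint of the AI time range after converting both to minutes. The midpoint method and all function names are assumptions of this sketch.

```python
# Conversion factors to minutes for the units used in the table.
MINUTES_PER = {"seconds": 1 / 60, "minutes": 1.0, "hours": 60.0}


def midpoint_minutes(low, high, unit):
    """Midpoint of a [low, high] range, converted to minutes."""
    return (low + high) / 2 * MINUTES_PER[unit]


def midpoint_speedup(human, ai):
    """Human midpoint time divided by AI midpoint time.

    Each argument is a (low, high, unit) tuple.
    """
    return midpoint_minutes(*human) / midpoint_minutes(*ai)


# Ranges copied from the time comparison table.
TASKS = {
    "Report Writing": ((4, 8, "hours"), (2, 5, "minutes")),
    "Spreadsheet Analysis": ((2, 4, "hours"), (1, 3, "minutes")),
    "Presentation Creation": ((3, 6, "hours"), (3, 8, "minutes")),
    "Email Drafting": ((15, 30, "minutes"), (10, 30, "seconds")),
}

speedups = {task: midpoint_speedup(h, a) for task, (h, a) in TASKS.items()}
for task, ratio in speedups.items():
    print(f"{task}: {ratio:.1f}x")
```

Under this assumption the midpoint ratios come out near 103x, 90x, 49x, and 67.5x, which is in the same range as the rounded figures in the table but not identical to them, so the table's ratios are best read as order-of-magnitude estimates.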

Cost Comparison

| Factor | Human Expert | AI (API) |
| --- | --- | --- |
| Hourly rate equivalent | $50-200/hr | $0.50-5/hr |
| Per-task cost (complex) | $100-500 | $1-5 |
| Per-task cost (simple) | $25-100 | $0.10-0.50 |

Note: AI costs assume API pricing; consumer subscriptions have different economics.
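Because both columns are ranges, the implied cost advantage is itself a range. The sketch below brackets it, assuming the endpoints can be paired freely: the conservative bound pairs the cheapest human figure with the most expensive AI figure, and the optimistic bound does the reverse. The function name and this pairing rule are assumptions for illustration.

```python
def ratio_bounds(human_range, ai_range):
    """Return (conservative, optimistic) human-to-AI cost ratios.

    Conservative: lowest human cost over highest AI cost.
    Optimistic: highest human cost over lowest AI cost.
    """
    human_low, human_high = human_range
    ai_low, ai_high = ai_range
    return human_low / ai_high, human_high / ai_low


# Dollar ranges copied from the cost comparison table.
COSTS = {
    "Hourly rate equivalent": ((50, 200), (0.50, 5)),
    "Per-task cost (complex)": ((100, 500), (1, 5)),
    "Per-task cost (simple)": ((25, 100), (0.10, 0.50)),
}

bounds = {factor: ratio_bounds(h, a) for factor, (h, a) in COSTS.items()}
for factor, (low, high) in bounds.items():
    print(f"{factor}: {low:.0f}x to {high:.0f}x")
```

Read this way, the table supports anything from roughly 10x to roughly 1000x depending on the row and the endpoints chosen, so a single headline multiple such as 100x sits inside the range rather than summarizing it precisely.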


Trajectory Analysis

Historical Performance on GDPval

| Model | Release | Win+Tie Rate | Change from Previous (percentage points) |
| --- | --- | --- | --- |
| GPT-4o | Spring 2024 | ~25% | Baseline |
| GPT-4o (updated) | Fall 2024 | ~30% | +5 |
| GPT-5 | Summer 2025 | ~50% | +20 |

Implication: The win+tie rate roughly doubled, from ~25% to ~50%, in about 14 months.
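The trajectory figures can be checked with a few lines of arithmetic. The sketch below takes the approximate rates from the historical table, computes each step as a difference in percentage points, and computes the overall multiple from first to last release. Variable and function names are illustrative.

```python
# Approximate win+tie rates (percent) from the historical table, in release order.
HISTORY = [
    ("GPT-4o", 25.0),
    ("GPT-4o (updated)", 30.0),
    ("GPT-5", 50.0),
]


def point_deltas(history):
    """Percentage-point change between each consecutive pair of releases."""
    rates = [rate for _, rate in history]
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


def overall_multiple(history):
    """Last rate divided by first rate."""
    return history[-1][1] / history[0][1]


deltas = point_deltas(HISTORY)
multiple = overall_multiple(HISTORY)
print(f"deltas (pp): {deltas}")
print(f"overall multiple: {multiple:.1f}x")
```

The deltas are absolute differences between rates (5 and 20 percentage points), while the doubling claim is a relative change (50 / 25 = 2.0). Keeping the two apart avoids reading "+20" as a 20% relative improvement.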

Projected Trajectory

If current trend continues:

Caveats:


Occupation-Specific Findings

Occupations with Highest AI Performance

  1. Technical Writer: High accuracy, formatting strength
  2. Data Analyst: Spreadsheet and analysis excellence
  3. Marketing Coordinator: Content generation, campaign materials
  4. Administrative Assistant: Correspondence, scheduling, documentation
  5. Junior Software Developer: Code generation, debugging

Occupations with Lowest AI Performance

  1. Executive/Senior Manager: Strategic judgment required
  2. Sales Professional: Relationship and negotiation focus
  3. Healthcare Provider: Physical examination, patient interaction
  4. Legal Counsel: High-stakes judgment, liability concerns
  5. Creative Director: Vision and direction (vs. execution)

Implications by Sector

Professional Services

Finance and Insurance

Information Technology

Healthcare

Education


Data Gaps

  1. Non-US occupations: GDPval focused on US GDP sectors
  2. Non-English tasks: Limited multilingual evaluation
  3. Physical tasks: Not covered by current benchmarks
  4. Long-horizon projects: Tasks limited to single-session completion
  5. Team collaboration: Individual task focus only

Key Takeaways for Universities

  1. The best models match or beat expert quality on roughly half (~50%) of the professional tasks evaluated
  2. A roughly 100x cost and speed advantage makes AI economically compelling
  3. Quality is improving rapidly (~2x in 14 months)
  4. Judgment and relationships remain human advantages
  5. Workforce preparation must include AI collaboration skills