AI Impact Research

Comprehensive research on AI platform usage from frontier labs (2025)


Safety and Alignment: Sources

Model Safety Documentation

Anthropic

  1. Anthropic Transparency Hub: Model Report
    • URL: https://www.anthropic.com/transparency/model-report
    • Key Data: Safety evaluations, prompt injection resistance
  2. Claude Opus 4.5 Safety Features
    • URL: https://www.anthropic.com/news/claude-opus-4-5
    • Published: November 2025
    • Key Data: Industry-leading prompt injection resistance
  3. Constitutional AI Paper
    • Type: Academic research
    • Key Data: Anthropic’s safety training methodology

OpenAI

  1. GPT-5 System Card
    • Type: Technical documentation
    • Key Data: Safety evaluations and mitigations
  2. OpenAI Red Teaming Network
    • URL: https://openai.com/index/red-teaming-network/
    • Key Data: Adversarial testing methodology

Google DeepMind

  1. Gemini Safety Documentation
    • URL: https://deepmind.google/models/gemini/
    • Key Data: Safety features and testing
  2. Gemini 3 Safety Report
    • Type: Technical documentation
    • Key Data: Multi-modal safety evaluations

Safety Research

Prompt Injection

  1. Prompt Injection Attacks and Defenses
    • Type: Academic literature
    • Key Topics: Attack vectors, mitigation strategies
  2. LLM Security Research
    • Type: Academic literature
    • Key Topics: Adversarial attacks on language models

Hallucination

  1. TruthfulQA Benchmark
    • Type: Academic benchmark
    • Key Data: Factual accuracy evaluation methodology
  2. Hallucination in LLMs: Survey
    • Type: Academic survey
    • Key Topics: Types, causes, mitigation

Jailbreaking

  1. Jailbroken: How Does LLM Safety Training Fail?
    • Type: Academic research
    • Key Data: Jailbreak techniques and success rates
  2. HarmBench: A Standardized Evaluation Framework
    • Type: Academic benchmark
    • Key Data: Harmful output evaluation

Industry Coverage

Safety Analysis

  1. VentureBeat: Claude Safety Analysis
    • URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
    • Published: November 2025
    • Key Data: Safety comparison
  2. Data Studios: Claude 4 Safety Benchmarks
    • URL: https://www.datastudios.org/post/claude-4-in-2025-performance-safety-benchmarks-ecosystem-news-and-real-world-impact-for-enterpr
    • Key Data: Safety benchmark analysis
  3. The Great AI Privacy Divide
    • URL: https://medium.com/@michael_79773/the-great-ai-privacy-divide-one-year-later-two-worlds-apart-e74ce6187f1f
    • Key Topics: Privacy and safety considerations

Policy and Governance

AI Safety Policy

  1. UN AI Safety Discussions
    • Type: Policy documents
    • Key Topics: International AI governance
  2. OECD AI Principles
    • Type: Policy framework
    • Key Topics: Responsible AI development

Academic Programs

  1. AI Safety Research Programs
    • Type: Academic landscape
    • Key Topics: University research initiatives
  2. AI Ethics Curriculum Resources
    • Type: Educational materials
    • Key Topics: Teaching AI safety

Source Categories

| Category            | Count |
|---------------------|-------|
| Model documentation | 7     |
| Safety research     | 6     |
| Industry coverage   | 3     |
| Policy/governance   | 4     |
| **Total**           | 20    |