Safety and Alignment: Sources
Model Safety Documentation
Anthropic
- Anthropic Transparency Hub: Model Report
- URL: https://www.anthropic.com/transparency/model-report
- Key Data: Safety evaluations, prompt injection resistance
- Claude Opus 4.5 Safety Features
- URL: https://www.anthropic.com/news/claude-opus-4-5
- Published: November 2025
- Key Data: Prompt injection resistance, described by Anthropic as industry-leading
- Constitutional AI Paper
- Type: Academic research
- Key Data: Anthropic’s safety training methodology
OpenAI
- GPT-5 System Card
- Type: Technical documentation
- Key Data: Safety evaluations and mitigations
- OpenAI Red Teaming Network
- URL: https://openai.com/index/red-teaming-network/
- Key Data: Adversarial testing methodology
Google DeepMind
- Gemini Safety Documentation
- URL: https://deepmind.google/models/gemini/
- Key Data: Safety features and testing
- Gemini 3 Safety Report
- Type: Technical documentation
- Key Data: Multimodal safety evaluations
Safety Research
Prompt Injection
- Prompt Injection Attacks and Defenses
- Type: Academic literature
- Key Topics: Attack vectors, mitigation strategies
- LLM Security Research
- Type: Academic literature
- Key Topics: Adversarial attacks on language models
Hallucination
- TruthfulQA Benchmark
- Type: Academic benchmark
- Key Data: Factual accuracy evaluation methodology
- Hallucination in LLMs: Survey
- Type: Academic survey
- Key Topics: Types, causes, mitigation
Jailbreaking
- Jailbroken: How Does LLM Safety Training Fail?
- Type: Academic research
- Key Data: Jailbreak techniques and success rates
- HarmBench: A Standardized Evaluation Framework
- Type: Academic benchmark
- Key Data: Harmful output evaluation
Industry Coverage
Safety Analysis
- VentureBeat: Claude Safety Analysis
- URL: https://venturebeat.com/ai/anthropics-claude-opus-4-5-is-here-cheaper-ai-infinite-chats-and-coding
- Published: November 2025
- Key Data: Safety comparison
- Data Studios: Claude 4 Safety Benchmarks
- URL: https://www.datastudios.org/post/claude-4-in-2025-performance-safety-benchmarks-ecosystem-news-and-real-world-impact-for-enterpr
- Key Data: Safety benchmark analysis
- The Great AI Privacy Divide
- URL: https://medium.com/@michael_79773/the-great-ai-privacy-divide-one-year-later-two-worlds-apart-e74ce6187f1f
- Key Topics: Privacy and safety considerations
Policy and Governance
AI Safety Policy
- UN AI Safety Discussions
- Type: Policy documents
- Key Topics: International AI governance
- OECD AI Principles
- Type: Policy framework
- Key Topics: Responsible AI development
Academic Programs
- AI Safety Research Programs
- Type: Academic landscape
- Key Topics: University research initiatives
- AI Ethics Curriculum Resources
- Type: Educational materials
- Key Topics: Teaching AI safety
Source Categories
| Category | Count |
| --- | --- |
| Model documentation | 7 |
| Safety research | 6 |
| Industry coverage | 3 |
| Policy/governance | 4 |
| Total | 20 |