Aschenbrenner's comprehensive analysis of near-term scaling dynamics, capability trajectories, and the strategic implications of rapid AI progress for labs and states.
Websites
Essays, blog posts, and online resources on AI safety and related ideas.
Research on how agents can learn internal world models to plan complex behavior, relevant to understanding how AI systems develop representations of their environment.
Formal models of agents and decision theory with alignment-relevant curriculum, covering utility, planning, and the theoretical foundations of agent behavior.
The most widely used structured course for getting into alignment, with curated readings progressing from core concepts to open research problems.
Community-maintained FAQ covering AI safety questions at every level, from basics to technical details, with links to source material.
The primary venue for technical AI alignment discussion, where researchers post and debate new ideas, proposals, and critiques.
Weekly summaries of alignment research with commentary, a practical way to stay current on the field's output without reading every paper.
Hyperlinked explainers on rationality, AI risk, and alignment concepts, designed for building understanding incrementally.
Essays on rationality, decision theory, and AI risk from the researcher who shaped the field's early arguments and threat models.
Research notes on specification gaming, side effects, and AI safety from a DeepMind safety researcher, including the widely cited specification gaming examples list.
OpenAI's research blog covering capabilities and safety, including superalignment updates, red teaming results, and governance thinking.
The home of mechanistic interpretability research, publishing detailed analyses of how transformer models represent and process information internally.
Newsletter on ML safety covering robustness, monitoring, alignment, and systemic risk with links to recent papers and commentary.
Research and commentary on ML safety, forecasting, and robustness from a Berkeley professor working on practical safety problems.
The research institute focused on mathematical foundations of aligned AI, publishing on agent foundations, decision theory, and logical uncertainty.
Weekly newsletter by Anthropic's co-founder covering AI research, policy, and industry developments with consistent attention to safety implications.
Deeply researched essays on ML, scaling, AI art, and technology forecasting, known for rigorous analysis and independent thinking.
Essays on AI, alignment, and the philosophical implications of language models and generative systems.
Open-source ML research covering language model training, evaluation, and the safety considerations of making powerful models widely available.
DeepMind's safety team blog covering specification gaming, reward modeling, scalable oversight, and their technical safety research agenda.
DeepMind's main research site with publications on capabilities and safety, including Gemini evaluations, alignment research, and responsible scaling.
Karnofsky's essays on AI risk, longtermism, and cause prioritization, including the influential Most Important Century series on transformative AI.
Technical AI safety writing and alignment research notes.
Intensive research program for people entering AI safety, with project-based learning and mentorship from established researchers.
Empirical research on AI timelines, historical technology analogies, and quantitative estimates of AI progress and impact.
Pioneering interactive journal for ML interpretability and visualization, setting the standard for making neural network internals understandable.
Forum for effective altruism with substantial AI risk discussion, including cause prioritization, career advice, and policy analysis.
The original community blog on rationality and AI alignment, where many foundational safety arguments were first developed and debated.
Curated dataset of alignment and safety documents from papers, books, and blogs, useful for training and evaluating AI safety knowledge.