Websites

Essays, blog posts, and online resources on AI safety and related ideas.


Situational Awareness, by Leopold Aschenbrenner

Aschenbrenner's comprehensive analysis of near-term scaling dynamics, capability trajectories, and the strategic implications of rapid AI progress for labs and states.

World Models, by Jürgen Schmidhuber

Research on how agents can learn internal world models to plan complex behavior, relevant to understanding how AI systems develop representations of their environment.

Agent Models

Formal models of agents and decision theory with alignment-relevant curriculum, covering utility, planning, and the theoretical foundations of agent behavior.

AGI Safety Fundamentals

The most widely used structured course for getting into alignment, with curated readings progressing from core concepts to open research problems.

AI Safety Info (Stampy's FAQ), by StampyAI

Community-maintained FAQ covering AI safety questions at every level, from basics to technical details, with links to source material.

Alignment Forum, by Center for Applied Rationality

The primary venue for technical AI alignment discussion, where researchers post and debate new ideas, proposals, and critiques.

Alignment Newsletter, by Rohin Shah

Weekly summaries of alignment research with commentary. An efficient way to stay current on the field's output without reading every paper.

Arbital

Hyperlinked explainers on rationality, AI risk, and alignment concepts, designed for building understanding incrementally.

Eliezer Yudkowsky's blog, by Eliezer Yudkowsky

Essays on rationality, decision theory, and AI risk from the researcher who shaped the field's early arguments and threat models.

Victoria Krakovna's blog, by Victoria Krakovna

Research notes on specification gaming, side effects, and AI safety from a DeepMind safety researcher, including the widely cited list of specification gaming examples.

OpenAI Research, by OpenAI

OpenAI's research blog covering capabilities and safety, including superalignment updates, red teaming results, and governance thinking.

Transformer Circuits, by Anthropic / community

The home of mechanistic interpretability research, publishing detailed analyses of how transformer models represent and process information internally.

ML Safety Newsletter, by ML Safety

Newsletter on ML safety covering robustness, monitoring, alignment, and systemic risk with links to recent papers and commentary.

Jacob Steinhardt's blog, by Jacob Steinhardt

Research and commentary on ML safety, forecasting, and robustness from a Berkeley professor working on practical safety problems.

MIRI (Machine Intelligence Research Institute), by MIRI

The research institute focused on mathematical foundations of aligned AI, publishing on agent foundations, decision theory, and logical uncertainty.

Import AI, by Jack Clark

Weekly newsletter by an Anthropic co-founder covering AI research, policy, and industry developments, with consistent attention to safety implications.

Gwern Branwen's blog, by Gwern Branwen

Deeply researched essays on ML, scaling, AI art, and technology forecasting, known for rigorous analysis and independent thinking.

generative.ink

Essays on AI, alignment, and the philosophical implications of language models and generative systems.

EleutherAI Blog, by EleutherAI

Open-source ML research covering language model training, evaluation, and the safety considerations of making powerful models widely available.

DeepMind AI Safety Research, by DeepMind

DeepMind's safety team blog covering specification gaming, reward modeling, scalable oversight, and their technical safety research agenda.

DeepMind

DeepMind's main research site with publications on capabilities and safety, including Gemini evaluations, alignment research, and responsible scaling.

Cold Takes, by Holden Karnofsky

Karnofsky's essays on AI risk, longtermism, and cause prioritization, including the influential Most Important Century series on transformative AI.

carado.moe, by carado

Technical AI safety writing and alignment research notes.

AI Safety Camp

Intensive research program for people entering AI safety, with project-based learning and mentorship from established researchers.

AI Impacts

Empirical research on AI timelines, historical technology analogies, and quantitative estimates of AI progress and impact.

Distill

Pioneering interactive journal for ML interpretability and visualization, setting the standard for making neural network internals understandable.

EA Forum, by Centre for Effective Altruism

Forum for effective altruism with substantial AI risk discussion, including cause prioritization, career advice, and policy analysis.

LessWrong

The original community blog on rationality and AI alignment, where many foundational safety arguments were first developed and debated.

StampyAI Alignment Research Dataset, by StampyAI

Curated dataset of alignment and safety documents from papers, books, and blogs, useful for training and evaluating AI safety knowledge.