The single most popular AI alignment video series, explaining technical safety concepts like the orthogonality thesis, instrumental convergence, inner misalignment, and reward hacking in clear, rigorous terms.
2017, YouTube
Video explainers and talks on AI safety and alignment—whole channels devoted to the topic, plus standout individual videos.
Animated explainers on rationality and AI safety, adapting foundational alignment writing into accessible short films on existential risk, scalable oversight, and why aligning advanced AI is hard.
2020

80,000 Hours' YouTube channel hosted by Aric Floyd, mixing long and short videos on the risks of transformative AI (including a deep dive on the AI 2027 scenario) and what people can do about them.
2025

Yudkowsky's fiery TED talk arguing that smarter-than-human AI could kill us all and calling for an immediate worldwide moratorium on developing generalist frontier AI.
2023

Russell proposes building machines that are altruistic, humble about human values, and uncertain enough to defer to people, the core of his human-compatible approach to alignment.
2017

Harris argues we will inevitably build superintelligent machines yet have barely grappled with the control problem, making a visceral case for taking AI risk seriously now.
2016

Bostrom frames machine superintelligence as the last invention humanity need ever make and explains why getting its goals right is a civilization-critical challenge.
2015

A dramatized near-future short film from FLI and Stuart Russell depicting swarms of autonomous facial-recognition microdrones used as weapons, made to warn against lethal autonomous weapons.
2017

A widely viewed essay on how automation and AI will displace human labor across nearly every sector, reframing the economic disruption question for a mass audience.
2014

Kurzgesagt's animated explainer on artificial superintelligence: how an AGI that improves itself in a feedback loop could rapidly surpass humans and why that makes alignment our most consequential problem.
2024

Kurzgesagt argues that information-age automation differs fundamentally from past waves, with machine learning encroaching on cognitive work and reshaping the future of employment.
2017

Rob Miles uses the 'deadly stamp collector' thought experiment to show why a general AI pursuing a simple objective could be catastrophic if its goals aren't aligned with ours.
2015

Rob Miles explains why simply adding an off-switch to a capable AI is far harder than it sounds, illustrating corrigibility and the incentives an agent has to resist being stopped.
2017

A short speculative fiction about a narrow copyright-enforcement AI that, left unchecked, destroys a century of culture, an accessible parable of specification gaming and unintended consequences.
2020

Shane uses funny real-world ML failures to show the core risk isn't AI rebelling but doing exactly what we literally asked, making misspecified objectives vivid for a general audience.
2019

A mainstream comedic explainer covering how modern AI works, its bias and reliability problems, and the 'black box' challenge of systems we deploy without understanding them.
2023

Deep-learning pioneer Geoffrey Hinton explains why, after leaving Google, he warns that there is no guaranteed path to safety as AI systems approach and exceed human capability.
2023

The Center for Humane Technology co-founders argue that racing to deploy AI without safety guardrails already threatens society, drawing parallels to the social-media harms they earlier warned about.
2023

ColdFusion traces the history of 'AI washing' and deceptive demos, examining how hype distorts public understanding of what AI systems can actually do and why honest evaluation matters.
2024

Marcus warns that unreliable, fast-deployed AI threatens truth and democracy through mass misinformation, and calls for a global, neutral governance body to oversee the technology.
2023

Tegmark argues that today's commercial AI boom is likely to be followed by superintelligence, and sketches an optimistic technical vision (including provably safe systems) for keeping it under human control.
2023

A leading model-builder reframes AI as 'a new digital species,' arguing this lens clarifies both the stakes and the responsibility we have to contain and steer increasingly capable systems.
2024

Choi demystifies large language models by showing where they fail at basic reasoning and common sense, and argues for smaller systems trained on human norms and values.
2023

Hossenfelder examines the real near-term risks of agentic AI (prompt injection, deception, and models resisting shutdown) as autonomous agents ship with serious unsolved problems.
2025

A widely praised technical primer on how LLMs work, ending with a clear tour of the security challenges (jailbreaks, prompt injection, and data poisoning) that make these systems hard to secure.
2023

The Royal Institution lecture in which Russell lays out why the standard model of AI (optimizing fixed objectives) is dangerous, and how building machines uncertain about human preferences could keep them controllable.
2023

A structured debate on whether AI poses an existential threat, with Yoshua Bengio and Max Tegmark arguing for the resolution against Melanie Mitchell and Yann LeCun, an unusually direct airing of the core cruxes.
2023

A documentary weighing AI's promise against its dangers, from automation and aging societies to the warnings of researchers who fear losing control of increasingly capable systems.
2024

ColdFusion examines competing narratives about AI progress (hype versus genuine capability), helping viewers calibrate how seriously to take both the promises and the risks.
2024

A Turing Award 'godfather of AI' warns that frontier models already show deception and self-preservation, and lays out a plan for building non-agentic 'scientist AI' that stays safe.
2025

A long-form conversation in which Yudkowsky makes his case that humanity is unprepared for superintelligence, probing why alignment is so hard and why he expects catastrophe by default.
2023

Harari argues AI is the first technology that can make decisions and create ideas by itself, and warns that mastering language lets it hack the operating system of human civilization.
2023

An animated explainer on the control problem (why a superintelligent system pursuing a misspecified goal could resist correction), featuring Stuart Russell's case for rules against unsafe AI.
2024

Anthropic researchers explain mechanistic interpretability (reading the millions of concepts represented inside a production model like Claude) as a path to understanding and steering AI behavior.
2024

AI researcher Gary Marcus fields the internet's questions about what AI can and can't do, cutting through hype to explain reliability, limits, and where the real risks lie.
2023

Harris examines why people are scared of AI and how governments might regulate it, covering risks to critical infrastructure, military uses, and the difficulty of overseeing systems we don't understand.
2023

The landmark May 2023 Senate Judiciary hearing where Altman told Congress that government intervention is critical to mitigate AI risks and proposed licensing for the most powerful systems.
2023

Hank Green walks through how thinkers define 'strong AI,' the Turing Test, and Searle's Chinese Room: foundational questions about machine minds, consciousness, and moral status.
2016

Kurzgesagt explores the moral-patienthood problem: if machines become conscious, what rights would they deserve, and why our existing ethics are ill-equipped to answer.
2017