Missing AI Safety Goalposts

January 6, 2025

I’ve been involved in existential risk and AI safety, in some way, shape, or form, since I was a teenager. It seemed self-evident to me that either AI or nuclear weapons–both existentially risky technologies–could plausibly bring about our destruction. That would be very bad, therefore we ought to prevent it. I’ve spent my life trying to do just that.

The rough safety goalposts I’ve held for AI development:

  1. Ensuring we would build a global community of smart, dedicated people tackling existential risks. This was partly achieved through effective altruism–which I helped with in its early days–but the movement has since failed in many ways to lead on this front.
  2. Ensuring no one would develop existentially risky frontier AI without extremely stringent safeguards. Many of us assumed an Ex Machina-style, airgapped approach to careful development by an individual or small group of innovators. That would have been extremely reckless, but not as reckless as what has actually happened. OpenAI being founded explicitly to build and “open source” artificial general intelligence (AGI) broke this. It was an unjustified risk and should not have been allowed to happen in a sane society. AI can create weapons of mass destruction or tools for societal control, and once it is open-sourced there is no realistic way to prevent malevolent actors from using it to do so. Somewhat similarly, no one should open source nuclear or bioengineered weapons technology.
  3. Ensuring we would globally shut down all existentially risky AI development upon the first empirical evidence of AI’s immense capabilities. ChatGPT’s launch demonstrated these growing capabilities, but there was no immediate pause. The Future of Life Institute’s 6-month pause letter was fantastic, but insufficient. In a sane society, all world leaders would have immediately endorsed it.
  4. Ensuring we would globally shut down all existentially risky AI development upon the first empirical evidence of AI’s willingness and ability to deceive humans. 
  5. Ensuring no one would ever create AGI without extremely stringent internationally-set safeguards. Nearly every serious thinker in the space regards this as an obvious extinction risk, so many believed humanity should not even seriously contemplate working on AGI until we could ensure it would be provably safe. This seemed clearly true, if you value life.
  6. Ensuring no one would ever create artificial superintelligence (ASI) without extremely stringent safeguards. Doing so seems clearly to mean the near-immediate extinction (or, with “luck”, mere displacement) of humanity, so it was hard to imagine people with ordinary worldviews and psychological wellbeing wanting it. Surprisingly, many today seem to embrace this view.

We’re likely to miss the fifth safety goalpost in a few months or years, and we will likely never recover from missing it. It’s not easy to put humanity’s genie back in the lamp when eight billion people want their own personal genie.

If history is any indication, we will likely soon reach ASI and cause our own extinction. The painful irony is that this would be entirely avoidable if we were more individually and/or collectively rational.

It was always blindingly obvious we shouldn’t gamble with the Singularity without a full understanding of intelligence, values, and cooperation. We needed a solution to ethics, human flourishing, and international governance before we even seriously attempted to create ASI. We don’t have any of that, yet we gamble anyway.