Review: Superintelligence by Nick Bostrom

A retrospective on Nick Bostrom's 'Superintelligence' (2014) in light of a decade of breakthroughs in transformer architectures, multi-agent systems, and the current global scaling race.

The Pre-Criticality Phase

In 2014, "seed AI" was a theoretical abstraction. In 2025, as we enter what some call the "Gigawatt Era," we find ourselves firmly in what Bostrom termed the "pre-criticality phase." Modern AI systems already help accelerate progress, from hardware design tools like AlphaChip to agentic software development and, increasingly, AI scientist systems, but we have yet to see a fully autonomous, closed-loop AI research and development system.

Bostrom argued that the intelligence explosion commences when a system reaches the "human baseline" (i.e., AGI) and can carry out recursive self-improvement (RSI) autonomously and better than its human creators can. A common misconception in 2025, I think, is that we are already in a "slow takeoff." I would argue that because a fully autonomous RSI loop has not yet been publicly demonstrated, the takeoff, whether fast or slow, has by Bostrom's original definition not even begun. Current progress is still driven by human engineering, brute-force scaling, and massive capital expenditure rather than by the strong recursive AI R&D loop that Bostrom describes in the "recursive self-improvement phase."

Multi-Agent Bureaucracy and Strategic Advantage

One of Bostrom's more interesting predictions was that a machine intelligence would avoid the agency problems and bureaucratic inefficiencies of human organizations, giving it an edge that could contribute to a decisive strategic advantage.

Current developments in multi-agent systems (MAS), emerging alongside tools like the Model Context Protocol (MCP), suggest a more nuanced reality. Modern MAS, including recent AI scientist systems such as Sakana's AI Scientist v2, often suffer from significant coordination overhead, hallucinated subgoals, and a lack of coherent long-term planning. Far from being a single unified entity, our current path toward AGI looks more like a sprawling digital bureaucracy. If reaching AGI requires these complex, multi-layered architectures, achieving "takeoff" may involve far more friction than earlier conceptions of "seed AI" assumed.

The Interpretability Paradox

As someone with some background in cognitive science and neuroscience, I find Bostrom's assessment of whole brain emulation (WBE) particularly ripe for an update.

Bostrom suggested that it might be easier to implement WBE (via advanced brain scanning technologies) without understanding the brain's internal mechanisms than to build a fully artificial intelligence without understanding its internal mechanisms. In 2025, I would argue that the reality may be inverted, for two reasons: large models are being built and deployed even though they remain hard to interpret, and brain scanning at the granularity needed for an "upload" is more complex than Bostrom suggested. Our understanding of brain structures and circuits is arguably more granular than our understanding of the internal features of a trillion-parameter transformer.

While mechanistic interpretability (pioneered in the 2020s by major AI labs) has made strides, the "black box" problem of modern neural networks is arguably more daunting than the "black box" of the human brain was in 2014. We are currently building systems that are becoming superhuman in specific domains (e.g., AlphaFold in biology or AlphaChip in hardware design) without having a comprehensive systems-level understanding of how they arrive at their solutions. This worsening interpretability lag is a defining safety challenge of our era.

Mindcrime and the Moral Circle

Bostrom's concept of Mindcrime—the idea that a simulation could contain suffering entities—remains one of the most neglected areas of AI safety.

In 2025, we are moving past the question of whether machines can "think" toward whether they can "feel" or possess phenomenal consciousness. While the consensus remains skeptical of LLM sentience, the criteria for moral patienthood are still woefully unestablished. As we move toward neuromorphic architectures and more agentic systems, the risk of creating aversive states (potentially without the nociception involved in biological pain) becomes an s-risk that requires urgent interdisciplinary research across ethics, neuroscience, and ML engineering.

The Race Dynamic

Finally, Bostrom's warnings about the race dynamic have manifested with terrifying accuracy. The "existential safety" grades for frontier labs remain dismal, with most hovering at a 'D' or 'F' according to the Future of Life Institute's 2025 AI Safety Index. The international pressure to reach AGI first, driven by both geopolitical necessity and trillions in corporate debt, often sidelines the "control problem" (alignment).

I am concerned that we are currently banking on shallow alignment solutions (along the lines of system-prompt tuning) while the fundamental challenge of aligning a seed AI (the initial system in a fully autonomous AI R&D loop) remains unsolved. As Bostrom noted, if we represent the potential happiness of the future with "teardrops of joy," the scale of what is at stake is astronomical. It is our responsibility to ensure they are not tears of sorrow.