A must-read: this intellectual tour-de-force by one of AI’s true pioneers not only explains the risks of ever more powerful artificial intelligence in a captivating and persuasive way, but also proposes a concrete and promising solution.
Will AI eventually supersede human intelligence at all tasks and, if so, will this be the best thing ever to happen to humanity, or the worst? There have been many thought-provoking books on this topic, including Nick Bostrom's "Superintelligence", but this is the first one written by a world-leading AI researcher. Stuart Russell is the first author of the standard textbook on the subject, "Artificial Intelligence: A Modern Approach". I can personally certify that he remains at the top of his game: I've had many opportunities to read his recent peer-reviewed technical AI papers, and to experience his depth and expertise firsthand in numerous conversations and at AI conferences over the years.
The book is organized, loosely speaking, into two parts: the problem and the solution. I'll attempt to summarize each below.
1) THE PROBLEM: Russell argues that intelligence isn't something mysterious that can only exist in biological organisms, but instead involves information processing that can in principle be performed even better by future machines. He also argues that this is likely to happen, because curiosity and profit will continue to inexorably drive today's rapid pace of AI development until it eventually reaches the level of Artificial General Intelligence (AGI), defined as AI that can perform all intellectual tasks at least as well as humans. AGI could be great for humanity if used to amplify human intelligence to wisely solve pressing problems that stump us, and to create a world free from disease, poverty and misery, but things could also go terribly wrong. Russell argues that the real risk with AGI isn't malice, as in silly Hollywood movies, but competence: machines that succeed in accomplishing goals that aren't aligned with ours. When the autopilot of a German passenger jet flew into the Alps killing 150 people, the computer didn't do so because it was evil, but because the goal it had been given (to lower its altitude to 100 meters) was misaligned with the goals of the passengers, and nobody had thought of teaching it the goal to never fly into mountains. Russell argues that we can already get eerie premonitions of what it's like to be up against misaligned intelligent entities from case studies of certain large corporations having goals that don't align with humanity's best interests.
The historical account Russell gives of these ideas provides a fascinating perspective, especially since
he personally knew most of the key players. He describes how early AI pioneers such as Alan Turing, John von Neumann, and Norbert Wiener were acutely aware of the value-alignment problem, and how subsequent generations of AI researchers tended to forget it once short-term applications and business opportunities appeared. Upton Sinclair once quipped "It is difficult to get a man to understand something when his salary depends upon his not understanding it", so it's hardly surprising that today's AI experts in industry are less likely to voice concerns than academics such as Turing, von Neumann, Wiener and Russell. Yet Stuart argues that we need to sound the alarm: if AI research succeeds in its original goal of building AGI, then whoever or whatever controls it may be able to take control of Earth much as Homo sapiens seized control from other less intelligent mammals, so we had better ensure that humanity fares better than the Neanderthals did.
2) THE SOLUTION: What I find truly unique about this book, besides Russell's insider perspective, is that he doesn't merely explain the problem, but also proposes a concrete and promising solution. And not merely vague slogans such as "let's engage policymakers" or "let's ban X, Y and Z", but a clever, nerdy technical solution that redefines the very foundation of machine learning. He explains his solution beautifully in the book, so below I'll merely attempt to give a rough sense of the key idea.
The "standard model" of AI is to give a machine learning system a goal, and then train it using lots of data to get as good as possible at accomplishing that goal. That's much of what my grad students and I do in my MIT research group, and that's what Facebook did when they trained an AI system to maximize the amount of time you spent on their site. Sometimes, you later realize that this goal wasn't exactly what you wanted; for example, Facebook switched off that use-time-maximizing system after the 2016 US election and Brexit votes made clear to them that it had created massive online echo chambers that polarized society. But if such a value-misaligned AI is smarter than us and has copied itself all over the internet, it's not easy to switch it off, and it may actively try to thwart your attempts to switch it off, because being switched off would prevent it from achieving its goal.
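To make the "standard model" concrete, here is a deliberately toy sketch of what fixed-objective optimization looks like. Everything in it (the content options, the engagement numbers, the side-effect labels) is invented for illustration and comes from no real recommender system; the point is only that the optimizer sees nothing but the stated goal.

```python
# A toy sketch of the "standard model": the system is handed a fixed
# objective (here, expected minutes of engagement) and optimizes it as
# hard as it can. All names and numbers are illustrative, not real.

# Hypothetical content options: (name, expected engagement minutes,
# a real-world side effect that the objective never sees).
options = [
    ("balanced news digest", 3.0, "informed users"),
    ("outrage clickbait",    9.0, "polarized users"),
    ("cat videos",           6.0, "amused users"),
]

def standard_model_choice(options):
    # Rank purely by the stated goal; the third field -- the side
    # effect -- plays no role whatsoever in the decision.
    return max(options, key=lambda o: o[1])

print(standard_model_choice(options)[0])  # prints "outrage clickbait"
```

The failure mode the review describes falls out immediately: the system picks whatever maximizes the proxy, and any consequence not encoded in the objective is invisible to it.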
Stuart's radical solution is to ditch the standard model altogether for sufficiently powerful AI systems, training them to accomplish not a fixed goal they've been given, but instead to accomplish *your* goal, whatever that is. This builds on a technique known by the nerdy name "Inverse Reinforcement Learning" (IRL) that Stuart has pioneered, and it completely transforms the AI's incentives: since it can't be sure that it has fully understood your goals, it will actively try to learn more about what you really want, and always be open to you redirecting it or even switching it off.
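The claim that uncertainty about your goal makes the machine *want* to stay correctable can be shown with a toy version of the "off-switch" argument (in the spirit of work by Russell's group). The sketch below is a minimal numerical illustration, not anything from the book: a robot holds a belief over the unknown utility u of its proposed action, and compares acting immediately against deferring to a human who will veto exactly the bad cases.

```python
# Toy off-switch sketch: a robot is uncertain about the utility u of
# its action. It can act now (getting u), or defer to a human who
# switches it off when u < 0 and lets it proceed when u > 0.
# All numbers here are illustrative assumptions.
import random

random.seed(0)

# The robot's belief over u: slightly positive on average, but with
# real probability mass on harmful outcomes.
samples = [random.gauss(0.2, 1.0) for _ in range(100_000)]

act_now = sum(samples) / len(samples)                      # E[u]
defer = sum(max(u, 0.0) for u in samples) / len(samples)   # E[max(u, 0)]

# Since max(u, 0) >= u for every sample, deferring can never be worse:
# the human's veto removes exactly the negative-utility cases.
assert defer >= act_now
print(f"E[act immediately] = {act_now:.3f}")
print(f"E[defer to human]  = {defer:.3f}")
```

The inequality holds for any belief the robot might have, which is the crux of the incentive flip the review describes: as long as the robot is genuinely uncertain about what you want, letting you switch it off has higher expected value than preventing you.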
In summary, this book is a captivating page-turner on what is arguably the most important conversation of our time: the fate of humanity when faced with machines that can outsmart us. Thanks in large part to Russell, IRL is now a blossoming sub-field of AI research, and if this book motivates more people to deploy it in safety-critical systems, then it will undoubtedly increase the chance that our high-tech future will be a happy one.