just asking questions

What Keeps a Leading AI Scientist Up At Night


Stuart Russell isn’t just an AI expert; he literally wrote the book on it. Russell’s textbook Artificial Intelligence: A Modern Approach, which he co-authored with Peter Norvig in the mid-’90s, is still the gold standard for students in the field. So Russell’s signature on an open letter last month that warned about AI’s enormous potential pitfalls carried serious weight. Russell, a professor of computer science at UC Berkeley, isn’t a wild-eyed AI doomsayer; he believes the technology has the potential to transform the world for the better in astonishing ways. But he also worries that researchers have already lost a complete understanding of what their creations can do. I spoke with Russell about what the open letter accomplished, why a chatbot need not take on science-fiction qualities to wreak havoc, and whether the AI upsides are worth the existential risks.

The open letter you signed in March warned of many dangers humanity could face from artificial intelligence. It also made a very specific demand: a six-month pause on new AI development. That seemed more aspirational than anything. But have you seen the letter bear any fruit, in terms of people being a little more circumspect than they were about the direction this is all heading?
I guess two things have happened that might be relevant. One is that on April 5, OpenAI published an article, “Our Approach to AI Safety,” that says they think the technology should be subject to rigorous safety evaluations, and that regulation is needed to ensure that such practices are adopted. So it sounds like they’re agreeing with us, which is good. The other thing is the Chinese government issued their regulations, though I don’t suppose there’s any causal connection. But their regulations are extraordinarily strict, at least if you take them at face value, and I don’t think any of the large language models could meet them.

Of course, some of those restrictions probably involve things you can’t say about the Chinese government. 
Sure. I mean, the regulation that says the models may not output false information. You might say the Chinese government has a monopoly on false information, so they don’t want anyone to break that monopoly. But if it is taken literally, there’s simply no way the large language models can comply. There’s no way to force them to tell the truth, because they have no idea what truth means. They can generate plausible-sounding text, which means text that’s similar in some way to the kinds of texts that humans have generated in the past, and that’s it.

Well, that makes it sound like you’re a little more skeptical of the immediate dangers of this technology. A line a lot of the skeptics of the open letter have used is that dire AI scenarios are plausible-sounding, but not actually anything more than that, because the chatbots can’t be considered sentient. But you’re saying that —
No, nobody is saying that this is sentient.

But they’re saying that it could be eventually, no?
There’s no mention of sentience in the letter. I have said many times in writing and in interviews that sentience is irrelevant, if by sentience you mean consciousness or subjective experience or anything like that. That’s the kind of thing the Google engineer was talking about when he got fired for claiming that the LaMDA model was sentient.

Okay, that was the wrong word.
But the issue of whether a system that can’t really distinguish the truth from something plausible-sounding — can that have significant negative effects by itself? Yes. Partly because it doesn’t care about truth, it can manipulate people. And so I think we want to be clear, and I think the letter is clear, but it could have been clearer — about what are the negative effects of the systems that have already been released, and what are the possible effects of future systems?

There are systems, particularly those that can be connected to the physical world, like Gato, which is a DeepMind model that can have visual inputs. Currently, it’s connected to simulated worlds, but it can have physical outputs. And so that closes the loop with the real world in a very different way from text. But the other issue is that we don’t understand what these systems are doing at all. I mean, we literally do not know if they have a model of an external world, if they even have a conception of it, whatever conception means.

I’m trying to figure out what that means exactly — a model of an external world. It’s such a fuzzy concept, and it seems people have different definitions of it.
I think partly through introspection and a lot of centuries of philosophy, we have an idea of what that means for us. We have no idea what it’s like to only connect to the world through language. People sometimes talk about Helen Keller, who was deaf and blind and connected to the world through the palm of her hand, and that’s how she learned language. But she still had appropriate perceptions. She still had touch, so she still was aware that there’s a body and then that body can bump into things which are not her, and so on. So it’s really impossible to know. I think in principle, a system that connects only through language can form a conception of the outside world, but we just don’t know if it does.

And I could give you an example. GPT-4, I believe, has lots and lots of chess games in its training data, and you can play chess with it. Those chess games are represented as sequences of moves. And the moves are described in algebraic notation, so they look like: D4 to C6, knight takes B3, queen takes C1, et cetera. So here’s an idea that is not exactly correct, but is a useful stand-in for what’s really going on inside. Roughly speaking, if the game it’s played with you so far is similar to some game in its training data, then it’s able to basically retrieve the next move. And if it’s not exactly the same, but there are three or four games that are somewhat similar, then maybe it’ll try to come up with some kind of average prediction for what the next move might be. And as long as there’s a game that’s similar enough, it’s going to play like a grandmaster.

But it can go off the rails pretty easily.
If you get to a situation where there aren’t any games that are similar enough, or maybe there are two games that are similar and it tries to average two moves and comes up with a new move that doesn’t make any sense at all, it moves a piece that doesn’t exist, or tries to capture something that isn’t on the board. And then you immediately realize that it can’t actually play chess at all, because it hasn’t learned that there is a board. It hasn’t learned that there are pieces on that board, that D6 actually is moving a piece.

And I’ve seen versions of it where it looks like they trained it not just with moves, but with a little eight-by-eight grid showing the pieces, with capital letters for the black pieces and lowercase letters for the white pieces and so on. And you would think that would help, but actually, it still makes completely bonkers illegal moves, and doesn’t update the board correctly when you make moves. So I don’t think it has properly figured out the connection between the A4 and D6 and that little grid that’s also part of the text. When you see that, it does make you wonder whether it’s got any conception in any other domain of an external world. But honestly, we don’t know. We just don’t know because we have no idea what kind of internal processing is going on.

From the external behavior, we can’t really introspect on its behalf and figure out what might be causing that external behavior.
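
The retrieval picture Russell describes for chess can be sketched in a few lines of Python. This is only a toy illustration of the stand-in idea, which Russell himself says is not exactly correct, and it is not a description of how GPT-4 works internally. The function names, the similarity measure (counting matching moves), and the two training games are all hypothetical, chosen for illustration.

```python
def shared_moves(a, b):
    """Count the positions at which two move sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def predict_next_move(training_games, moves_so_far):
    """Return the move that followed the most similar opening in the training games.

    There is no board and no notion of pieces here: the predictor only
    compares strings, which is the point of the illustration.
    """
    n = len(moves_so_far)
    # Only games longer than the current one can supply a next move.
    candidates = [game for game in training_games if len(game) > n]
    if not candidates:
        return None
    best = max(candidates, key=lambda game: shared_moves(game[:n], moves_so_far))
    return best[n]


# Hypothetical "training data": two short games as lists of moves.
TRAINING_GAMES = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Bxc6"],
    ["d4", "d5", "c4", "e6", "Nc3", "Nf6", "Bg5"],
]

# A game that matches the training data exactly.
seen_game = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]
# Nearly the same game, except the bishop went to c4 instead of b5.
unseen_game = ["e4", "e5", "Nf3", "Nc6", "Bc4", "a6"]

seen_move = predict_next_move(TRAINING_GAMES, seen_game)
unseen_move = predict_next_move(TRAINING_GAMES, unseen_game)
```

In both cases the sketch returns the bishop capture on c6. In the first game that is a legal move. In the second the bishop is sitting on c4 and cannot reach c6, so the same answer is illegal, and the predictor has no way to notice because it never represents the board.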

Over the next few years, are you more concerned about the evolution of the kind of technology we’re already seeing with ChatGPT — large language models but vastly improved? Or are you more concerned about AI applications we haven’t comprehended yet? The unknown unknowns, to quote Donald Rumsfeld. 
It’s a good question. I think of it this way: we’re trying to solve this jigsaw puzzle of how to create general-purpose intelligence systems. For most of the history of AI, we had a certain idea of what the pieces of that puzzle were. We had certain kinds of machine-learning algorithms, we had reinforcement learning, we had logical reasoning, we had probabilistic reasoning, we had parsing of natural language input into grammatical structures. And we tried to fit all those pieces together, and there were obviously pieces missing and the pieces we had didn’t fit together very well. And along comes this new piece, the language model, which five years ago no one cared about.

Yes, people are bad at forecasting the future. They were predicting the rise of AI, but not really in the form of chatbots.
There were word predictors that were good at improving speech-recognition systems, because when you said an unclear word, they could guess what that word might be.

So it just wasn’t a thing, really. We didn’t realize that along that path there was this new piece of the puzzle. And we’re still trying to figure it out. We’ve got this new piece, but we can’t figure out what shape it is, what pattern is on the piece, and how it fits into the puzzle. It’s not the whole thing. For various reasons, I don’t think that just making these things bigger and bigger is going to produce real, general-purpose AI. I think probably the biggest reason is that as we currently build them, they can’t do a substantial amount of reasoning or planning.

So does that mean that if we stay on this current track, you would be less concerned about civilizational collapse? 
I think we could still have enormous civilization effects with the technology that already exists, or maybe with the next version, because the systems could be pursuing goals. I don’t know if you read the Guardian article I wrote, but —

Yeah, I did.
I asked Sebastien Bubeck whether Microsoft’s AI systems have developed their own internal goals, and if so, what they are. And he said, “We have no idea.”

That’s a little disquieting.
Right. And it’s actually entirely plausible that they should develop internal goals, because they’re trying to be human linguistic generators. They’re trying to be the same kind of thing that produced all that data, which is us. And we have goals, and those goals play a role in how we produce that data. And so, it’s entirely plausible that they should acquire goals in the process of becoming good predictors of the next word.

And we have no idea what those goals are. They could be nefarious.
When Kevin Roose was having that conversation with Sydney, I don’t think he did anything to prompt her to declare undying love for him. Something he said seems to have activated this goal of, “I want to marry you.” And Sydney goes on for pages and pages and pages trying to achieve that goal.

Maybe “AI has goals” is a more useful way of looking at this than the idea that “AI could become conscious,” which you swatted away. These systems just go onto the next thing, but there’s no value judgment on what that thing is. 
Yeah, the goal plays the usual causal role in generating behaviors. You start choosing behaviors that are likely to achieve the goal. It’s no different from a chess program that’s trying to checkmate the opponent. I’ve been looking at a lot of newspaper articles recently, because there have been loads and loads, and when you look in the comments sections, you always see people who say, “Well, there’s nothing to worry about because these systems don’t have any desires,” and that’s wrong. There’s this misunderstanding, that a desire is something only a conscious being could have.

The letter you signed paints a pretty dark picture of the future: AI possibly taking all our jobs, possibly flooding the world with propaganda, possibly ending civilization. With the downsides so vast in your view, is it even worth doing any of this in the first place? Are the upsides so great that the possibility of these dark scenarios, even if they’re slim possibilities, is worth the trade-off?
I think the upsides could be very significant, and I’ve written about that at length. In very simple terms, if you have general-purpose AI, which can do anything that human beings can do, it can do what we already know how to do, which is to deliver what we think of as a nice, comfortable quality of life for hundreds of millions of people. But it can do that at essentially no cost, on an unlimited scale. Depending on how we make the political and economic decisions and our distributional decisions, we could arrange it so that everyone on Earth has a high quality of life. If you do the back-of-the-envelope calculation for the net present value, or the sort of cash equivalent of that, it’s about 13 and a half quadrillion dollars.
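
A back-of-the-envelope figure of that size can be reproduced with the standard perpetuity formula, in which the present value of a permanent annual gain is the gain divided by the discount rate. The interview does not state the inputs, so the numbers below are illustrative assumptions only: world GDP rising from about $76 trillion to about $750 trillion a year, discounted at 5 percent. The function and variable names are likewise hypothetical.

```python
def perpetuity_value(annual_gain, discount_rate):
    """Present value of a fixed annual gain that continues indefinitely."""
    return annual_gain / discount_rate


# Assumed inputs, not taken from the interview (trillions of dollars per year).
CURRENT_WORLD_GDP = 76.0
TARGET_WORLD_GDP = 750.0
DISCOUNT_RATE = 0.05  # assumed 5 percent

annual_gain = TARGET_WORLD_GDP - CURRENT_WORLD_GDP
npv_trillions = perpetuity_value(annual_gain, DISCOUNT_RATE)
npv_quadrillions = npv_trillions / 1000.0

print(f"Net present value: about {npv_quadrillions:.1f} quadrillion dollars")
```

Under these assumed inputs the result lands close to the 13 and a half quadrillion dollars Russell mentions. A different discount rate or income target would shift it considerably.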

Not the kind of money you find in the couch cushions. 
There’s a really nice article in the Financial Times, by the way, by a guy called Ian Hogarth. He mentioned the amount of money that was put into AGI start-ups in the first quarter of 2023: $23 billion, which I found pretty astounding.

But if we’re talking a 14 quadrillion payout, then maybe it’s low.
Yeah, exactly. So there is a bet there that GPT 5 or 6 will be that general-purpose AI, which can deliver all that value. And I’m not totally convinced of that. So maybe that’s why it’s still only billions and not trillions. Because think about AlphaFold: if you had to say what the single biggest contribution from deep learning to humanity so far is, it’s probably AlphaFold 2, right? It correctly predicts the structures of all the proteins. I don’t think that could be done by the large language models at all.

The letter also drew quite a bit of flak from other experts in the field. Some thought it overhyped the threats posed by AI and focused too much on “long-termism.” One outspoken critic was Timnit Gebru, who wrote, along with Emily Bender and others, “We are dismayed to see the number of computing professionals who have signed this letter and the positive media coverage it has received. It is dangerous to distract ourselves with a fantasized AI enabled utopia or apocalypse, which promises either a flourishing or potentially catastrophic future.” She also wrote, “We should be focusing more on what AI is doing right now than this kind of distant future.” What do you say to that? 
I don’t know Timnit personally, but I did read that. I have to say, I find this line of argument distressing and puzzling. I think it’s entirely possible that there are two problems to solve and we should solve both of them. And I think this repeated claim that, “Well, if you’re talking about problem B, then you’re dismissing problem A,” I think that’s fallacious.

Take long-term storage of nuclear waste. That’s been a concern with nuclear power since the beginning. But if you talk about long-term storage of nuclear waste, does that mean you are dismissing concerns about meltdowns? If you talk about climate change, does that mean you are dismissing concerns about indoor pollution from wood-burning stoves? It just doesn’t make any sense. Because in fact, many of the short-term concerns overlap with the long-term concerns. They result from the same types of design failures, basically building systems that are pursuing an incorrectly defined objective.

This interview has been edited for length and clarity.
