While out in Mountain View, I got a few minutes to sit down with Nathaniel Fairfield, principal software engineer at Waymo. While we couldn’t touch on the truly juicy stuff — the ongoing lawsuit between Waymo and Uber, and Waymo’s recent partnership with Uber competitor Lyft — it was a fascinating look at the field of self-driving cars, a discipline that’s come incredibly far in just a little over a decade. (Remember that it was just 12 years ago that a self-driving car was even able to complete the DARPA Grand Challenge course.)
So what do you do at Waymo?
[Laughs.] Good question. I’m a principal software engineer, and I started all the way back in 2009, and I’ve worked on a lot of different things over the course of time — traffic-light detection, and some more experimental projects.
How long have you been with this actual project? I know Waymo’s been through more name changes than Prince, but were you always with Waymo?
No, I’ve been with this project — I came here to do this project, and I’ve been with this project the whole time.
Were you involved in the DARPA challenges in the mid-aughts?
I wasn’t. I was at Carnegie Mellon, so I was adjacent to those folks, but I was working on underwater robots and other things like that, but then washed up here because, ultimately, what I want to do is robots fending for themselves. Underwater is like that, too, because the robots can’t communicate, so they’re off by themselves. This was an opportunity to do this for real, across the board, and really have a big, positive impact on people’s lives.
What I do now is onboard software — there’s a lot of stuff off-board, but onboard there are kind of two big pieces. One is the perception piece where you’re figuring out a model of the world around you. So you’re using the lasers, radars, and cameras to detect the traffic lights, pedestrians, and cars, and all of that stuff. And then there’s the map that we’ve built. Combine it all together, you get a model of what is going on.
And then you still need to make decisions — so that involves the route you’re gonna take, the actual tactical trajectory you’re gonna take: Am I gonna slow down, turn on my turn blinkers, or maybe even honk at somebody? And then control, which is, How do I actually move the steering wheel?
And all of this is taking place in the car itself — there’s almost no data connection required for all this tech to work, correct?
Yeah. The main reason is really simple: Many decisions have to happen really fast — you can’t relay communications off-board; your cell phone could go into a dead spot or something like that. You need to be able to be smart enough onboard the car to make all the really important decisions.
Was it always the case from the beginning that the majority of this tech had to be built right into the car?
All the critical safety stuff had to be onboard. But just to be clear: An awful lot does happen off-board, because we can get all the data from cars, and we can use that to teach them all kinds of good stuff. If you’re trying to figure out, Hey, what’s the right braking profile when coming to a stop sign? What’s natural and communicates to other drivers?
How do I not slam on the brakes?
Right, How do I not slam on the brakes? Because a stop sign is different. If you’re stopping for a traffic light, chances are you’re stopping for a while. For a stop sign, you actually use a different braking profile. You wanna get up to the sucker, get your turn, and go. So if you’re trying to figure out stuff like that, you might take a bunch of data that you’ve collected and figure out from that data what the right parameters are to use. So the car isn’t in effect figuring that out in the moment, but you’re using all that accumulated experience.
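As a rough illustration of pulling a parameter out of logged data, here is a minimal Python sketch. It assumes, purely for illustration, that a stop can be summarized by one constant deceleration a satisfying v^2 = 2*a*d, and it estimates a by least squares from (speed, distance-to-stop) samples. The function name, the constant-deceleration model, and the sample values are hypothetical; nothing here is Waymo's method or data.

```python
import math

def fit_constant_decel(samples):
    """Least-squares fit of a in the assumed model v^2 = 2*a*d.

    samples: iterable of (speed_mps, distance_to_stop_m) pairs.
    Minimizing sum((v^2 - 2*a*d)^2) over a gives
    a = sum(v^2 * d) / (2 * sum(d^2)).
    """
    samples = list(samples)
    num = sum(v * v * d for v, d in samples)
    den = 2.0 * sum(d * d for _, d in samples)
    if den == 0:
        raise ValueError("need at least one sample with nonzero distance")
    return num / den

# Synthetic samples generated from an assumed 2.0 m/s^2 stop (not real data).
synthetic = [(math.sqrt(2 * 2.0 * d), d) for d in (5.0, 10.0, 20.0, 40.0)]
print(round(fit_constant_decel(synthetic), 3))
```

A real braking profile would likely vary deceleration over the course of the stop; the single-parameter fit only shows the idea of learning the number offline rather than choosing it in the moment.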
What are some of the edge cases that have really surprised you, where you thought, “I would never have expected that to throw our perception model off”?
Oh, that’s an interesting question. We drive a lot. So we see a lot of the interesting things where we as humans, at first blush — like, I’m a little confused by these things. Like, you’ll see a series of cones laid down, and you think, Are you supposed to go on this side of the cones? Or that side of the cones? People get this wrong all the time. Sometimes, the world is just fundamentally a little ambiguous. And what you need to do is not try to get it perfectly right 100 percent of the time, because that’s actually not really feasible. What you have to do is accommodate other people going down the wrong way, and you yourself have to recover fairly well when you realize, Oh, wait, I’m supposed to be on that side.
So when you talk about recovering fairly well — what does that actually look like as a user? Does that mean I take control of the car?
No, no. In this case, it means the car itself recovers in cases of uncertainty. We talk a lot about this, in terms of “falling back gracefully.” If something is confusing and then you do something completely arbitrary or unpredictable, that’s really bad. But in these cases, you can say, Wait a minute. There’s a line of cones here; I thought I was supposed to be on this side of it, but actually, I need to be on the other side. It’s not really that complicated.
We have this whole built-in semantics of what we see on the road. You and I see a bicyclist, and we understand it’ll behave differently than a pedestrian, and they’re gonna act differently than a car. An older person might act differently crossing the street than a younger person crossing the street. How do you train a model for that?
I think the real answer is that you don’t have to split things down into superfine categories. You mentioned bicyclists, pedestrians, automobiles — I would add motorcycles and large trucks. Those are some very major, very clearly different categories where you expect those agents to behave substantially differently. If you see someone who’s elderly, you and I might try to figure out how their behavior is different, but really, in terms of your reaction as a driver to them, it shouldn’t change.
So then the answer to how do you model how a cyclist or a pedestrian behaves? You look at a lot of data and figure out how they work, and then start to make probabilistic models. That’s the perception side, to figure out probabilistic models for what they’re gonna do. They’re looking like they might cross the street; okay, then it seems likely they’re gonna cross the street. They’re not looking at the street — they’re looking in another direction altogether; they’re probably not gonna cross the street. But it’s all probabilities there.
From the planning perspective, what you do is be a little cautious here. Let me just slow down a little. Let’s get a little closer. Let’s see if everyone is really gonna cross the street or not, instead of blasting past at full speed. The closer we get to that intersection, the better idea we have of what’s going on, because their intentions become clearer, and your intentions become a lot clearer.
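The two halves of this answer, a probability that someone will cross and a speed that backs off as that probability rises, can be sketched in a few lines of Python. This is a toy example under stated assumptions: a single cue (whether the pedestrian is looking at the street), a Bayes update with made-up likelihoods, and a linear speed reduction. The function names and every number are hypothetical and are not taken from Waymo's system.

```python
def update_crossing_belief(prior, looking_at_street,
                           p_look_given_cross=0.8, p_look_given_stay=0.3):
    """Bayes update of P(pedestrian will cross) from one observed cue.

    The two likelihoods are illustrative placeholders, not measured values.
    """
    if looking_at_street:
        like_cross, like_stay = p_look_given_cross, p_look_given_stay
    else:
        like_cross, like_stay = 1 - p_look_given_cross, 1 - p_look_given_stay
    evidence = like_cross * prior + like_stay * (1 - prior)
    return like_cross * prior / evidence


def cautious_speed(cruise_speed, p_cross, min_fraction=0.3):
    """Scale speed down linearly as the crossing probability rises."""
    fraction = 1.0 - (1.0 - min_fraction) * p_cross
    return cruise_speed * fraction


# Each new observation sharpens the belief, and the speed follows it.
belief = 0.5
for looking in (True, True, False):
    belief = update_crossing_belief(belief, looking)
    print(f"P(cross)={belief:.2f}  speed={cautious_speed(12.0, belief):.1f} m/s")
```

The loop mirrors the point about getting closer to the intersection: more observations make the estimate clearer, and the planned speed adjusts gradually instead of jumping between "go" and "stop."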
So to take it back, the hardware you can fit in a car is a lot more complex than what I can put in my phone — you’re talking about modeling the body stance of someone at a crosswalk. That’s a relatively complicated problem to solve for.
Well, just to be clear: We’re not trying to be omniscient here. The point is, neither are humans. I might not realize that someone is standing on the corner of this street, and the state of their thinking, and what exactly their plan is. But we have evolved a series of signals and indications that — like, when someone is standing right on the curb, then they probably mean to cross. But if you’re standing back a little bit, you’re probably not. You see human drivers make this judgment back and forth every day. I’m certainly very conscious of it when I’m driving now.
It’s not unique to the car, I guess is what I’m saying. There are no particular magical subtleties that you have to pick up — it’s the same signals that human drivers use, that human pedestrians use, to communicate back and forth. We need to make sure we react conservatively — and the nice thing about the car is, it doesn’t miss anything.
It’s almost that autonomic sense of like, I don’t even really think about it in my forebrain — it’s just like, of course when I see someone in the middle of the crosswalk, of course I’m going to slow down.
I mean, we have a whole theory developed about this — you are intuitively, and maybe subconsciously, pulling out some of the elements of uncertainty. You’re not sure what the situation is until you take actions. We can actually reason about those, and we’re designing a system in that same way — we’re more explicit about that uncertainty, because we’re sitting there tracking exactly to the centimeter where they are and what direction they’re going. But it does mean — pedestrian over here, pedestrian over there — the car is capable of things a human with a single head isn’t really able to do.
So you’ve been working on this for nearly a decade. Was there a problem that you had real difficulty solving that now just seems almost easy?
Oh, yeah, so when we first started, there were these ten challenging routes, about 100 miles each, and you had to drive them completely — start to finish — without touching the wheel. And we didn’t really know what was possible. So there had been some interesting research in self-driving cars and the DARPA Grand Challenges, but no one knew if you could really push it all the way there. We dove into those problems. We drove all five bridges of the Bay Area and then down through Tiburon.
There was one other that went from San Jose to San Francisco on El Camino.
Less of a nice drive.
Yeah, and it took like four hours. But in the course of doing these drives, we learned what was easy, what was harder, and what’s challenging and what’s not. In those days, it was a much simpler system: “Hey, there’s something kinda coming down toward me, I’m gonna slow down and brake.” There wasn’t a lot of nuance or depth to it.
So one of the things that was really challenging in those days was, you’d come up to an unprotected intersection — in other words, there was cross traffic coming. Like, as a person, if you can see something, obviously you don’t go. But if you can’t see something, what does that mean? We didn’t really have this model of occlusion. What we would do is say, “Okay, we don’t see anything, so we’re gonna go.” And if there was another car coming the other way, then the human driver would take control and prevent something bad from happening.
There were two problems there: The human was effectively able to double-check, and they were correcting for the fact that we didn’t know what we didn’t know. So if there was a big bush right at the intersection and you couldn’t see, wouldn’t it be nice if the car would just look a little further? That’s what humans do.
A sneak peek.
Or if there was a truck next to you, maybe you should wait until the truck gets out of your way before you decide to cross the intersection. So now we have this whole model of occlusions, and can use that to steer our sensors to get the most performance out of that.
So that was a case where the car could kind of do intersections, but it didn’t have the depth; it didn’t have an understanding of its own limitations. Once you have that, you’re not done because you need to evolve a set of behaviors to work around those limitations. So nowadays, there’s still some really nasty blind intersections, but there are things you can do to make them robust and correct in a fundamental way.
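The difference between "I see nothing" and "I cannot see" can be shown with a small geometric sketch in Python. It assumes a flat 2D world, a point sensor, and occluders (a bush, a truck) modeled as line segments; a point on the cross-traffic lane counts as hidden when the line of sight to it crosses an occluder. All names and coordinates are hypothetical, and this is far simpler than whatever occlusion model the real system uses.

```python
def _orient(a, b, c):
    """Signed area test: >0 if a->b->c turns left, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def visible(sensor, target, occluders):
    """True if no occluder segment blocks the sensor-to-target line of sight."""
    return not any(segments_cross(sensor, target, a, b) for a, b in occluders)


def should_creep(sensor, lane_points, occluders):
    """Creep forward (rather than go) if any cross-traffic point is hidden."""
    return any(not visible(sensor, p, occluders) for p in lane_points)


bush = [((1.0, 1.0), (1.0, 3.0))]          # hypothetical bush near the corner
lane = [(4.0, 8.0), (-4.0, 8.0)]           # sample points on the cross street
print(should_creep((0.0, 0.0), lane, bush))  # behind the bush
print(should_creep((0.0, 4.0), lane, bush))  # after nosing forward
```

The key design choice is that an empty detection list is only trusted for the parts of the lane the sensor can actually see; hidden parts trigger a cautious behavior instead of a "go."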
It used to be a certain company’s motto, they don’t use it any more, but it was “move fast and break things.” Obviously, you can’t do that when building a self-driving car. How much did you have to err on the side of caution when working on this?
Let me try to pin that down a bit. The reason people are so excited to work here and work so hard is — I mean, you know the stats: 1.25 million people die every year worldwide in auto accidents. That’s one 737 falling out of the sky every hour, every day. Click. Click. Click. And 94 percent of those are due to human mistakes. That’s why we’re here; that’s why we want to do this. So to be cavalier about safety would subvert that whole mentality. So our culture from the get-go has been a culture of safety and the importance of that.
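The airliner comparison is simple arithmetic and can be checked directly. The short Python snippet below uses only the 1.25 million figure quoted above; the comparison to a 737 is approximate, since seating on that aircraft varies by variant and layout.

```python
deaths_per_year = 1_250_000   # figure quoted in the interview
hours_per_year = 365 * 24     # 8,760 hours, ignoring leap years
deaths_per_hour = deaths_per_year / hours_per_year
print(f"{deaths_per_hour:.1f} deaths per hour")  # prints "142.7 deaths per hour"
```

That works out to roughly 140 people every hour, which is in the range of a full single-aisle airliner.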
Now that said, I once worked on a project called Paranoid Planner, which only goes as fast as it can go and be completely certain nothing bad will happen. Do you know how fast it will go?
It can’t move. The only safe speed is stop. But realistically, you can’t drive completely inert. So there is some trade-off and some subtlety there, where you try to find a reasonable rate where they can drive — where they’re really being cautious and being safer than I might be when I’m driving myself around. That’s really the objective; that’s how we think about it.
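Why a fully paranoid planner ends up at zero speed can be seen in a toy calculation. The Python sketch below assumes a simple stopping model: the car travels at speed v for a reaction time t, then brakes at constant deceleration a, so it needs v*t + v^2/(2*a) meters to stop. The fastest safe speed is the largest v whose stopping distance fits inside the distance guaranteed to be clear. The function name, the model, and the default values are hypothetical; this is not the Paranoid Planner itself.

```python
import math

def max_safe_speed(clear_distance_m, reaction_time_s=0.5, max_decel_mps2=6.0):
    """Largest v (m/s) with v*t + v^2/(2*a) <= clear_distance_m.

    Solving v^2 + 2*a*t*v - 2*a*d = 0 for the positive root gives
    v = -a*t + sqrt((a*t)^2 + 2*a*d).
    """
    if clear_distance_m <= 0:
        return 0.0
    at = max_decel_mps2 * reaction_time_s
    return -at + math.sqrt(at * at + 2.0 * max_decel_mps2 * clear_distance_m)

# If nothing is assumed about other road users, the guaranteed-clear
# distance shrinks to zero, and so does the safe speed.
for d in (0.0, 5.0, 30.0):
    print(f"clear distance {d:5.1f} m -> max speed {max_safe_speed(d):.2f} m/s")
```

Demanding complete certainty amounts to setting the guaranteed-clear distance to zero, which is why the only safe speed is stopped; any practical planner has to accept some assumptions about what others will do in exchange for being able to move.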
To make sure that the process we have as a company and a group of people, as we’re doing, we’re exploring these spaces and figuring this stuff — the degree of validation we do to make sure there aren’t some weird corner cases going on in simulations, real-world courses — before we push something out into the world. All of those things contribute to an argument that was used, “Are we being safe enough?” We do have a sense of needing to move fast. Those airplanes are falling out of the sky, and you’re thinking, We need to get this out the door.
So you guys recently rolled out some of your first real public testing; you’re putting people in production models. What have you learned? How has having these self-driving cars on all day altered these families’ behavior?
So the early rider program is really, really interesting because what we’re doing is exploring that space and seeing how people use them — how they like to use them, what time of day, what places they like to use them — all those generic patterns that really let us figure out what it’s gonna be like ahead of time. We still have safety drivers in the car, to be clear. But we’re starting to get a little bit of evidence as to what people are going to care about.
It’s really interesting to get these other people’s perspective — to get these fresh eyes on the whole thing. That’s really been what most of this stuff that we’ve been exploring, and the information that we’ve been working to gather, has been about.
I may just be going off the videos you guys put out, but it seems that these are very much not geared toward what other ride-hailing services are offering, which is more nightlife — going out at night, meeting up with friends. The Waymo videos are much more about soccer practice, taking the kids to school, and picking them back up from school. Is that intentional?
Oh, we’re definitely thinking about that. I mean, we’re thinking about all the other ones, too — it’s kind of obvious to go after the bar scene and all that. But that’s different than what we’re looking at doing. We have these cars that can drive themselves. Many of the models you could imagine — a service, a car shared among a family or a small group of people — whatever combination of those things might work. What really matters, ultimately, is how are people using them, and what do they like, and what possibilities are opened by self-driving cars, and how does that affect people’s choices? That’s what we’re really trying to figure out.
(Parts of this interview have been edited and condensed.)