Why a Troubled Polling Industry Whiffed on the California Recall

Why did pollsters underestimate these people? Photo: Genaro Molina/Los Angeles Times via Getty Images

The political arm of the public-opinion industry famously had a tough election cycle in 2020. Nobody much noticed when polls of primaries went awry after the COVID pandemic struck; we were in uncharted territory when it came to understanding the likelihood to vote or even how to contact voters. But going into the general election, there was a consensus that Joe Biden was going to dispatch Donald Trump by a comfortable margin, which turned out to be wrong (at least in terms of the Electoral College). This polling error, in turn, bolstered Trump's efforts to depict the entire political system as biased and "rigged," which has helped the Big Lie spread far and wide in the Republican ranks.

Then came the California gubernatorial recall, and again the polls seemed to have missed what was going to happen by a sizable margin. The average of recall polls at RealClearPolitics predicted a 14.5-percentage-point "no" margin of victory, while the actual margin for keeping Gavin Newsom is, as of this writing, nearly twice that (27.6 points).

So is an industry-wide problem that was exposed in 2020 now just getting worse?

I don’t think so.

It’s true that the 2020 error was alarmingly large and mostly in one direction, as a July 2021 report from the American Association for Public Opinion Research acknowledged:

* The 2020 polls featured polling error of an unusual magnitude: It was the highest in 40 years for the national popular vote and the highest in at least 20 years for state-level estimates of the vote in presidential, senatorial, and gubernatorial contests. Among polls conducted in the final two weeks, the average error on the margin in either direction was 4.5 points for national popular vote polls and 5.1 points for state-level presidential polls. 

* The polling error was much more likely to favor Biden over Trump. Among polls conducted in the last two weeks before the election, the average signed error on the vote margin was too favorable for Biden by 3.9 percentage points in the national polls and by 4.3 percentage points in statewide presidential polls.  

* The overstatement of the Democratic-Republican margin in polls was larger on average in senatorial and gubernatorial races compared to the presidential contest. For senatorial and gubernatorial races combined, polls on average were 6.0 percentage points too favorable for Democratic candidates relative to the certified vote margin.

So you had a pattern of sizable error reflecting a systemic overestimation of Democratic voting and a systemic underestimation of Republican voting. The report went on to rule out a late swing to the GOP as the explanation, along with the big factor thought to explain a similar but smaller pattern in 2016: underestimation of white non-college-educated voters. Methodology didn't seem to explain it either: "Regardless of whether respondents were sampled using random-digit dialing, voter-registration lists, or online recruiting, polling margins on average were too favorable to Democratic candidates."

This methodological finding undercut the "shy Trump voter" theory some conservatives have promoted, since there is no reason a respondent to an online poll or a robocall should feel "shy" about admitting a socially stigmatized preference.

So the bottom-line guess of the professionals about the 2020 polling error is that pervasive MAGA mistrust of “the system” accidentally made pollsters miscalculate Republican opinion even when they calculated the composition of the electorate accurately: “Self-identified Republicans who choose to respond to polls are more likely to support Democrats, and those who choose not to respond to polls are more likely to support Republicans.”

In any event, nothing about this theory is applicable to polling error in the California recall election, which underestimated Democratic voting in a highly polarized contest where party turnout mattered most.

So why would that have happened? There are two pretty clear possibilities that distinguish the California 2021 polling error from the national polling error of last year.

First, this kind of contest is a nightmare to poll: a special election held at a weird time of year. The ballot sent to voters was equally weird, with two interrelated questions, and the second one (the contingent replacement contest) contained 46 names. One widely noticed early-August poll from SurveyUSA showed "yes" ahead in the recall but was later critiqued by SurveyUSA itself, which regretted its phrasing of the key question and its method for determining the likelihood to vote. What made the likely-voter problem particularly pervasive was the habit among nearly all pollsters of defining "likelihood to vote" as a product of "enthusiasm." As veteran California political strategist Garry South noted, an election in which mail ballots were sent to all 22 million registered voters made it especially easy to participate:

You get a ballot mailed to your house, you fill out “no,” you put it back in the envelope, and you put it in the mail — postage paid. This doesn’t require enthusiasm, folks.

Indeed, the ballots' arrival in 22 million mailboxes, beginning about a month before Election Day, might have been the galvanizing event for Democratic voters who had been paying no attention to the recall earlier. And that leads to the second explanation of polling error in the California election: In contrast to 2020, there really was a late trend that changed the complexion of the race dramatically, in part because of a heavy financial advantage for the "no" campaign. And, in fact, polls did capture that trend without necessarily anticipating how far it would go. The RCP polling averages showed a close race until August 29 and then a rapidly increasing "no" lead. The Berkeley IGS/Los Angeles Times poll showed a mere three-point lead for "no" in July, which ballooned to 21 points in early September.

So at least for observers who were paying attention and weren’t neurotically overoptimistic or overpessimistic, the outcome wasn’t a surprise, and the extent of Newsom’s victory wasn’t a big surprise. Polls toward the end of the contest didn’t mislead anybody, which was not the case in 2020.

In other words, the California recall election didn't solve the mystery of the 2020 polling errors, but it didn't really compound it, either. Much bigger tests lie ahead, of course: the off-year statewide elections in New Jersey and Virginia, and then the 2022 midterms.
