2020 election polls weren’t as wrong as you think

An explanation of why you should not use polls to predict the future, plus other advice from an expert pollster.

A ballot drop box at Beacon Hill Library on Friday, Oct. 2, 2020, in Seattle, Wash. (Jovelle Tamayo for Crosscut)

I was really hoping this election would end without having to talk about why the polls were so wrong in the last election, but here we go again.

First I'll cover the fundamentals, including this truism: Polls capture a moment in time and can't really predict the future. Then we'll talk about why the 2020 polls weren't as wrong as many people think. There are features of polling that sound like excuses, but are fundamental truths.

Polling fundamentals

Polls can’t predict. Pollsters like to say we are taking a “snapshot in time.” A survey can tally the answers only to specific questions given at a specific time by people who agreed to take the survey. No poll can count the choices of respondents who don’t take the survey, change their mind after the interview, make a selection later or decide not to vote.

The electorate is variable. Not everyone votes. An election poll’s accuracy depends upon a statistical sample of the electorate, a population that does not exist until the votes are cast. Pollsters have to estimate the makeup of the electorate, which affects the potential accuracy of the survey. In a year like this one, when millions of new voters cast ballots, that is more difficult to accomplish.

The margin of error is real. It is a mathematical estimate of the accuracy of a model of an electorate constructed from interviews with 500 to 1,000 people. A typical margin of error of ±3% means that the poll results should fall within 3 percentage points, up or down, of the “true” answer. Even then, there is a 5% chance that the poll will be off by more than the margin of error.
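The arithmetic behind those figures is straightforward. Here is a minimal sketch, in Python, using the standard formula for the 95% margin of error of a proportion near 50%, applied to the sample sizes mentioned above (the function name and sample sizes are illustrative):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n interviews."""
    return z * math.sqrt(p * (1 - p) / n)

# For a typical 1,000-person poll the margin is about +/-3.1 points;
# halving the sample to 500 pushes it to about +/-4.4 points.
print(round(margin_of_error(1000) * 100, 1))
print(round(margin_of_error(500) * 100, 1))
```

The margin shrinks only with the square root of the sample size, which is why doubling the number of interviews buys less precision than intuition suggests.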

Response rates are sinking. Survey response rates are now at or below 5%, making it difficult to get a representative sample. Pollsters are using combinations of landlines, cellphones, text messages and online interviews to secure a representative sample of the electorate. The challenge is magnified for some categories of the population, such as noncollege-educated males and Latinos, that are harder to reach than others; and because their numbers in the sample are relatively small, those categories carry a larger margin of error than the sample as a whole.
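The subgroup penalty is the same square-root arithmetic applied to a smaller sample. A quick illustration with hypothetical sizes, comparing a full 1,000-person poll against a 150-person subgroup within it:

```python
import math

def moe(n, p=0.5, z=1.96):
    # 95% margin of error for a proportion estimated from n interviews
    return z * math.sqrt(p * (1 - p) / n)

full_sample = moe(1000)  # whole poll: about +/-3.1 points
subgroup = moe(150)      # a 15% subgroup of that poll: about +/-8.0 points
```

A subgroup finding reported from a standard poll can easily carry a margin of error two or three times larger than the headline number.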

All of these factors, plus the percentage of undecided voters, amplify the chance that any given poll will not accurately reflect the eventual election results. Polling is not a precise tool, which is why the public and the media, who want more certainty, are never satisfied with polling.

Another way the public can be confused about polling is the emergence of handicappers and aggregators who average the polls to calculate win/loss probabilities state by state. Their forecasts, which have more in common with oddsmaking than predicting, have become a common feature of pre-election coverage.

How this applies to 2020

President Donald Trump was said to have a 10% chance of winning this year. That is not zero. People do hit inside straights, make half-court buzzer-beater shots and sink holes in one. Trump did it in 2016 with a 30% chance of winning.

After missing in 2016, most pollsters adjusted their models to include more voters from the categories thought to have driven the underestimation of Trump’s 2016 vote; noncollege-educated males were the prime suspects. They also polled more frequently, especially in swing states. There were at least 241 published polls in 10 swing states alone during the last week of the campaign, from 11 in Iowa to 40 in Florida.

So how did the polls do? We won’t know precisely until all the votes are counted and certified, but so far the verdict is mixed. It is true that the polls underestimated Trump’s vote, but if you had bet the polling favorite to win in each state, you would have won 96% of your bets. In the U.S. Senate races, you would have won 91%.

In addition to correctly anticipating the national popular vote winner, the polling averages had the correct winner in 48 of the 50 states (plus Washington, D.C.), missing only Florida and North Carolina.

The big story is that the polls underestimated Trump’s vote. Again. And it wasn’t just a poll or two that was off — almost all of them were. Again.

The national polling average underestimated Trump's current vote tally by just 2.3%, about normal. More damning, Trump outperformed the polling estimates in 50 of the 51 states (including Washington, D.C.) by an average of 5.5%. The average Trump underestimate was 6.8% in states that he won and 4.1% in states Joe Biden won.

The polls also underestimated Biden in 38 states. The overlap is due to undecided voters in the polls: there are no undecided voters in the final vote count.

This gets us to the “shy” Trump voter hypothesis — people who will not tell pollsters (or their friends) that they are voting for Trump. A great deal of thought and energy went into exploring that elusive being since the 2016 election. The general conclusion in the polling community had been that, like Sasquatch, the shy Trump voter probably is a myth. That conclusion will have to be reexamined.

Can we improve polling?

As noted, many pollsters adjusted their methods to give more weight to noncollege-educated males, assuming that they were the main source of undercounting Trump support. But the problem doesn’t look like it’s that easy to correct, for a couple of reasons.

First, people within a demographic category do not all think alike. There is no such thing as “the Latino vote,” for example. Compounding the problem, people within a category may also differ in how they respond to polls. There is evidence to suggest that college-educated Trump voters are more reluctant to reveal their vote than their noncollege fellow Trump voters, because they are more likely to be in professional or social settings where Trump support is scorned. That same “social desirability effect” presumably would be at work among Latinos and Black males, both categories in which Trump got more votes than expected. If this hypothesis is correct, weighting demographic categories won’t correct the underrepresentation error.

Second, because a core attitude of the Trump base is distrust of media and institutions — organizations that sponsor surveys — Trump voters may be less likely to answer surveys. So even if you give extra statistical weight to the answers of some categories, you are still weighting the answers of people who agreed to be interviewed. A question for the polling industry is whether Trump voters really were more likely to refuse to take surveys, or more insidiously, to give false answers. That is hard to prove or disprove.
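The weighting discussed above can be sketched in a few lines. Everything here is made up for illustration: the turnout shares, the 100-person sample and the two education groups are hypothetical, and this simple post-stratification is only one of many schemes pollsters actually use:

```python
from collections import Counter

# Hypothetical turnout model: assumed share of each education group in the electorate.
population_share = {"college": 0.40, "noncollege": 0.60}

# Hypothetical raw sample: (group, stated vote) for 100 respondents.
sample = ([("college", "Biden")] * 50
          + [("noncollege", "Trump")] * 30
          + [("noncollege", "Biden")] * 20)

# Weight each group so its sample share matches its assumed electorate share.
n = len(sample)
sample_share = {g: c / n for g, c in Counter(g for g, _ in sample).items()}
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Weighted tallies: each respondent counts by their group's weight.
totals = Counter()
for group, choice in sample:
    totals[choice] += weights[group]

# Weighting moves the Biden share from 70% (raw) to 64% here, but it can only
# rescale the people who answered; it cannot recover within-group voters who
# refused the survey or gave a false answer.
```

This is why re-weighting by demographics fixes an unrepresentative mix of groups, but not differential nonresponse inside a group.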

A key question for the future is whether reluctance to reveal one’s vote relates specifically to Trump. Is there something about him that elicits this behavior? Or is it more systemic? There is evidence to suggest the former.

The overall accuracy of the 2016 and 2020 national polls was not significantly different from previous years. The 2018 midterm polls — when Trump was not on the ballot — were generally accurate. In the 32 Senate races this year, the polling average underestimated the GOP vote in 29 races, by an average of 3.2% — a similar pattern, but less pronounced than the 5.5% average Trump underestimate.

The consistent underestimation of Trump support will occupy us pollsters for the next four years, as well as the question of whether this reality will be reflected in surveys of Trump’s base without Trump on the ticket.
