Forecasting elections is a high-stakes problem. Politicians and voters alike are often desperate to know the outcome of a close race, but providing them with incomplete or inaccurate predictions can be misleading. And election forecasting is already an innately challenging endeavor -- the modeling process is rife with uncertainty, incomplete information, and subjective choices, all of which must be deftly handled. Political pundits and researchers have implemented a number of successful approaches for forecasting election outcomes, with varying degrees of transparency and complexity. However, election forecasts can be difficult to interpret and may leave many questions unanswered after close races unfold.
In their new paper, the authors propose a data-driven mathematical model of how political opinions evolve during U.S. elections. They fit the model's parameters to aggregated polling data, which enabled them to track the percentages of Democratic and Republican voters over time and forecast the vote margins in each state. The authors emphasize simplicity and transparency in their approach and consider these traits to be particular strengths of their model. "Complicated models need to account for uncertainty in many parameters at once," Rempala said.
Because there are two major political parties in the U.S., the authors employed a modified version of a susceptible-infected-susceptible (SIS) model with two types of infections. "We used techniques from mathematical epidemiology because they gave us a means of framing relationships between states in a familiar, multidisciplinary way," Volkening said. While elections and disease dynamics are certainly different, the researchers treated Democratic and Republican voting inclinations as two possible kinds of "infections" that can spread between states. Undecided, independent, and minor-party voters all fit under the category of susceptible individuals. "Infection" was interpreted as adopting Democratic or Republican opinions, and "recovery" represented the return of committed voters to the undecided category.
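To make the analogy concrete, the following is a minimal sketch of an SIS-style model with two "infections" (Democratic and Republican opinions) spreading within and between states. It is an illustration under stated assumptions, not the authors' implementation: the function name `simulate`, the parameter names, the population-share `weights`, the forward-Euler time stepping, and all numerical values are hypothetical, and the paper's exact equations may differ.

```python
def simulate(beta_d, beta_r, gamma_d, gamma_r, weights, d0, r0, days, dt=0.1):
    """Forward-Euler integration of a two-'infection' SIS-style opinion model.

    All names are illustrative assumptions:
      beta_d[i][j], beta_r[i][j]: rate at which committed voters in state j
          persuade undecided voters in state i ("infection")
      gamma_d[i], gamma_r[i]: rate at which committed voters in state i
          become undecided again ("recovery")
      weights[j]: assumed share of the total population living in state j
      d0[i], r0[i]: initial Democratic and Republican fractions in state i
    Returns the Democratic and Republican fractions after `days`.
    """
    n = len(d0)
    d, r = list(d0), list(r0)
    steps = int(round(days / dt))
    for _ in range(steps):
        new_d, new_r = [], []
        for i in range(n):
            s = 1.0 - d[i] - r[i]  # undecided ("susceptible") fraction
            inflow_d = s * sum(beta_d[i][j] * weights[j] * d[j] for j in range(n))
            inflow_r = s * sum(beta_r[i][j] * weights[j] * r[j] for j in range(n))
            new_d.append(d[i] + dt * (inflow_d - gamma_d[i] * d[i]))
            new_r.append(r[i] + dt * (inflow_r - gamma_r[i] * r[i]))
        d, r = new_d, new_r
    return d, r


# Two hypothetical states with made-up parameter values.
beta_d = [[0.5, 0.1], [0.1, 0.4]]
beta_r = [[0.4, 0.1], [0.1, 0.5]]
dem, rep = simulate(beta_d, beta_r,
                    gamma_d=[0.1, 0.1], gamma_r=[0.1, 0.1],
                    weights=[0.6, 0.4],
                    d0=[0.40, 0.30], r0=[0.30, 0.40],
                    days=30)
margins = [dem[i] - rep[i] for i in range(2)]  # positive favors Democrats
undecided = [1.0 - dem[i] - rep[i] for i in range(2)]
```

In this sketch, the forecast vote margin for each state is simply the difference between the two committed fractions at the final time. A fitted version would replace the made-up rates with values estimated from polling data.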
To determine the values of their model's mathematical parameters, the authors used polling data on senatorial, gubernatorial, and presidential races from HuffPost Pollster for 2012 and 2016 and RealClearPolitics for 2018. They fit the model to the data for each individual race and simulated the evolution of opinions in the year leading up to each election by tracking the fractions of undecided, Democratic, and Republican voters in each state from January until Election Day. The researchers simulated their final forecasts as if they made them on the eve of Election Day, including all of the polling data but omitting the election results.
After establishing their model's capability to forecast outcomes on the eve of Election Day, the authors sought to determine how early the model could create accurate forecasts. Predictions that are made in the weeks and months before Election Day are particularly meaningful, but producing early forecasts is challenging because fewer polling data are available for model training. Using polling data from the 2018 senatorial races, the team found that their model produced stable forecasts from early August onward with the same success rate as FiveThirtyEight's final forecasts for those races.
###
Volkening, A., Linder, D.F., Porter, M.A., & Rempala, G.A. (2020). Forecasting elections using compartmental models of infection. SIAM Review, 62(4), 837-865.