Where the Pollsters Failed: The Internet Itself is a Bubble
Adding to the previous post about identity politics, there’s another angle to why the Democrats lost the 2016 election: an over-reliance on data, and an over-estimation of how much “accuracy” technological products bring us.
Having done a lot of projects related to data, I’m very familiar with the way data is typically used in both application and strategy. In my work I’m obligated to exaggerate the significance and importance of data and metrics, but at the same time I try to remind people that its conclusions must be contextualized very, very, very specifically in order to have any predictive value. (I would add more “verys” if I could.)
The natural tendency for most people is to universalize research findings by extrapolating the data outside of its intended context. It’s tempting because it gives you a sense of “insight,” a feeling that you can predict the future and navigate the chaos of this big, bad world. If the problem itself isn’t framed correctly, however, you’ve essentially contaminated the data and have been telling yourself falsehoods the whole time.
I believe this is essentially what happened with the 2016 election: data scientists and analysts are doing some heavy soul-searching right now, because the complete failure of their polling and predictive models was a serious blow to their credibility and perceived value, one likely to have major repercussions in the industry for quite some time. The problem, however, wasn’t so much the methods themselves as the lack of thoroughness with which the research was done.
The Internet Itself is a Bubble
The main oversight of the election’s predictive models was an over-reliance on the internet as a source of data and a gauge of political sentiment. If you got your political news solely from online sources, you might have come away with the impression that Hillary was a sure win, or at least had a pretty good chance of winning.
But think about it carefully: who actually has the time to spend most of the day reading and interacting with others on the internet? The wealthy, the retired, children, and white-collar workers who spend most of their time at a computer as part of their job. Most blue-collar workers have labor-intensive jobs that don’t allow for browsing the web when the boss isn’t looking, at least not to the extent that white-collar workers can.
Even phone-based polling is somewhat suspect, because there is no incentive for people — especially those already suspicious of political organizations in general — to tell pollsters the truth. But these methods are still more accurate than online polling, because the higher up the tech tree you go, the less accurate the data itself becomes. A post from August by FiveThirtyEight has all the red flags of poor data methodology: a heavily biased framing (how much is Clinton going to win by?), questions that guide the respondent toward specific outcomes, and no explanation of why the discrepancies existed.
But the truth is much simpler: the polling methods we use today aren’t made to include or account for the voice of the working class, and that oversight has led to a very big, costly mistake.
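The under-sampling problem described above can be sketched as a toy simulation. The numbers here are entirely hypothetical (a roughly 50/50 population in which the offline half leans toward one candidate, and an online poll that reaches offline voters at a tenth of their true rate); the point is only that systematically under-sampling one group skews the estimate, no matter how large the sample is.

```python
import random

random.seed(42)

# Hypothetical population -- illustrative numbers only.
# Half of voters are heavily online; the offline half leans toward
# candidate A (60%), while the online half leans against A (40%),
# so true overall support for A is about 50%.
POP_SIZE = 100_000
population = []
for _ in range(POP_SIZE):
    online = random.random() < 0.5
    if online:
        favors_a = random.random() < 0.40
    else:
        favors_a = random.random() < 0.60
    population.append((online, favors_a))

true_support = sum(favors for _, favors in population) / POP_SIZE

def online_poll(pop, n=2000, offline_reach=0.1):
    """An 'online poll' that reaches offline voters at only a
    fraction (offline_reach) of their true rate in the population."""
    sample = []
    while len(sample) < n:
        online, favors_a = random.choice(pop)
        if online or random.random() < offline_reach:
            sample.append(favors_a)
    return sum(sample) / n

polled_support = online_poll(population)
print(f"true support for A:   {true_support:.1%}")
print(f"online-poll estimate: {polled_support:.1%}")
```

Even with 2,000 respondents, the estimate lands several points below the true figure, because the offline group barely appears in the sample. More data doesn’t fix a biased sampling frame; it just makes the wrong answer look more precise.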
One of the tech industry’s main goals is the democratization of technology: information, mobile phones, and the internet for everyone, everywhere. If you work in tech, you’re likely to hear this mantra repeated every day, all the time, by tech leaders everywhere. You may even come to believe that we’re already there and all we have to do now is crunch the numbers to arrive at the “truth.” But not quite — as last week’s election shows, we still have a lot of work left to do.
With data, context is everything. That’s something we should never forget.