Editorial: Fifty shades of data


Recently uncovered data shows 100 percent of Americans are more likely to believe a statement if it includes the word “data.”

That statement is totally false. Fake news.

However, whether citing data actually affects credibility remains to be seen. Statistical analysis and prediction have long been a source of legitimacy, but they have recently been called into question.

Statistical blunders do occur, and the most recent one was quite important. Almost every poll and election forecast suggested Hillary Clinton would be in her third week in office by now.

But Clinton ultimately lost the Electoral College by 74 votes, and President Donald Trump has not been shy about emphasizing that the mainstream media was dead wrong about his chances. He has even attempted to discredit current pollsters on the basis of their previously incorrect election predictions.

On Monday, President Trump tweeted the following from his personal account:


Without delving into the moral implications of such an assertion, let’s take a look at polls and how they feed into an election forecast. There are many different types of polls, and they are taken at district, state and national levels. The polls are then aggregated so that all of the available information is used in the prediction.

But a good model will go beyond the aggregate polls and look at factors such as sample size and historical accuracy, putting more weight on polls that are more likely to minimize bias. Some forecasts take it a step further and factor in more sophisticated influences such as third-party effects and running mates.
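To make the weighting idea concrete, here is a minimal sketch in Python. It is not FiveThirtyEight's formula or any real forecaster's method. The three polls, the accuracy scores and the weighting rule (square root of sample size times an accuracy score) are all assumptions invented for illustration.

```python
from math import sqrt

# Hypothetical polls: (candidate's share in percent, sample size, accuracy score 0-1).
# Every number here is made up for illustration.
polls = [
    (48.0, 400, 0.5),
    (52.0, 1600, 1.0),
    (50.0, 900, 0.8),
]


def simple_average(polls):
    """Treat every poll the same, regardless of size or track record."""
    return sum(share for share, _, _ in polls) / len(polls)


def weighted_average(polls):
    """Give more say to larger polls and to pollsters with better track records.

    The weight sqrt(sample size) * accuracy is an assumed rule, chosen only
    because it is easy to follow.
    """
    weights = [sqrt(n) * accuracy for _, n, accuracy in polls]
    total = sum(w * share for w, (share, _, _) in zip(weights, polls))
    return total / sum(weights)


print(f"Simple average:   {simple_average(polls):.1f}")
print(f"Weighted average: {weighted_average(polls):.1f}")
```

With these made-up numbers the simple average is 50.0 and the weighted average is about 50.8, because the largest and most reliable poll pulls the estimate toward its own result.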

Putting all of these complex assessments together should look a little like this.

After compiling all of the data it could, FiveThirtyEight, which rose to fame after correctly calling 49 of 50 states in the 2008 election, gave Clinton a 71.8 percent chance of defeating Trump, and he won anyway. But does that mean the polls were wrong or fake?

Of course not. Uncertainty is built into any prediction; that is what expressing it as a probability means. An expected outcome is almost never guaranteed.
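A quick back-of-the-envelope check shows how much room a 71.8 percent forecast leaves for an upset. The sketch below uses only the figure quoted above. The coin-flip comparison and the helper name `expected_upsets` are illustrative choices, and the calculation assumes the forecast probability is taken at face value.

```python
# Probability the forecast gave the favorite, as quoted in the editorial.
favorite = 0.718
underdog = 1 - favorite

# For comparison: the chance of a fair coin landing heads twice in a row.
two_heads = 0.5 ** 2


def expected_upsets(n_forecasts, p_underdog):
    """Average number of upsets among n independent forecasts with the same odds."""
    return n_forecasts * p_underdog


print(f"Underdog's chance:        {underdog:.1%}")
print(f"Two heads in a row:       {two_heads:.1%}")
print(f"Upsets per 100 forecasts: {expected_upsets(100, underdog):.1f}")
```

In other words, an upset at these odds is slightly more likely than flipping two heads in a row, and if 100 independent races all carried the same forecast, about 28 of them would be expected to go to the underdog.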

That’s why FiveThirtyEight was completely transparent about its methodology. Anyone who had any qualms with the forecast could see exactly how and why each calculation occurred.

But is it realistic to expect every reader to work through the methodology and come to a reasonable conclusion? Not in the slightest. Who would have the time?

We are living in an impatient age of digestible information. Infographics are on the rise. Attention spans are shrinking. Information goes viral at an infectious rate on social media, and headlines are hardly ever nuanced.

When digestibility and transparency clash in the presentation of data, the consequence is that consumers have trouble processing it without bias.

Imagine scrolling through Twitter and seeing, amid all of the anti-Trump protests, a graphic that shows a 55 percent approval rating for Trump’s immigration ban. Would you believe it? It exists, and it comes from a reputable source that collected data from a sample of 2,060 registered voters. Despite the small proportion of the country this survey represents, a random sample of this size can accurately poll the country with a 2.4 percent margin of error, according to Pew Research Center.
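For readers curious where a margin of error comes from, here is a minimal sketch of the textbook formula for a simple random sample at a 95 percent confidence level. It is not the pollster's actual calculation. The function name `margin_of_error` and its defaults (a 50/50 split as the worst case, z = 1.96) are assumptions for illustration.

```python
from math import sqrt


def margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a proportion from a simple random sample.

    n: sample size
    p: assumed true proportion (0.5 gives the widest margin)
    z: z-score for the confidence level (1.96 for roughly 95 percent)
    """
    return z * sqrt(p * (1 - p) / n)


moe = margin_of_error(2060)
print(f"Margin of error for 2,060 respondents: {moe * 100:.1f} percentage points")
```

This simple version gives about 2.2 percentage points for 2,060 respondents. Published margins are often somewhat larger than the textbook figure, for example when a pollster accounts for how the sample was weighted, which may be why the number cited above is 2.4. Note that the size of the country does not appear in the formula at all: for a large population, the margin depends on the sample size rather than on the share of the population surveyed.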

The fact is, it’s not in our nature to shed bias and look at things from a neutral standpoint. And the only thing harder than processing data without bias is presenting it without bias. Even top statisticians, when given the same initial information in studies, may come to different conclusions depending on the analysis they choose to conduct.

In this complex swirl of motives, data and truth, there is only one thing for us to do: Stay critical. Stay critical of what you observe, but more importantly stay critical of yourself and what biases you may have.

And don’t believe someone just because they have data.

Exhibit A.

1 Comment

  1. Robert Davenport on

    I am glad to see that the Brown and White editor thinks like an Engineer rather than like a Mountain Hawk.

    The poll referred to in the editorial was really pretty close to reality. Clinton 48.2%, Trump 46.1%, Johnson flatlined. The “toss up states” (referred to by a different name in the poll) were the eventual states critical to the Trump victory. As every Lehigh student or alumnus/alumna should know, the Electoral College system makes it possible for relatively small vote amounts to effect relatively large changes in Electoral College vote totals. Despite a majority of those voting (assuming a 50/50 split for minor party candidates) not being pleased with the results, the system worked as the founders intended.

    Truth: sincerity in action, character, and utterance. Lie: to make an untrue statement with intent to deceive. A lie by definition is false. What can a statement meant to “influence” be called? I’ll call it an inflie. It seems as though many politicians and businessmen consider such a statement as not a lie, not false. Mr. Trump has used such statements to get to a “Deal”.

    Past Trump inflies got him elected. He will continue to use them to get the “Deals” that he hopes will create a legacy with esteem.
