When Data Lies: 4 Ways Your Data Can Deceive You

on July 13th, 2016

Analyzing data can be a wonderfully useful way to understand what’s happening in your organization or project, but if you don’t know how to examine it properly, data can easily turn from friend to foe. A sales graph showing a steady upward trajectory looks like great news, but on closer inspection you may find that not all that glitters is gold. Here are 4 common mistakes to avoid when trying to identify trends in data. Don’t let your data make a monkey out of you!

#1 Too Small a Sample

The first thing to check is whether you are looking at enough data to support a valid statistical conclusion. This is easy to lose sight of once the data is visualized. If you sold 8 units of a product in each of January and February, and then 12 in March, a chart of those numbers would show an impressive-looking jump in sales at the end of Q1. But with counts this small, a swing of 4 units can easily be random noise and doesn’t necessarily mean anything. So while fluctuations are definitely worth watching, you can’t read much into them when your sample size is too small.
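One way to put a number on “too small” is a significance test. The sketch below is a minimal illustration, assuming monthly sales counts behave like independent Poisson draws (an assumption, not something the example above states). Under that assumption, the standard exact test for “two periods share the same sales rate” conditions on the total and treats one month’s count as Binomial(n, 0.5). The function name `poisson_rate_test` is hypothetical.

```python
from math import comb


def poisson_rate_test(count_a: int, count_b: int) -> float:
    """Two-sided p-value for 'both periods share the same underlying sales rate'.

    Assumes the two counts are independent Poisson draws. Conditional on the
    total n = count_a + count_b, either count is then Binomial(n, 0.5) under
    the null hypothesis of equal rates.
    """
    n = count_a + count_b
    k = max(count_a, count_b)
    # Probability of a split at least this lopsided in one direction...
    one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # ...doubled because Binomial(n, 0.5) is symmetric, capped at 1.
    return min(1.0, 2 * one_sided)


# February (8 units) vs. March (12 units):
print(f"p = {poisson_rate_test(8, 12):.2f}")  # p = 0.50
# The same 50% jump at 100x the volume:
print(f"p = {poisson_rate_test(800, 1200):.1e}")  # far below 0.05
```

A p-value around 0.5 means a jump from 8 to 12 is entirely consistent with no real change in demand, while the same relative jump at higher volume would be very hard to explain as noise.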

#2 Correlation Is Not Causation

Let’s say Jack from Finance has received a raise every December for the last 5 years. In each of those Decembers, your sales in Singapore were also 10% higher than usual. Should you give Jack another raise next Christmas to keep Singapore sales up? The example may seem ridiculous, but when x has happened several times and each time been accompanied or followed by y, we tend to assume a causal relationship between the two, i.e. “y happened because of x”. However, correlation does not necessarily imply causation: here the two events share nothing but the month they fall in. If you can’t see any logical connection between two occurrences, don’t rush to conclude that one caused the other.
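The Jack example rests on only 5 data points, and with so few observations strong correlations between unrelated things turn up by chance surprisingly often. As a rough illustration (a simulation, not part of the original example), the sketch below repeatedly draws two completely independent random series of length 5 and counts how often they still show a “strong” correlation, taken here as |r| > 0.8. The names `pearson_r` and `chance_of_strong_correlation`, and the 0.8 threshold, are hypothetical choices.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def chance_of_strong_correlation(n_points=5, threshold=0.8, trials=10_000, seed=42):
    """Fraction of trials in which two *independent* random series of length
    n_points have |r| > threshold, i.e. a strong correlation by pure chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        raises = [rng.gauss(0, 1) for _ in range(n_points)]  # stand-in for Jack's raises
        sales = [rng.gauss(0, 1) for _ in range(n_points)]   # stand-in for Singapore sales
        if abs(pearson_r(raises, sales)) > threshold:
            hits += 1
    return hits / trials


print(f"{chance_of_strong_correlation():.1%}")
```

For normally distributed data, standard theory puts this probability at about 10% for 5 points, so roughly one pair of unrelated metrics in ten will look strongly linked. Track enough metrics and some of them will line up with Jack’s raises.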

#3 Not Examining Parallel Time Periods

Another common mistake is failing to look at the bigger picture over time. For example, you notice that the number of leads you generate dropped 25% between July and August, so you immediately call an emergency meeting, fire your entire Marketing department, etc. However, had you looked at the previous year’s reports, you would have seen the same dip last summer. This suggests the problem is not some change that occurred between July and August, but rather a seasonal pattern affecting your business.
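The check described above can be sketched in a few lines: compare this year’s July-to-August change with the same change last year, and compare August with the previous August. The lead counts below are hypothetical numbers chosen only to reproduce the 25% drop in the example.

```python
# Hypothetical monthly lead counts (illustrative numbers, not real data).
leads = {
    2015: {"Jul": 400, "Aug": 310},
    2016: {"Jul": 480, "Aug": 360},
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Month-over-month change this year vs. the same months last year.
this_year = pct_change(leads[2016]["Jul"], leads[2016]["Aug"])
last_year = pct_change(leads[2015]["Jul"], leads[2015]["Aug"])
# Year-over-year: compare August with the previous August.
aug_yoy = pct_change(leads[2015]["Aug"], leads[2016]["Aug"])

print(f"Jul->Aug 2016: {this_year:+.1f}%")        # -25.0%
print(f"Jul->Aug 2015: {last_year:+.1f}%")        # -22.5%
print(f"Aug 2016 vs Aug 2015: {aug_yoy:+.1f}%")   # +16.1%
```

In this made-up case the summer dip is nearly identical to last year’s, and August leads are actually up year over year, so the Marketing department can probably keep their jobs.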

#4 Ignoring External Data Sources

Finally, don’t expect the data you’re currently examining to hold all the answers; try to bring additional data sources into the picture. Maybe the sharp decline in your retail sales is related to weather conditions rather than the average age of your salespeople? Data that is not generated by your organization can often hold the key to understanding the processes happening within it. Look for external data sources that could be relevant and start taking them into account in your analysis.
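As a minimal sketch of bringing in an external source, the code below joins internal daily sales with daily rainfall on date and compares average sales on wet and dry days. Both datasets, the function name `average_sales_by_weather`, and the 10 mm threshold for a “wet” day are hypothetical; in practice the rainfall would come from a weather service or similar provider.

```python
# Hypothetical daily data for illustration only.
# Internal source: units sold per day.
sales = {"2016-06-01": 120, "2016-06-02": 95, "2016-06-03": 60,
         "2016-06-04": 130, "2016-06-05": 55}
# External source: rainfall in mm per day.
rain_mm = {"2016-06-01": 0.0, "2016-06-02": 2.0, "2016-06-03": 18.5,
           "2016-06-04": 0.0, "2016-06-05": 22.0}


def average_sales_by_weather(sales, rain_mm, wet_threshold_mm=10.0):
    """Join sales with rainfall on date, then return (avg on wet days, avg on dry days).

    Days missing from the rainfall data are skipped. Each group needs at
    least one day, otherwise division by zero is raised.
    """
    wet, dry = [], []
    for day, units in sales.items():
        if day not in rain_mm:  # external source doesn't cover this day
            continue
        if rain_mm[day] >= wet_threshold_mm:
            wet.append(units)
        else:
            dry.append(units)
    return sum(wet) / len(wet), sum(dry) / len(dry)


wet_avg, dry_avg = average_sales_by_weather(sales, rain_mm)
print(f"Wet days: {wet_avg:.1f} units/day, dry days: {dry_avg:.1f} units/day")
# Wet days: 57.5 units/day, dry days: 115.0 units/day
```

A gap like this is a lead worth investigating, not proof: as with Jack’s raises, a handful of days can line up by chance, so you would want far more data before blaming the weather.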

Can you think of other common mistakes to avoid when examining data? Tell us in the comments!