Posted by Jon Pilkington on January 22, 2015

You know the difference between a $30 steak and a $12 steak. Although you may not be a red meat connoisseur, your taste buds would (hopefully) tell you that the $30 T-bone is worth every dollar. Between the way it was cooked and the seasoning used to enhance its natural flavor, you've got yourself quite the meal. 

What's the theme behind this long-winded metaphor? Data quality. How does one differentiate high-quality data from low-quality data? Can a data set's value even be measured? 

"Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context."

What is data quality? 
According to TechTarget, data quality "is a perception or an assessment of data's fitness to serve its purpose in a given context." Basically, it is a metric that judges information based on its accuracy, comprehensiveness, relevance, consistency and reliability. 

For example, suppose you are collecting data about global warming and you find an estimate stating that the earth's average temperature will rise 0.05 degrees Fahrenheit every year. This figure seems reliable because it was produced by a non-profit organization made up of environmental scientists. However, after collecting information from universities and other scientific communities, you discover that the majority of researchers predict the earth's temperature will increase 0.02 degrees Fahrenheit annually. 

Therefore, the non-profit's data point is of low quality: it is inconsistent with the dozens of estimates produced by other scientific organizations. 
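As a minimal sketch of that kind of consistency check, the snippet below compares a single estimate against the consensus of independent estimates and flags it when it deviates too far. The numbers, threshold and function name are illustrative, not drawn from any real data set.

```python
# Minimal sketch of a consistency check: compare one estimate against
# the consensus (median) of independent estimates. Values are illustrative.
from statistics import median

def is_consistent(candidate, other_estimates, tolerance=0.5):
    """Return False when an estimate deviates from the consensus
    of independent estimates by more than the given relative tolerance."""
    consensus = median(other_estimates)
    deviation = abs(candidate - consensus) / abs(consensus)
    return deviation <= tolerance

# Annual warming estimates (degrees Fahrenheit per year), illustrative
university_estimates = [0.02, 0.021, 0.019, 0.02, 0.022]
nonprofit_estimate = 0.05

print(is_consistent(nonprofit_estimate, university_estimates))  # False -> low quality
```

In this sketch the outlying 0.05-degree figure fails the check because it sits far from the 0.02-degree consensus, which is the same judgment the example above makes informally.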

Will data quality improve in 2015?
Given the example cited above, it's easy to see why collecting quality data is such an important component of any data analysis initiative. You could have the best visualization tools at your disposal, but if the software is crunching numbers that aren't accurate or valid, the insights you receive won't allow you to make educated decisions. 

"When a high number of different data sets scrutinizes the behaviors of one entity, assessments are much more credible."

Martin Doyle, a contributor to Smart Data Collective, noted that the Internet of Things (IoT) will help improve data quality for a number of reasons, the main one being that sensors can produce information describing not just one state of a device, but many. Think of a sensor attached to a machine in a factory: it can deliver information to a data analysis platform describing the machine's temperature, current and historical movements, position, job performance and more. 

Doesn't this just mean connected devices produce more data? It's true that more data doesn't necessarily translate to greater quality. However, when many different data sets scrutinize the behavior of one entity (whether it's a machine or a customer), assessments become much more credible, because independent factors back up the same claims. 

To put it simply: If multiple sources are dedicated to analyzing a specific object or person, the data produced by those sources is more reliable. 
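A small sketch of that corroboration idea appears below: an assessment about one machine is only treated as credible when several independent readings back it up. The field names, thresholds and sensor values are assumptions made for illustration.

```python
# Minimal sketch of corroborating an assessment with several independent
# data streams about one machine. Names and thresholds are illustrative.
machine_readings = {
    "temperature_f": 212.0,      # thermal sensor
    "vibration_mm_s": 9.4,       # vibration sensor
    "output_per_hour": 310,      # job-performance counter
}

# Each check is one independent signal suggesting the machine is degrading.
checks = {
    "running_hot": machine_readings["temperature_f"] > 200.0,
    "vibrating_excessively": machine_readings["vibration_mm_s"] > 8.0,
    "underperforming": machine_readings["output_per_hour"] < 350,
}

corroborating = [name for name, triggered in checks.items() if triggered]

# Only act on the assessment when multiple independent streams agree,
# rather than trusting a single reading on its own.
if len(corroborating) >= 2:
    print("Degradation assessment corroborated by:", ", ".join(corroborating))
else:
    print("Single signal only; treat the assessment with caution.")
```

The point of the sketch is simply that agreement across independent streams, not sheer volume of data, is what makes the assessment more trustworthy.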
