When will data preparation tools come to fruition?

Not everyone appreciates the sweat that goes into planning and launching a large-scale data analysis project. Data preparation is a necessary step in these efforts, and my colleague Dan Potter has outlined that process before.


If you read Dan’s post, you’ll appreciate why data preparation is a crucial component of information analysis and why we’re focused on constructing solutions that can automate the process.

The business world wants insights, and it wants them fast. In some cases, a CMO with a limited understanding of what goes into a data analysis project may set high, even unrealistic, expectations. "I need this on my desk by the end of the day" is a phrase we've heard all too often.

Precious time
Forrester Research Principal Analyst Michele Goetz noted that many so-called "self-service" business intelligence tools fail to simplify what remains a long and arduous process. According to Goetz, data analysts spend, on average, 80 percent of their time building reliable, complete data sets that analytics software can readily work with.

Thankfully, data visualization developers like us are addressing this issue. Goetz maintained that solving the data prep struggles analysts encounter depends on integrating front-end features that connect "subject matter experts intimately with their data." That's half the battle right there, but don't forget the back-end functions.


Goetz pointed to Hadoop's data governance policies, which are built around mature security protocols. We certainly shouldn't skimp on protecting our information, but quality assurance deserves to be just as integral a part of our databases. Data scientists look for inconsistencies and irregularities in information when they prepare to analyze it, so why not bring that checking into the collection phase? Of course, that's where machine learning enters the picture, but that's a topic for another post.
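To make the idea of checking quality at collection time a little more concrete, here is a minimal Python sketch. The field names and validation rules are hypothetical, chosen purely for illustration, and don't reflect any particular product: each incoming record is checked as it arrives, and anything that fails is set aside in a quarantine list instead of flowing into the analysis-ready data set.

```python
from datetime import date

# Hypothetical schema: these field names and rules are illustrative
# assumptions, not taken from any particular tool.
REQUIRED_FIELDS = ("customer_id", "order_total", "order_date")


def validate_record(record: dict) -> list:
    """Return a list of problems found in one incoming record (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    total = record.get("order_total")
    if total is not None and (not isinstance(total, (int, float)) or total < 0):
        problems.append("order_total must be a non-negative number")
    order_date = record.get("order_date")
    if order_date:
        try:
            date.fromisoformat(order_date)
        except (TypeError, ValueError):
            problems.append("order_date must be an ISO date (YYYY-MM-DD)")
    return problems


def ingest(records):
    """Split records into accepted rows and quarantined rows as they are collected."""
    accepted, quarantined = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            # Keep the record together with its problems so someone can fix it later.
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined


# Usage: one clean record and one with three separate problems.
batch = [
    {"customer_id": "C001", "order_total": 42.5, "order_date": "2015-06-01"},
    {"customer_id": "", "order_total": -3, "order_date": "June 1"},
]
accepted, quarantined = ingest(batch)
print(len(accepted), len(quarantined))  # 1 1
```

The design choice here is to quarantine rather than silently drop or "fix" bad records: the analyst still sees what went wrong, but the problem surfaces at collection instead of in the middle of an analysis.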

What does the future look like?
As I noted above, making data preparation a more fluid, intuitive process is a high priority for us, as well as for other data analytics developers. Information Age reported on findings from Gartner, which predicts that by 2017, most business professionals and data analysts will use self-service tools designed exclusively for data preparation. Keep in mind that this doesn't mean data assessment skills will become obsolete.


"Self-service data integration will do for traditional IT-centric data integration what data discovery platforms have done for traditional IT-centric BI: reduce the significant time and complexity users face in preparing their data for analysis and shift much of the activity from IT to the business user to better support governed data discovery," said Gartner Research Vice President Rita Sallam, as quoted by Information Age. "However, specific skills are required. Self-service data integration requires that users master both the technical aspects and the business requirements of joining data together."
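Sallam's point about joining data is easier to see with a toy example. In this hypothetical Python sketch (the records, field names, and ID-matching rule are all invented for illustration), two exports describe the same customers but format their IDs differently. Writing the join is the technical part; knowing that "c-001" and "C001" refer to the same customer is the business part.

```python
# Hypothetical exports: a CRM system and a billing system that key the
# same customers differently ("c-001" vs "C001").
crm = [
    {"customer_id": "c-001", "name": "Acme Corp"},
    {"customer_id": "c-002", "name": "Globex"},
]
billing = [
    {"cust": "C001", "revenue": 1200},
    {"cust": "C003", "revenue": 300},
]


def normalize_id(raw: str) -> str:
    """Assumed business rule: IDs match once dashes and letter case are ignored."""
    return raw.replace("-", "").upper()


def left_join(left, right, left_key, right_key, fields):
    """Keep every left row and attach the named fields from its matching right row.

    Assumes right-side keys are unique after normalization; if not, the last
    matching row wins.
    """
    index = {normalize_id(row[right_key]): row for row in right}
    joined = []
    for row in left:
        match = index.get(normalize_id(row[left_key]), {})
        merged = dict(row)
        for field in fields:
            merged[field] = match.get(field)  # None when no right-side row matches
        joined.append(merged)
    return joined


rows = left_join(crm, billing, "customer_id", "cust", ["revenue"])
for row in rows:
    print(row["name"], row["revenue"])
```

A plain equality join on the raw keys would match nothing in this example, which is exactly the kind of gap that requires someone who understands the business data, not just the tooling.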

In short, demand is high for smart data preparation tools. As the volume and variety of information continue to grow, data analysts and scientists will need such solutions to keep up with the pace business leaders are setting.