With the latest Leap Day now behind us, how did you spend the extra 1,440 minutes you were granted? If you are a typical data analyst, you likely spent over 240 minutes that day working with data itself. That estimate is a recent finding in the newly published report from Blue Hill Research, “Quantifying the Case for Enhanced Data Preparation.” The report is a useful guide to help data analysts and technology decision makers alike explore solutions to ease the data preparation process.
In fact, the report found that at least 120 of those minutes are spent just on data preparation activities. Instead of spending so much time collecting, cleaning, and consolidating your data, wouldn’t your time be better spent examining it, allowing you to make more informed decisions to drive your business forward?
If you answered yes, you’ll be happy to know there are tips and tricks to make this a reality. When it comes to data preparation, timing truly is everything – both in terms of understanding why it’s such a laborious process in the first place and how you can streamline your activities.
What’s Taking So Long?
Whether you’re a data analyst or a typical business user, your data prep process likely involves correcting errors (most often introduced by human or machine input), filling in nulls and incomplete data, and merging data from several disparate sources or formats. The process typically involves the following steps, and the time spent on each adds up:
- Data analysis – The data is audited for errors and anomalies to be corrected. For large datasets, data preparation applications prove helpful in producing metadata and uncovering problems.
- Creating an intuitive workflow – A workflow consisting of a sequence of data prep operations for addressing the data errors is then formulated.
- Validation – The correctness of the workflow is next evaluated against a representative sample of the dataset. This process may call for adjustments to the workflow as previously undetected errors are found.
- Transformation – Once you are convinced the workflow is effective, you can run it against the full dataset. This is where the actual data prep takes place.
- Backflow of cleaned data – Finally, steps must also be taken for the clean data to replace the original dirty data sources.
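To make those steps concrete, here is a minimal sketch in plain Python of the audit, workflow, validation, and transformation stages. Everything in it is an assumption made for illustration: the sample records, the column names (`region`, `sales`), the helper functions, and the choice to fill missing values with `"Unknown"` and `0`. It is not how any particular data prep tool works, and it leaves out the final backflow step because that depends entirely on where your source data lives.

```python
import random

# Hypothetical raw records; None and "" stand in for nulls.
raw_rows = [
    {"id": "1", "region": " east ", "sales": "100"},
    {"id": "2", "region": "West", "sales": ""},
    {"id": "3", "region": None, "sales": "n/a"},
    {"id": "4", "region": "EAST", "sales": "250"},
]

def audit(rows):
    """Data analysis step: count missing regions and non-numeric sales."""
    problems = {"region_missing": 0, "sales_invalid": 0}
    for row in rows:
        if not (row["region"] or "").strip():
            problems["region_missing"] += 1
        if not (row["sales"] or "").strip().isdigit():
            problems["sales_invalid"] += 1
    return problems

def clean_region(row):
    """Trim whitespace, normalize case, and fill blanks (assumed default)."""
    region = (row["region"] or "").strip().title()
    return {**row, "region": region or "Unknown"}

def clean_sales(row):
    """Convert sales to an integer, using 0 for bad values (assumed default)."""
    text = (row["sales"] or "").strip()
    return {**row, "sales": int(text) if text.isdigit() else 0}

# The workflow is an ordered sequence of small cleaning operations.
WORKFLOW = [clean_region, clean_sales]

def run_workflow(rows):
    cleaned = []
    for row in rows:
        for step in WORKFLOW:
            row = step(row)
        cleaned.append(row)
    return cleaned

def validate(rows):
    """Validation step: every row has a region and an integer sales value."""
    return all(row["region"] and isinstance(row["sales"], int) for row in rows)

# 1. Audit the raw data.
audit_report = audit(raw_rows)

# 2-3. Try the workflow on a sample before trusting it with everything.
sample = random.Random(0).sample(raw_rows, 2)
sample_ok = validate(run_workflow(sample))

# 4. Transform the full dataset once the sample passes.
clean_rows = run_workflow(raw_rows) if sample_ok else []
```

Keeping each cleaning operation as its own small function makes it easy to adjust the workflow when validation turns up an error you had not spotted during the audit.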
There’s Got to Be an Easier Way
If you’re finding yourself banging your head on your keyboard more often than you’d like, take solace in knowing there are a number of ways to make your day-to-day easier. Try incorporating the following steps the next time you’re tasked with a data prep project:
Have an idea of what problem you’re trying to solve.
Try to work backwards from that point and only prepare and clean the data that you need to answer the question at hand. Standard data prep is estimated to take nearly 80% of data analysts’ time, so it’s vital to only focus your time on the work that provides value to you.
Remember to leave time to analyze your data.
Preparing the data is only part of the project. Without careful and thoughtful analysis, what’s the point in the project to begin with? While it’s great to have clean, organized data, it’s more important for that data to add value to you and your organization.
Never rekey data if there’s another option.
Not only is rekeying data one of the biggest sources of errors in a data set, but it’s also a waste of time – time you don’t have. When working with multiple data sources, always try to find a unique identifier to match up between the sources. At that point, you can perform a data join to combine the data into one table.
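As a rough illustration of that idea, the sketch below joins two small datasets on a shared identifier using only plain Python. The sources, the `customer_id` key, and the `inner_join` helper are all hypothetical, and the join shown is an inner join, which keeps only records found in both sources. Your own project may call for a different join type.

```python
# Hypothetical sources that share a "customer_id" column.
crm_rows = [
    {"customer_id": "C001", "name": "Acme Corp"},
    {"customer_id": "C002", "name": "Globex"},
    {"customer_id": "C003", "name": "Initech"},
]
billing_rows = [
    {"customer_id": "C001", "balance": 1200},
    {"customer_id": "C003", "balance": 300},
    {"customer_id": "C004", "balance": 75},
]

def inner_join(left, right, key):
    """Combine rows from both sources wherever the key values match."""
    # Index the right-hand source by key for fast lookups.
    right_by_key = {row[key]: row for row in right}
    if len(right_by_key) != len(right):
        # A repeated key means the identifier is not actually unique.
        raise ValueError(f"{key!r} is not unique in the right-hand source")
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined

combined = inner_join(crm_rows, billing_rows, "customer_id")

# Identifiers that appear in only one source are worth reviewing by hand.
left_ids = {row["customer_id"] for row in crm_rows}
right_ids = {row["customer_id"] for row in billing_rows}
unmatched_ids = sorted(left_ids ^ right_ids)
```

Checking that the identifier really is unique, and listing the records that failed to match, catches most of the problems that would otherwise tempt you to rekey data by hand.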
Last, and most importantly, use the best tool for the job.
At some point, it’s time to say when. Sometimes the tools at our disposal aren’t enough and need replacing. Don’t get stuck in a rut by failing to admit that. Take Excel, for instance. Sure, it’s a powerful tool, and it’s often the default choice for preparing data. However, if you find yourself Googling for formulas every other minute, it may be time to seek out a solution that’s more efficient for you. Such a solution may even enable you to uncover data you never knew existed, much less had access to.
If you’re looking for a faster, more agile way to get at any data and incorporate it into any data tool, be sure to learn more about and sign up for a free trial of Datawatch’s Monarch. And then start thinking about how you’re going to spend the 240 minutes you just gained back in your day.