Losing (IT) Control Without Risk: Best Practices for Governing Non-managed Data

One of the biggest benefits of self-service analytics lies in the ability to rapidly combine and analyze data from a variety of sources. That flexibility can introduce serious governance challenges, however, given that half of this data typically comes from sources that aren't managed by IT. It's a control thing. While most organizations have well-defined strategies for governing data that lives in managed systems like enterprise applications or data warehouses, analysts often need to pull data from non-managed sources as well: CSV or text extracts from transactional systems, personal spreadsheets, third-party reports or semi-structured content. Without proper governance in place, this can create big headaches around version control, data breaches, reconciliation, auditing…the list goes on.

Bridging the gap between enterprise governance and business agility

If you're a business analyst looking to get the most from your non-managed data while keeping the peace with IT (and perhaps even your legal department), your best bet is to build a content repository where you can securely store, manage and control access to all of your source content, along with any reusable models for data extraction and preparation routines. With that foundation in place, the next step in easing IT anxiety is to ensure the following key governance capabilities can be layered on top:

  • Data retention. At a minimum, implement document version control for consistency. To meet regulatory or business requirements, relevant source data and documents should also be archived.
  • Data masking. Internal employees are the most common cause of data breaches, and while data discovery tools are a great way for users to build and share information, the underlying data is often unprotected and may include personally identifiable information (like Social Security numbers), sensitive personal data (like medical procedures) or commercially sensitive data. A number of industry and government regulations apply here, and noncompliance carries an estimated cost of $5.5 million per breach and can even result in legal action. Masking, or obfuscating, the original data with random characters or substitute values keeps it usable for analytics while ensuring the real values are visible only to authorized users.
  • Data lineage. With a repository for the underlying source content, you should be able to provide complete data lineage and drill down into any source document. This capability will be critical when you need to audit or reconcile data.
  • Data curation. You’ll want to share frequently used data sources or automated data preparation routines with others based on their user roles. This will not only instill confidence around secure access, but also support consistency in the data used to make important business decisions.
  • Role-based access. If you segment prepared data sets based on user roles, you’ll ensure that the right subset of data is delivered to authorized users.
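To make the masking idea concrete, here is a minimal sketch in Python. The field names, sample records and masking rules below are hypothetical illustrations, not part of any specific product: an SSN is partially redacted so records stay distinguishable, and a sensitive field is replaced with a stable hash token so it can still be grouped and joined on without exposing the original value.

```python
import hashlib
import re

# Hypothetical records -- field names and values are illustrative only.
ROWS = [
    {"name": "A. Smith", "ssn": "123-45-6789", "procedure": "MRI"},
    {"name": "B. Jones", "ssn": "987-65-4321", "procedure": "X-ray"},
]

SSN_RE = re.compile(r"\d{3}-\d{2}-(\d{4})")

def mask_ssn(value: str) -> str:
    """Redact all but the last four digits of a Social Security number."""
    return SSN_RE.sub(r"***-**-\1", value)

def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a stable token, so equal inputs
    still match for grouping/joining but the original stays hidden."""
    return hashlib.sha256(value.encode()).hexdigest()[:10]

def mask_row(row: dict) -> dict:
    """Return a copy of the row with sensitive fields obfuscated."""
    masked = dict(row)
    masked["ssn"] = mask_ssn(row["ssn"])
    masked["procedure"] = pseudonymize(row["procedure"])
    return masked

masked_rows = [mask_row(r) for r in ROWS]
print(masked_rows[0]["ssn"])  # ***-**-6789
```

In practice this kind of transformation would run inside the governed repository, so that only the masked view ever reaches downstream analytics tools.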
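The role-based segmentation described above can be sketched as a simple column-level policy. The role names, field names and policy mapping here are hypothetical, purely to illustrate the idea of delivering only an authorized subset of a prepared data set to each role:

```python
# Hypothetical policy mapping each role to the columns it may see.
ROLE_COLUMNS = {
    "analyst": {"region", "procedure"},                     # no identifiers
    "compliance": {"name", "ssn", "region", "procedure"},   # full view
}

def project_for_role(row: dict, role: str) -> dict:
    """Return only the fields the given role is authorized to see.
    Unknown roles get an empty set, i.e. deny by default."""
    allowed = ROLE_COLUMNS.get(role, set())
    return {k: v for k, v in row.items() if k in allowed}

record = {"name": "A. Smith", "ssn": "123-45-6789",
          "region": "East", "procedure": "MRI"}

print(project_for_role(record, "analyst"))
# {'region': 'East', 'procedure': 'MRI'}
```

Deny-by-default is the important design choice here: a role not explicitly listed in the policy sees nothing, rather than everything.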

Self-service data preparation tools are rapidly being recognized as a necessary component of any data discovery or advanced analytics implementation. But at the enterprise level, it's critical for your solution to bridge the gap between the ease of use and agility that business users demand and the scalability, automation and governance required by IT.


Want to see managed analytics in action? Check out our infographic!