With the rise of the self-service model, more nontechnical and business users than ever before have access to data assets within the enterprise; often it is these very users who know the most about the context of the data they are working with. Governance administered centrally and exclusively from IT cannot effectively incorporate this "tribal knowledge" into data policies and the governance framework. Some tools, particularly data catalogs, look to harness end-user knowledge to help contextualize data assets, effectively "crowdsourcing" knowledge about resources in the data lake and data repositories. When this knowledge is collected in a way that is continuously visible and accessible to IT, it can be used to shape governance policies that are more reflective of actual data usage patterns within the enterprise.
Data catalogs help harness "tribal knowledge" for governance
As the number of data consumers, or data citizens, within the enterprise grows, it becomes less feasible for IT to centrally and exclusively manage all aspects of data governance. With more data and more data users, data is being created, accessed, and manipulated in more ways. Duplicates are generated, similar data sets are used for different purposes, users frequently have multiple roles, and business terminology often differs across business units and departments. All of these factors can make it difficult to create a single, centrally managed governance framework controlled by IT. What is often missing is the context of the data's use, which comes from end users. Those who work with the data on a daily basis, rather than IT, are typically the most knowledgeable about which data sets are most relevant, what they are used for, and who the corresponding subject-matter experts and owners are. In the centrally managed IT model, this "tribal knowledge" often goes ignored or uncollected.
Bridging this divide between business users and IT is the emerging domain of data catalogs, which initially arose to provide business users with a self-service way to navigate and locate data assets in the enterprise data lake and other repositories. But their greatest modern capability may be to "crowdsource" the knowledge of business users and circulate it back to IT, so that comprehensive, enterprise-wide governance policies that reflect actual data usage patterns can be implemented. These tools, notably, are interactive; rather than just allowing users to statically navigate data, they enable users to add business context to data, subsequently altering metadata. Standalone vendors, such as Collibra and Alation, and catalog capabilities embedded within data management platforms, such as those from Unifi, give business users multiple ways to annotate data, rate data sets, identify owners/experts, share knowledge, and collaborate over data assets in a way that is visible to IT. These activities and the information that end users provide can subsequently be used to collectively shape information governance rules and policy.
Data governance in the self-service era, then, must become an ongoing feedback loop between IT and the end users of data. While IT needs to retain a critical role in initially assigning policies such as role-based access controls, it also needs to iteratively harvest the knowledge that only end users can provide: the data's context in day-to-day business use. Governance solutions that cater to business users, such as data catalogs, provide this functionality through rich tool sets for tagging data, rating/certifying data sets, requesting permissions, and submitting "tickets" related to data quality. These tools allow business users to annotate data assets with relevant context, enriching the associated metadata. Because governance policies operate on metadata, enriched and continuously updated metadata means that more granular and appropriate policies can be implemented.
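The feedback loop described above can be illustrated with a minimal Python sketch. It is not drawn from any vendor's product: the `DataAsset` class, its fields, the role names, and the rating threshold are all hypothetical, chosen only to show how annotations supplied by end users might change the outcome of a policy that IT evaluates purely on metadata.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Set


@dataclass
class DataAsset:
    """A catalog entry. All field names are hypothetical."""
    name: str
    tags: Set[str] = field(default_factory=set)
    ratings: List[int] = field(default_factory=list)
    owner: Optional[str] = None

    def annotate(self, tag: Optional[str] = None,
                 rating: Optional[int] = None,
                 owner: Optional[str] = None) -> None:
        """Record context supplied by an end user, enriching the metadata."""
        if tag is not None:
            self.tags.add(tag.lower())
        if rating is not None:
            self.ratings.append(rating)
        if owner is not None:
            self.owner = owner


def allowed_roles(asset: DataAsset) -> Set[str]:
    """Derive role-based access from metadata alone (illustrative rules)."""
    if "pii" in asset.tags:
        # Assumed rule: anything tagged as personal data is restricted.
        return {"steward"}
    roles = {"steward", "analyst"}  # assumed default assigned by IT
    if asset.owner and asset.ratings and mean(asset.ratings) >= 4:
        # Assumed rule: well-rated data with a named owner is opened up.
        roles.add("business_user")
    return roles


# IT assigns the initial policy before any end-user context exists.
orders = DataAsset("orders_2017")
before = allowed_roles(orders)

# End users rate the data set and identify an owner.
orders.annotate(rating=5, owner="sales_ops")
orders.annotate(rating=4)
after = allowed_roles(orders)

# A user flags another data set as containing personal data.
customers = DataAsset("customers")
customers.annotate(tag="PII")
restricted = allowed_roles(customers)
```

In this sketch the policy function never changes; only the metadata does. That is the sense in which enriched metadata allows more granular policies: the same rules yield different, more appropriate outcomes as end users add context.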
Paige Bartley, Senior Analyst, Data and Enterprise Intelligence