Data lakes are rarely the first use case for a big data implementation. Instead, they usually represent an advanced stage of evolution, reached once successful use cases have materialized across multiple lines of business and triggered the search for a strategy to manage and govern big data as an enterprise resource. But figuring out where to start with governance can be tough. Zaloni, which provides a solution for helping organizations manage the lifecycle of data, is packaging up a ready-made starter solution for the Hortonworks Data Platform (HDP) that can help enterprises take that first step.
Getting data lake governance off the ground
At its core, data lake governance is about maintaining effective control and visibility over data. It borrows from the experience of governing enterprise data warehouses – but the emphasis is on adaptation, because the nature of the data and the way it is used differ sharply in a data lake. Ultimately, data lake governance encompasses:
Managing data inventory, which includes two centers of activity: data curation, which is performed by line-of-business users on a self-service basis to prepare their own sets of data; and physical inventory, where IT is ultimately accountable for documenting what data resides in the data lake and ensuring that it is properly secured.
Keeping data secured and access controlled.
Optimizing cost and managing the integration of the data lake with external data platforms and sources within the enterprise.
Zaloni has developed solutions that target the curation, physical inventory, and security aspects of data lake governance. Its original solution, Bedrock, provides lifecycle management, handling data from ingest through discovery, preparation, cataloging, and securing (by integrating with access control solutions and managing data protection). Zaloni also offers Mica, a business-user-oriented tool that provides the self-service front end for data preparation and curation.
Its new offering, "Data Lake in a Box," is a pre-configured implementation of Zaloni Bedrock and Mica on HDP. It is the result of harvesting best practices from Bedrock and Mica engagements with a number of early Zaloni customers. Among the elements in the "Box" are a data ingestion component preconfigured with connectors to popular data sources and the configuration of a metadata exchange framework. While no two data lakes are the same, a prepackaged template offers a jumpstart that can help enterprises get past square one.
"Developing a Strategy for Data Lake Governance," IT0014-003113 (May 2016)
"On the Radar: Zaloni develops tooling for managing the data lake," IT0014-003090 (December 2015)
Tony Baer, Principal Analyst, Information Management