Data Lifecycle Management for the Data Lake

By Mike McNamara, Sr. Manager, Product Marketing, NetApp 

 

A data lake is a central location in which to store vast amounts of raw data in its native format, including structured, semi-structured, and unstructured data. It is typically built on Hadoop. A data lake provides a cost-effective, highly scalable architecture for collecting and processing virtually any data format from any source, enabling new business insights and data use cases that were previously unachievable. For example, companies can obtain a complete view of their customers, extend the life of their enterprise data warehouse, or achieve a new level of operational intelligence.

 

Not all data is created equal. The ability to manage the lifecycle of data in the lake, based on its age and relevance, makes the difference between an efficient repository of valuable data assets and a costly one. With lifecycle management in place, the most relevant and valuable data is stored in the most efficient, best-performing tier. Users can quickly search for and find the data that they need, reducing the time to insight. Data that has become stale or less relevant can be moved to a much more cost-effective storage tier, yet remain accessible when it's needed.
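As an illustration of the idea, an age-based tiering rule can be sketched in a few lines. The tier names and the 90-day threshold below are hypothetical, not part of any NetApp or Zaloni product:

```python
from datetime import datetime, timedelta

# Hypothetical tier labels: a fast tier for active data and a
# cost-effective tier for stale data.
HOT_TIER = "performance"   # e.g., HDFS on fast storage
COLD_TIER = "capacity"     # e.g., an object-storage tier

def choose_tier(last_accessed: datetime, now: datetime,
                max_age_days: int = 90) -> str:
    """Return the target tier for a file based on how long ago it was accessed."""
    age = now - last_accessed
    return COLD_TIER if age > timedelta(days=max_age_days) else HOT_TIER
```

A file untouched for a year would land in the capacity tier, while last week's data stays on the performance tier.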

 

NetApp and Zaloni have partnered to extend NetApp's vision of the Data Fabric to include data lifecycle management for the data lake. Rather than being constrained by the limitations of HDFS for data lake storage, organizations can use NetApp to define logical data lakes that span both HDFS (NetApp® EF-Series and E-Series storage systems) and object storage (NetApp StorageGRID® Webscale). This innovative hybrid approach allows data lakes to be deployed on premises, off premises, or a combination of both.
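StorageGRID Webscale exposes an S3-compatible API, so one general way a Hadoop cluster can address an object tier alongside HDFS is through Hadoop's S3A connector. The endpoint and credentials below are placeholders, and the joint solution's actual integration may differ; this is only a sketch of the pattern:

```xml
<!-- core-site.xml: hypothetical settings pointing Hadoop's S3A connector
     at an S3-compatible object store such as StorageGRID -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>https://storagegrid.example.com:8082</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
  <property>
    <!-- often required for non-AWS, S3-compatible endpoints -->
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
</configuration>
```

With such a configuration in place, data in the object tier is addressable from Hadoop tools with an `s3a://bucket/path` URI, side by side with `hdfs://` paths.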

 


 

Zaloni’s Bedrock Data Lake Management Platform provides end-to-end management and governance of data in the data lake. Its data lifecycle management feature lets organizations define and execute policies that manage and move data across NetApp storage tiers. Policies can be defined by the age or relevance of data, based on an organization’s data lake needs. By simplifying and automating these activities, the enterprise can focus its time and resources on the insights and analytics that drive its business.
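Bedrock's policy interface is its own; purely as an illustration of the shape of such a policy, the sketch below applies an age threshold to a file catalog and returns the files due for migration. Every name here (the policy fields, the catalog entries, the tier label) is hypothetical:

```python
from datetime import date, timedelta

# Hypothetical lifecycle policy: move data not accessed in 180 days
# to a lower-cost object tier.
policy = {"max_age_days": 180, "target_tier": "object"}

# Hypothetical file catalog with last-accessed dates.
catalog = [
    {"path": "/lake/raw/2019/events.parquet", "last_accessed": date(2019, 3, 1)},
    {"path": "/lake/raw/2024/events.parquet", "last_accessed": date(2024, 11, 1)},
]

def files_to_move(catalog, policy, today):
    """Return the paths whose age exceeds the policy's threshold."""
    cutoff = today - timedelta(days=policy["max_age_days"])
    return [f["path"] for f in catalog if f["last_accessed"] < cutoff]
```

A scheduler evaluating this policy daily would pick up the 2019 file for migration while leaving recently accessed data on the performance tier.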

 

The joint solution from NetApp and Zaloni can simplify data lake management, improve infrastructure efficiency, and help organizations meet future storage needs and challenges. For more information about the joint solution, view this webinar.