New TR Released: NetApp ONTAP SAN Solution for Hadoop | TR-4697

Jeff_Yao · 2018-07-13

1 Challenges of Big Data Analytics

Big data requires big analytics, big bandwidth, and big content technologies. Organizations leverage big data by collecting and analyzing large amounts of raw data, such as point-of-sale data, credit card transactions, log files, and machine and security data. To harness the maximum value of big data, organizations must transform their raw data into valuable business information. This essential business driver requires tools that can process very large amounts of both structured and unstructured data. It also requires data management software that is capable of hybrid web-scale deployments, is highly available and resilient, and provides the data management capabilities that enterprises require.

Apache Hadoop and its growing ecosystem of products enable organizations to extract valuable insights from large volumes of diverse data that cannot be analyzed with relational databases. With these insights, people across the organization can ask the right questions and get better answers, supporting more informed decisions that help promote business transformation.

However, because initial Apache Hadoop/Spark deployments often rely on commodity servers with internal drives, infrastructure resilience and agility issues prevent organizations from realizing the full benefits of these deployments. For example, the failure of a single disk can degrade the performance of the entire cluster. Disk replacement is a continual and error-prone task. In addition, triple file replication and the redistribution of data after a failure increase network costs and complexity.
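To put rough numbers on the replication overhead, here is a minimal Python sketch. It is not taken from TR-4697: the function names and the sample figures are hypothetical, and it assumes a replication factor of 3 (the HDFS default for `dfs.replication`) with every extra replica copied over the network to another node.

```python
def raw_capacity_tb(usable_tb: float, replication: int = 3) -> float:
    """Raw disk capacity needed to hold usable_tb of data with N-way replication."""
    return usable_tb * replication


def extra_replica_traffic_tb(written_tb: float, replication: int = 3) -> float:
    """Network traffic spent on the additional replicas when data is written.

    Assumption: each replica beyond the first is copied to a different node,
    so (replication - 1) extra copies cross the network.
    """
    return written_tb * (replication - 1)


if __name__ == "__main__":
    usable = 100.0   # hypothetical data set size in TB
    ingest = 10.0    # hypothetical daily ingest in TB
    print(f"Raw capacity for {usable} TB usable: {raw_capacity_tb(usable)} TB")
    print(f"Replica traffic for {ingest} TB ingest: {extra_replica_traffic_tb(ingest)} TB")
```

Under the same assumption, the data held on a failed disk has to be copied again from the surviving replicas, which adds further network load on top of normal ingest.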

For many enterprise customers, Hadoop is a shared-nothing, massively parallel data-processing platform that emphasizes cost effectiveness and relies on data replication for availability. Most enterprises have used traditional databases to manage their rapidly growing need to process data and extract business value. Sometimes, however, the costs of these solutions have been so high that enterprises cannot afford to either house or process the data. In other cases, the solutions are not suitable for new data types such as unstructured data or key-value pairs.

For more information, see TR-4697: NetApp ONTAP SAN Solution for Hadoop.