Big Data, Alive & Well

Big Data is a big deal around here… and everywhere, really.  It’s a fact of life and a part of many businesses that are thriving.  As the Chair of the SNIA Analytics & Big Data Committee, I’m always on the lookout for worthy Big Data events and a gathering of the minds.



NetApp hosted The Hive Think Tank on January 16th, 2013 and began the new year with an exciting panel discussion called “The Elephant Riders”.  This fantastic panel consisted of Charles Zedlewski, VP Products at Cloudera; M.C. Srivas, CTO at MapR; Sanjay Radia, Co-Founder of Hortonworks; Jonathan Gray, Co-Founder & CTO of Continuity; and Dhruba Borthakur of Facebook.  Pretty incredible lineup, right?  The banter back and forth was even more so.



Hadoop has been one of the main foundations of the Big Data revolution. In this discussion, this elite set of technologists, all founders or core management team members of Big Data companies, shared their ideas, insights, and predictions for the future.  Some competed in certain areas, and others were neutral players (e.g., Jonathan Gray of Continuity, who very eloquently called his company “Switzerland” at one point). They also discussed the relevance (or not) of Hadoop for Big Data experiments in the next decade. The experts took a closer look at how Big Data infrastructure could evolve and who is best positioned to deliver in this area.


The room was packed, the discussion was animated, the audience was engaged, and there was active participation on Twitter and Facebook.  If you’d like to check out the riveting dialogue and discussion, take a look here. Share it with your friends, colleagues, and anyone else who wasn’t able to make it.



Big Data comes in many different shapes and forms, both structured and unstructured.  Due to this variety, companies are innovating and creating new ways not only to analyze the data but also to turn those methodologies and features into new products and services.


Until recently, it’s been a major challenge to control this variety of huge datasets, but with innovations in virtualization, high performance computing, and solid state storage technology, it’s now possible to process and manage Big Data in ways that open up possibilities that didn’t exist before.


Obviously there is a clear connection between analyzing large data sets and the associated business success. Technology vendors and industry analysts relay the known business benefits that enterprises attain from combining structured and unstructured data.  Where enterprises really win and shine is once they find the business value that is hidden in all of that collected data.  They succeed when they gain visibility into their data and what it means for growing their business.



More and more businesses are learning how to first recognize their data for what it is, and then how to analyze it and apply some value to what was found as part of that analysis.  I would advise our members to maintain a level of control and visibility with their data and use stable and proven tools for their business intelligence.  If these strategies are put in place early on, they will equate to a better chance for success in the long run.  Gaining knowledge of how to manage and analyze data, and really understanding the leaders in the business intelligence space, only adds to their potential success moving forward.



The most important step you can take to prepare for big data and everything that it entails is to align with specific business goals.  Applying big data techniques to accomplish these goals is the underlying task at hand. Whether it is filtering patterns and webpage logs to understand shopping behavior on an eCommerce site or determining interests, demographics, tastes, and trends from social media interactions, aligning with your specific goals is the first and foremost task for success. From there you can invest in skills, resources, infrastructure, etc. to achieve those goals.


Technology is ever evolving, and one area where it’s growing is in the ability to collect and analyze huge datasets.  It is exactly this trajectory of functionality that will lead to revolutionary advances in business as we know it today.  Furthermore, the big data “trend” is really a reality and not going anywhere: as long as we keep tweeting, updating our Facebook status, and posting photos and videos, we are creating massive amounts of data.  There are tons of data all around us, and by tomorrow it will have grown by 2.5 exabytes.  What’s important is that we take this amazing opportunity in front of us and turn it into business value and information that will help us succeed as a whole.


It's estimated that 800 exabytes (EB) of data existed in 2010 (of which ~90% was created in the prior two years). In 2012, it was estimated that 2.8 zettabytes (ZB) of total stored data existed (or 2 ZB created in just the prior 2 years). (Given that I am a storage gal, let me break it down for you to really drive home the magnitude here: 1,024 megabytes is 1 gigabyte, 1,024 gigabytes is 1 terabyte, 1,024 terabytes is 1 petabyte, 1,024 petabytes is 1 EB, and 1,024 EB is 1 ZB.) The average person is estimated to create ~4.5 EB of data over his/her lifetime.
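For anyone who wants to play with those numbers, here is a minimal Python sketch of the unit ladder described above. The `convert` helper and the `UNITS` list are illustrative names of my own, not part of any tool, and the sketch simply assumes the 1,024-per-step convention used in this post.

```python
# Storage units as laid out in the post: each step up is a factor of 1,024.
UNITS = ["MB", "GB", "TB", "PB", "EB", "ZB"]

def convert(value, from_unit, to_unit):
    """Convert a quantity between two units on the 1,024-based ladder."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * (1024 ** steps)

# The 2012 estimate (2.8 ZB) expressed in exabytes.
total_2012_eb = convert(2.8, "ZB", "EB")

# Growth since the 2010 estimate of 800 EB, expressed back in zettabytes.
growth_eb = total_2012_eb - 800
growth_zb = convert(growth_eb, "EB", "ZB")

print(f"2.8 ZB is about {total_2012_eb:.1f} EB")
print(f"Growth from 2010 to 2012 is about {growth_zb:.2f} ZB")
```

Under that convention, the difference between the two estimates works out to roughly 2 ZB, which lines up with the "2 ZB created in just the prior 2 years" figure.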


I’m estimating that most of the readers out there have a Facebook account or know at least a handful of people who do.  ~80% of inbound data traffic for photo requests to Facebook servers is for only ~10% of the photos stored on the social network, and ~95% of photo access requests are for ~50% of the stored photos. Facebook users now upload ~350 million photos per day (up from ~300 million in the Fall of 2012), which requires an additional ~7 petabytes of storage each month.  Now the data is getting really “big”, isn’t it?  And we’re only just getting started…
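As a rough back-of-the-envelope check on those Facebook figures, the sketch below works out the storage footprint per uploaded photo. The 30-day month is my own assumption, and the result is only an implied average; it says nothing about how Facebook actually stores photos (multiple sizes, replicas, and so on).

```python
# Figures quoted in the post.
PHOTOS_PER_DAY = 350_000_000      # ~350 million uploads per day
STORAGE_PER_MONTH_PB = 7          # ~7 petabytes of added storage per month

# Assumption (not from the post): a 30-day month.
DAYS_PER_MONTH = 30

# 1 PB = 1,024 TB = 1,024^2 GB = 1,024^3 MB, per the unit ladder in the post.
MB_PER_PB = 1024 ** 3

photos_per_month = PHOTOS_PER_DAY * DAYS_PER_MONTH
mb_per_photo = STORAGE_PER_MONTH_PB * MB_PER_PB / photos_per_month

print(f"Photos per month: {photos_per_month:,}")
print(f"Implied storage per photo: {mb_per_photo:.2f} MB")
```

With these inputs the implied footprint is a bit under 1 MB per photo, which shows how small individual files add up to petabytes at this scale.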




The ABDC is a SNIA Committee that is dedicated to fostering the growth and success of the market for what is generally referred to as Analytics and Big Data, and more generally, the use of data storage resources and services by analytics and big data applications and toolsets.