Big Data – A Practical Approach to All the Hype

If you haven't noticed, Big Data has created a lot of buzz lately. Much of the buzz comes from the sheer wow factor of just how big "big" is. The number of mobile phones is nearing 6 billion, all of them creating content. Facebook generates over 30 billion pieces of content a month. Data is expected to grow at 40% year on year. With all that, it's easy to see that big really is BIG.
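
To make the 40% year-on-year figure concrete, here is a small Python sketch of compound growth. It assumes the rate stays constant every year, which is a simplification. The helper names are hypothetical and purely illustrative.

```python
import math

def growth_multiplier(annual_rate: float, years: int) -> float:
    """Total growth factor after `years` of compound growth at `annual_rate`."""
    return (1 + annual_rate) ** years

def doubling_time(annual_rate: float) -> float:
    """Years needed for data volume to double at a constant compound rate."""
    return math.log(2) / math.log(1 + annual_rate)

rate = 0.40  # the 40% year-on-year figure cited above (assumed constant)
print(f"Doubling time: {doubling_time(rate):.2f} years")            # 2.06
print(f"Growth over 5 years: {growth_multiplier(rate, 5):.1f}x")    # 5.4
print(f"Growth over 10 years: {growth_multiplier(rate, 10):.1f}x")  # 28.9
```

Under that assumption, data volume roughly doubles every two years. The infrastructure you buy today would need to hold almost 29 times as much data a decade from now.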


Most of this explosive data growth comes from unstructured machine-generated and user-generated data. Digital technologies are moving to denser media: photos have all gone digital, videos are using higher resolutions, and advanced analytics require more storage. Furthermore, machine-generated data from sensor networks, buyer-behavior tracking, and other sources produces much larger datasets that need to be understood and commercialized. In short, the amount of data is increasing and the data objects themselves are getting bigger.


In fact, the digital universe has recently broken the zettabyte barrier. A zettabyte is approximately a thousand exabytes, or a billion terabytes. How big is that? To give you an idea of scale, it would take everyone on the planet posting to Twitter 24/7 for 100 years to generate a zettabyte.


So you get the idea: it's really big. But so what? As an IT organization, why should I care? Well, there are two big reasons:


  1. Business Advantage - In all this mountain of data there is real business value. One opportunity is to store hours, days, months, and years of surveillance video and still be able to find the whereabouts and actions of a single person immediately upon request. Examples include identifying known terrorists as soon as they enter an airport, or spotting known cheats as they enter a casino.
    Other big data examples involve gaining insight from very large datasets to identify trends and match them to real-time events. For instance, you might be alerted when a customer orders 300 times more than they usually do, so that you can re-route inventory to meet their need. This requires analytics to know what they normally order, plus real-time alerts for events that are abnormal. Similarly, banks need real-time pattern recognition to detect fraud.
    Or consider a large retailer analyzing its transactional data together with weather forecasts. It can anticipate where snow shovels need to be delivered ahead of a storm, or where fans need to be delivered ahead of a heat wave.
  2. Cost of Compliance - At some point, the cost of keeping all this data will break your budget. Compliance requirements mean you need to keep an ever-growing amount of data. Eventually you will have to think differently, keeping more while spending less. New laws may require that data be kept forever and retrieved immediately when required. Additionally, as existing infrastructure scales, managing and protecting the data becomes impractically complex.
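
The ordering example above pairs a historical baseline ("what they normally order") with a real-time check on each new event. Below is a minimal Python sketch of that idea. It assumes a deliberately simple rule: flag any order larger than some multiple of the customer's average order. The function name and the 10x default threshold are hypothetical. A real system would likely use richer statistical or machine-learning models.

```python
from statistics import mean

def is_abnormal_order(history: list[float], new_order: float, factor: float = 10.0) -> bool:
    """Flag an order that exceeds the customer's historical average by more than `factor` times.

    `history` holds past order quantities (the analytics baseline);
    `new_order` is the incoming real-time event.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return new_order > factor * baseline

# Usage: a customer who usually orders about 105 units suddenly orders 300x that.
past_orders = [100, 120, 90, 110]
print(is_abnormal_order(past_orders, 31500))  # True: raise an alert, re-route inventory
print(is_abnormal_order(past_orders, 150))    # False: within normal range
```

Keeping the baseline computation separate from the per-event check mirrors the split described above. The historical analytics can run over large datasets offline, while the comparison itself is cheap enough to run on every incoming order.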


As an IT organization, you may be thinking that your own data growth will soon stretch the limits of your infrastructure. One way to define big data is to look at your existing infrastructure, the amount of data you have now, and the rate of growth you're experiencing. Is it starting to break your existing processes? If so, where? At NetApp we've taken a practical approach to helping our customers with their Big Data challenges. It's not about some unknown future state that requires retraining your staff in new competencies or changing the way you do business. It's about what you can do today to make a real difference.


As such, we have focused our Big Data solution portfolio on three specific use cases that offer alternatives customers can act on today. We call these the ABCs of Big Data: analytics, bandwidth, and content. Each area has its own specific challenges and unique infrastructure requirements.



  • Analytics. This solution area focuses on providing efficient analytics for extremely large datasets. Analytics is all about gaining insight and taking advantage of the digital universe. It turns data into high-quality information that provides deeper insight into the business and enables better decisions.
  • Bandwidth. This solution area focuses on obtaining better performance for very fast workloads. High-bandwidth applications include high-performance computing (the ability to perform complex analyses at extremely high speeds), high-performance video streaming for surveillance and mission planning, and video editing and play-out in media and entertainment.
  • Content. This solution area focuses on the need for boundless, secure, scalable data storage. Content solutions must be able to store virtually unlimited amounts of data, so that enterprises can store as much data as they want, find it when they need it, and never lose it.


To summarize: behind the hype there are multiple opportunities. You need to be asking where you can take advantage of your data. What insights can really help your business? Where can you use your data for competitive advantage? Can you link trends in buying patterns to people's physical location at a point in time to give them a better experience? Can you detect when fraud is about to happen? Can you find the likely hotspots for failure before they fail?

Your universe of data can be a gold mine. Can you find the value and turn it into real business advantage? If you don't, you can be sure your competitors will.