By Kumar Palaniappan, Enterprise Architect at NetApp
Three years back, when I joined NetApp in Jan 2011, the very first task my manager gave me was to take a look at “ASUP”.
To me it sounded as if he was asking me to look into something ASAP... Then I realized NetApp has an enterprise system called Auto Support (ASUP), which collects logs and configuration data from systems installed at our customers' sites in order to detect potential problems before they occur. This application is a key differentiator in the NetApp support products portfolio, and it was running out of steam.
ASUP collects roughly 1.1 million messages a week, with an average message size of 3-5 MB. About 40% of the data comes in over the weekend. Processing the data to find patterns of future failure was taking too long. Some queries were taking weeks. We needed to get the results faster, so that we could act on them.
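To put those figures in perspective, here is a rough back-of-the-envelope calculation of the weekly volume. It is only a sketch: it assumes decimal units (1 TB = 1,000,000 MB) and treats the 3-5 MB range as the lower and upper bounds of the average message size. The function and constant names are made up for illustration.

```python
# Rough weekly ingest estimate based on the figures quoted in the text.
# Assumption: decimal units, i.e. 1 TB = 1,000,000 MB.
MESSAGES_PER_WEEK = 1_100_000
MIN_MESSAGE_MB = 3
MAX_MESSAGE_MB = 5
WEEKEND_SHARE = 0.40  # "about 40% of the data comes in over the weekend"


def weekly_volume_tb(messages: int, avg_mb: float) -> float:
    """Total weekly volume in terabytes for a given average message size."""
    return messages * avg_mb / 1_000_000  # convert MB to TB


low_tb = weekly_volume_tb(MESSAGES_PER_WEEK, MIN_MESSAGE_MB)
high_tb = weekly_volume_tb(MESSAGES_PER_WEEK, MAX_MESSAGE_MB)
weekend_messages = round(MESSAGES_PER_WEEK * WEEKEND_SHARE)

print(f"Weekly volume: {low_tb:.1f} to {high_tb:.1f} TB")
print(f"Messages arriving over the weekend: {weekend_messages:,}")
```

Under these assumptions the system takes in somewhere between 3.3 and 5.5 TB a week, with roughly 440,000 messages landing in a two-day window.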
After a couple of days of getting to know the system, the very first thing that came to my mind was: why not use Hadoop to address scalability, efficiency and cost? Prior to my findings, the team had a proposal on the table that required a multi-million dollar budget. I worked with the team and agreed to do a proof-of-concept for several technologies.
Hadoop stood out compared to the rest of the technologies. We decided to go with Hadoop.
The key aspect of re-architecting the system was scalability, to keep pace with our rapidly growing installed base. Historically, data has doubled every 18 months. Auto Support is a critical component of NetApp’s vision of a self-aware, self-diagnosing, self-healing and self-optimizing data platform.
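To see what doubling every 18 months means for capacity planning, here is a small illustrative projection. It assumes the historical rate simply continues, which is an assumption for the sake of the example rather than a forecast; the function name is hypothetical.

```python
# Projected growth if data keeps doubling every 18 months.
# Assumption: the historical doubling rate continues unchanged.
DOUBLING_PERIOD_MONTHS = 18


def growth_factor(months: float) -> float:
    """Multiple of today's data volume after the given number of months."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


for months in (18, 36, 54):
    print(f"After {months} months: {growth_factor(months):.0f}x today's data")
```

At that pace the data grows eightfold in four and a half years, which is why scalability had to drive the new design.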
The Auto Support system is a unified set of proactive and predictive support tools.
90% of the data is unstructured and can be collected from any storage device, even non-NetApp ones. The ETL process on this data has a very tight SLA: 15 minutes for a normal message and 2 minutes for a high-priority, event-driven message.
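The two SLA targets can be expressed as a simple deadline check. This is a minimal sketch, not the actual ASUP implementation: the priority labels "normal" and "high" and the function names are hypothetical, and only the 15-minute and 2-minute figures come from the text.

```python
from datetime import datetime, timedelta

# SLA targets quoted in the text; the priority labels are hypothetical.
SLA_BY_PRIORITY = {
    "normal": timedelta(minutes=15),
    "high": timedelta(minutes=2),
}


def sla_deadline(received_at: datetime, priority: str) -> datetime:
    """Latest time by which ETL for this message must finish."""
    return received_at + SLA_BY_PRIORITY[priority]


def met_sla(received_at: datetime, processed_at: datetime, priority: str) -> bool:
    """True if processing finished on or before the SLA deadline."""
    return processed_at <= sla_deadline(received_at, priority)


received = datetime(2014, 1, 6, 9, 0, 0)
finished = received + timedelta(minutes=5)

print(met_sla(received, finished, "normal"))  # True: 5 min is within 15 min
print(met_sla(received, finished, "high"))    # False: 5 min exceeds 2 min
```

A message processed in five minutes comfortably meets the normal target but misses the high-priority one, which shows how much tighter the event-driven path has to be.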
Here’s how it works:
NetApp management software at the customer site sends diagnostics data to a central Hadoop database of more than 200 billion records. An analytics application incorporating knowledge from NetApp experts turns this data into insights in near real time, identifying risks and fixes. Next, these risk exposures and the associated actions are posted to the customer system dashboard.
Customer benefits are many: higher system availability, lower operational risk, better capacity planning, to name a few.
For NetApp, this system reduced time-to-solution by 47-60%, which translated into significant savings and improved customer satisfaction.