Navigating the Big Data Labyrinth One Step at a Time

Here’s the good news: the federal government is increasingly aware of the challenges posed by big data. In March 2012, the Obama administration announced a $200 million initiative to address big data concerns in a variety of contexts. The new initiative will help government agencies develop the cutting-edge technology necessary for collecting, storing, organizing, analyzing, and sharing massive quantities of information.

Here’s the bad news: this funding is a drop in the bucket compared to how much the government will need to spend on big data in the future. Increasingly, agencies need to make use of information that is not easily stored in traditional databases: data from sensors, blogs, pictures, emails, social networking, video, and so on. In fact, 90% of data is non-relational and does not lend itself to traditional relational databases. The upside is that if you can integrate this data into your organization’s processes, it can help you answer new questions, increase staff productivity, solve problems, and streamline service provision.

There are many different ways to use big data, but it basically boils down to a four-part process: 1. Ingest, 2. Analysis, 3. Storage, and 4. Distribution. Each step is essential for the effective use of your data. However, as the volume and velocity of the data increase, each of these steps grows more complicated and difficult to execute. Many government organizations find their data management process breaking down in one or more of these steps.

The traditional approach to handling data was for an organization’s IT staff to build a schema for each new type of data, making sure that it was useful to the organization. That approach doesn’t work anymore. You need a system that can support decisions made against unstructured data, but the volume and velocity of today’s information don’t give you the time to build new schemas in advance. Instead, you need to structure the data on the fly.

Government IT personnel know they can’t ignore the problems created by big data, and they know that they need help. But where do you start? The technology is evolving now, but you need to fit any answer into a two-year budgeting cycle, with the understanding that a complete solution probably won’t happen for three or four years. These realities can make the problem seem overwhelming.

However, you don’t need to start big; you can start small. It’s not possible to address every step in the process at the same time, so start with the stage that has the biggest impact on your agency’s mission. Are you having trouble organizing and analyzing your information? Is your data storage inadequate for your needs? Are you able to get data to your stakeholders when they need it? If you ask yourself these questions, you will know where your area of biggest need lies.

If you don’t have the budget to address the problem as a whole, one possibility is budgeting for a pilot program now. Take one part of your data, such as a specific program or set of information, and budget for a project that will help organize one phase of that data’s process. The pilot can include process flow studies and assessment studies that will help you determine where the information bottlenecks occur and which parts of the process need the most work. This approach lets you start small and build on your successes, addressing your big data needs in phases that are easier on your budget.

Such a pilot program can also help you figure out which vendors can meet your big data needs. You don’t want your data to become a science experiment in the hands of new companies with untested approaches to data management. You want proven, tested solutions. When you’re looking for help in navigating this big data labyrinth, look for a partner that can provide pre-sized, pre-tested solutions for your Hadoop and NoSQL deployments: an organization that understands the ecosystem of partners to work with and is not limited to just one. Take advantage of best practices that let you plug a solution in and have it work, so you can avoid months of sizing and testing just to get your information working for you. The big data labyrinth can be intimidating, but you don’t have to face it alone.

Carina Veksler, Product Marketing Manager, NetApp U.S. Public Sector