What is big data?

I am often asked, what is big data? It has happened at holiday parties and even once after a funeral. Certainly there have been large data sets before, so what is different now? Big data commonly refers to data so large that you cannot use typical environments to store and manage it or typical software to analyze it. In addition to volume, big data is often defined by velocity and variety. Velocity refers to the speed at which the data arrives; big data typically includes frequent inputs. Variety refers to the diversity of sources and formats; big data typically contains unstructured data, which is not easily categorized or organized.

The volume of big data requires new thinking about where to put the data. Traditionally, companies kept their data in house, in a data warehouse on an internal server. Now some companies are turning to the cloud, both private and public, to house data because of the data's size. In addition, the cloud offers flexibility should the needed storage capacity grow.

Similarly, the volume and variety of the data may make it impractical to load the data into a database, because doing so would require assigning data elements to tables and fields. Some big data may not be easily structured. For example, it could be text messages from online customer service chats. In this case, companies might turn to a parallel programming framework such as MapReduce to process the data. This enables them to load all the data and then parse the text of the online service chats to identify the frequency of words used: for example, how many customers reported a problem with a particular part or described themselves as frustrated. However, you typically can't use SAS or SPSS to analyze the data in a MapReduce environment. Further, data mining techniques may be more useful than classical statistics because of the nature of the problem to be solved. Thus, almost everything about big data requires rethinking data and analytic tools.
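To make the word-frequency idea concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python. It is only an illustration: the chat messages, the function names, and the word-splitting rule are all assumptions made up for this example, and a real framework would spread the map and reduce steps across many machines rather than run them in one process.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical chat messages; real input would come from stored chat logs.
CHATS = [
    "The hinge on my laptop broke and I am frustrated",
    "Very frustrated, the hinge snapped after a week",
    "Battery is fine but the hinge feels loose",
]


def map_words(message):
    """Map step: emit a (word, 1) pair for every word in one message."""
    words = re.findall(r"[a-z']+", message.lower())
    return [(word, 1) for word in words]


def shuffle(pairs):
    """Shuffle step: group the emitted values by word."""
    groups = defaultdict(list)
    for word, value in pairs:
        groups[word].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(values) for word, values in groups.items()}


def word_frequencies(messages):
    """Run map, shuffle, and reduce over all messages."""
    pairs = chain.from_iterable(map_words(m) for m in messages)
    return reduce_counts(shuffle(pairs))


counts = word_frequencies(CHATS)
print(counts["hinge"], counts["frustrated"])
```

Note that this counts word occurrences, not customers. To count how many customers mentioned a word, the map step could emit each word at most once per message.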

However, in the end, big data is like all data. It must generate value. Big data is meaningless unless it helps companies increase revenue or reduce costs by revealing insights that were previously unavailable. The power of big data is that analysts can explore larger data sets that were impossible to analyze before and delve into unstructured data that was typically ignored because of its non-conforming format.