Big data has been a hot topic for several years, and for good reason. There is value in analyzing unstructured, high-volume data sets. However, when I interview candidates who say they want to be data scientists, they focus on the technology and techniques. They forget that the critical thinking and framework used for big data are just as important and apply to many types of analytic projects.
It comes down to some very fundamental questions:
- What problem am I trying to solve? Defining the problem up front will keep you grounded as interesting findings may lure you away from your goal.
- What data sources can I use? You want to consider multiple sources to triangulate your results and provide a richer picture of what is happening.
- Have I considered all the possible sources of bias? Bias of all sorts can skew results and must be considered and incorporated into your analysis plan.
- Do I need to use all the data available or will a sample be sufficient? There are times when it is not feasible or necessary to analyze all the data available. However, if you sample, you need to make sure that you are getting sufficient coverage and that your sampling is random.
- How can I validate my data? Validation must be part of your analytics plan, whether you validate one data set against another or at least compare your results to findings from other comparable projects.
- What analytic technique(s) are appropriate? Consider the pros and cons of various techniques and what would be most appropriate given the data and problem at hand.
While it is very tempting to dive straight into the data and analysis, spending time up front to answer these questions will help you be more efficient.
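
To make the sampling question concrete, here is a minimal Python sketch of one way to draw a random sample and check its coverage. It is an illustration under assumptions, not a prescribed method: the function name `sample_with_coverage`, the `region` field, and the record layout are all hypothetical, and comparing group shares is only one of several reasonable coverage checks.

```python
import random
from collections import Counter


def sample_with_coverage(records, key, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and compare
    each group's share in the sample with its share in the full data.

    `key` is a function that maps a record to the group it belongs to.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(records, sample_size)

    full_counts = Counter(key(r) for r in records)
    sample_counts = Counter(key(r) for r in sample)

    # For every group in the full data: (share in full data, share in sample).
    coverage = {
        group: (
            full_counts[group] / len(records),
            sample_counts.get(group, 0) / sample_size,
        )
        for group in full_counts
    }
    return sample, coverage


# Hypothetical data set: 1,000 records, three quarters of them in "north".
records = [
    {"id": i, "region": "north" if i % 4 else "south"} for i in range(1000)
]

sample, coverage = sample_with_coverage(
    records, key=lambda r: r["region"], sample_size=100
)

for group, (full_share, sample_share) in sorted(coverage.items()):
    print(f"{group}: full={full_share:.2f} sample={sample_share:.2f}")
```

If a group's share in the sample is far from its share in the full data, or a group is missing entirely, that is a signal to draw a larger sample or to sample within each group instead.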