Despite the marketing hype of the big data era, many organizations take a decidedly small approach toward analytics: identifying their highest-value subset of data before investing analysis resources. This made sense in the early days of the Internet, when relatively slow and low-volume human activity was the driver of business value. But with the rise of IoT and connected devices, businesses are looking for new ways to scale and leverage data that is increasingly disparate and time-sensitive – without disregarding the historical data they've invested time in governing.
Analytics scale and analytics structure are at a tipping point
The prevailing model in analytic scaling has followed the 80/20 rule: a small initial effort yields large insight gains, with rapidly diminishing returns once the scale of data reaches a certain level. In other words, most of the value of big data to date has been extracted from a relatively small subset of data. The rest of that data – the "dregs" and detritus – has had value only in aggregate. Until very recently, the sheer scale of this aggregate content was not worth the investment in processing power; the net value of content peaked well before maximum effort. However, the availability of commodity hardware and horizontally scaling algorithms is changing this, meaning that this previously underutilized data may now hold immense value that requires scale, rather than particularly advanced learning or analysis, to unlock. Failure to invest now in adequate supporting architecture may cause organizations to fall behind in their analytics initiatives as these changes compound and accelerate.
How does the Internet fit into all of this? It, too, is undergoing a shift from high-value but relatively low-volume sets of data to low-value, high-volume sets. This is seen most prominently in the "last mile" furthest from the Internet backbone, which traditionally consisted mostly of laptops and PCs that provided users with access to the Internet. What was once mostly interactive human activity driven by consumption of sites, links, and purchases is now shifting toward the streaming activity of low-power automated sensors that do not, in isolation, generate significant dollar activity. The result is that amid the growing volumes of data, companies are also striving to adapt to a shifting cost structure of Internet activity: it is no longer sufficient to invest heavily in shaping last-mile activity to drive value. Behavioral influence of purchasers and site visitors is no longer the sole holy grail of profitability. Instead, investment in upstream Internet infrastructure to handle the scale and real-time nature of these data streams must be equally considered, and aggregate real-time analytics based on streams of geographically distributed sensor data is becoming more important.
None of this is to say that connected human activity is waning in importance. A holistic big data strategy will, in fact, have to account for big data in all its diversity. Information management architecture needs to emphasize the speed, scale, and compatibility of today's data, while ensuring that historical data is maintained in a governed state so that it can provide contextual insight.
Paige Bartley, Senior Analyst, Information Management