The Apache Hadoop framework, with its open source methodology and thriving ecosystem of supported projects, is often the preferred platform for analyzing large volumes of diverse data for business insight. The recent Big Data Everywhere San Francisco conference underscored two of Hadoop's most prominent virtues: its flexibility and speed. While the theme of the conference was analytics performed using Hadoop, the concerns over real-time analytics pertain to any fast data platform or target capable of ingesting, transforming, and analyzing data in real time. Organizations within regulated industries, for example, need to exercise caution with applications or use cases that leverage nearly instantaneous insight. Fast data analytics calls for an updated and flexible policy framework that can accommodate the discovery of potentially noncompliant trends and information.
Real-time analytics needs supervision
Today, the need for compliance-focused supervision over fast data analytics projects is emerging as a new business concern, not only for regulated organizations but also for organizations that harbor concerns about inappropriate access to data or misuse of analytics. Data can be ingested, analyzed, and visualized in near-real time, leaving little chance for compliance practitioners to interpret or react to the trends that analysis reveals. Instead of increasing their dependence on compliance tools and technologies, regulated firms would be better advised to focus on developing a compliance policy framework that can scale with data. Realistically, this means establishing channels of communication and routine opportunities for compliance teams to review analytics initiatives, the data they are based on, and the insights they produce.
This is relevant today because several factors have converged: the propagation of unstructured data sources, the growing sophistication of natural language processing, the consumerization of analytics tools, and the legal expectation that businesses be held accountable for their data, regardless of where it resides. Together, these factors mean that compliance "red flags" might surface in analytics projects yet go completely unseen by compliance personnel. For instance, a marketing department using a Hadoop-based system to analyze social media may uncover a trend of public complaints regarding a dangerous product failure. If the compliance team is not made aware of this and appropriate actions are not taken in a timely manner, the business may be exposed to expensive civil litigation and/or settlements.
The modern technology landscape, and particularly open source technology platforms like Hadoop, makes it possible to analyze data and implement automated decision-making at a speed that far outpaces the human ability to interpret the information. Therein lies the problem for risk management and compliance teams, whose authority lies in their ability to enforce and override the policy settings that are often carried out by automated technology. Traditionally, unstructured data types subject to compliance monitoring could be captured and handled by point tools, such as those designed for email. But with the proliferation of data types and the push to cross-analyze content from different sources, there is now a risk that "red flag" patterns of particular interest to compliance managers may exist outside of the systems originally designed to enforce regulatory requirements.
Paige Bartley, Senior Analyst, Information Management