Data preparation and the process of extracting, transforming, and loading (ETL) data for analytics both represent a growing pain in the data management market. For the large enterprise, this process is difficult enough: the big data era necessitates ingestion and quality control of a vast variety and volume of data types that must frequently be standardized for cross-analysis, often requiring a team of data scientists. For smaller organizations that cannot maintain a full team of dedicated professionals and tools for ETL, the task is even more daunting. To accelerate the move to a truly analytics-driven business model and increase access to data, these groups need more intuitive ETL and data prep tools that democratize the data quality process among knowledge workers.
Data veracity drives business value, and ETL is a bottleneck
Enterprise size is often a rough proxy for the volume of data an organization uses on a regular basis. Generally, larger sets of data provide more robust statistical findings and insight, but only if the data is accurate to begin with. For smaller organizations dealing with smaller sets of data, the challenge of data quality is further compounded, because each inaccurate record carries proportionally more weight in the results, and ETL presents a major bottleneck. To bridge the gap from reactive, ad hoc analysis to more fluid and iterative use of data, these businesses need ETL capabilities that can be used by a more diverse audience.
The pressure to extend analytics tool functionality to a wider, less technical user audience is a natural consequence of the rise of self-serve analytics models. However, an increase in the number of “casual” business users of analytics also necessitates greater vigilance over data quality and security, because more decisions are being made on the basis of the findings. Ovum believes that as businesses try to advance towards a holistic, governed data lake approach, they need more user-friendly ETL features so that the data lake itself can be populated more quickly with validated content. Because ETL and data preparation often act as the middle infrastructure layer between data collection and data use, greater ease of use would create more transparency, speed, and value for the entire process.
Paige Bartley, Senior Analyst, Information Management