As one of a new breed of challengers in the data integration market, Trifacta has passed an important milestone in lining up its first major systems integrator partner. Forming a partnership with Infosys could prove to be a watershed event for Trifacta, a provider of data preparation tooling, in establishing critical-mass presence in an area that is a key building block for data lake governance.
Getting feet on the street
As we predicted in our 2012 research on data quality and Hadoop, the volume and varied nature of data would demand new probabilistic, data science–driven approaches to data integration. Over the past 24 months, a new breed of "data preparation" vendors has introduced tooling that applies machine-learning approaches to cleansing data. These vendors have done so with front ends that look more like the Excel spreadsheets to which business users are accustomed than the schema diagrams prevalent in traditional ETL tools. Trifacta is one of this new breed, in an area where established data integration players like IBM, Informatica, and Oracle are stepping in. A key challenge for any new challenger is securing the right channels to get enough feet on the ground. Until now, Trifacta has been successful in securing partnerships with Hadoop platform providers. It has also signed up several BI analytics tool providers: Tableau, Qlik, and Zoomdata. What has been missing is a global systems integrator to multiply the skills base and get real feet on the street to implement the Trifacta tool and cajole business end users into believing that they can use self-service not only to visualize data, but also to curate it.
The announcement of a new partnership with Infosys is therefore a key milestone for Trifacta. If the India-based outsourcing and systems integration player executes on the promise, Trifacta's presence could expand dramatically. The partnership will establish "facts on the ground" before established players claim the market space. That's hurdle number one. As we identify in our forthcoming report on data lake governance, "data inventory" is a critical cornerstone. Today, data prep providers like Trifacta perform some, but not all, of the necessary tasks: profiling, cleaning, and merging data sets, and providing some cataloging capabilities. Trifacta partners with cataloging providers Waterline and Alation, and with Cloudera Navigator. But we believe that integrated, rather than best-of-breed, approaches must ultimately prevail. The next step for the Trifactas of the world is adding related tasks for matching, de-duplicating, tagging, and supporting "emergent" approaches to master data so that enterprises do not have to juggle and integrate multiple point tools; this may come through organic development or, more likely, acquisition. If Infosys provides the global critical mass that Trifacta needs, it would be an important first step toward building the momentum for taking the second.
Data Quality and Big Data: From Discovery to Precision, IT014-002596 (May 2012)
2016 Trends to Watch: Big Data, IT0014-003083 (December 2015)
Tony Baer, Principal Analyst, Information Management