DataTorrent, which offers a commercial platform built around the open source Apache Apex streaming engine, has differentiated itself with an application-centered approach. Its latest release expands the portfolio of streaming application accelerators and integrates some of the analytic capabilities most commonly used by its customers: the open source Druid OLAP engine, which is useful for monitoring KPIs; a data replay capability that allows more liberal use of what-ifs (and potentially, machine learning); and Python support that eases deployment of analytic and machine learning libraries.
DataTorrent puts focus on streaming application accelerators
Ovum’s last update on the real-time streaming market revealed an increasingly crowded landscape where no single streaming engine has emerged as the de facto standard. This has created an opening for managed cloud services, such as Amazon Kinesis, Microsoft Azure Event Hubs, and Google Cloud Dataflow, to fill the void with services for ingesting and processing real-time data. Apart from capabilities such as SQL query support, most of these services require customers to build their own applications.
Over the past year, DataTorrent has taken a different route, focusing more on providing applications to help customers get into production faster and deliver business-critical outcomes. While most streaming engines are toolsets, DataTorrent is focusing on delivering on-ramps to accelerate development of business applications. Last year, it introduced a financial payment fraud “application” – which is essentially a starting point for building the application. The newly announced release, RTS version 3.10, expands on that with two new application accelerators: account fraud and product recommendations (e.g. for providing next-best offer). In the future, we expect that DataTorrent will add similar offers for popular streaming use cases such as cybersecurity.
One of the distinguishing features of the Apex streaming engine (on which DataTorrent is built) is stateful processing, which builds in fault tolerance and lets you roll back event streams to specific points. In the new 3.10 release, DataTorrent takes this a step further with a replay feature that lets you compare different what-if models on the same stream. This capability could prove useful for training machine-learning models. Such models should be easier to run thanks to new integration support for Python. Previously, you could run Python scoring models or libraries on DataTorrent, but you had to manage all the interdependencies (such as when to scale a cluster to run a job) manually; the new integration automates runtime management and adds monitoring capabilities that could cover performance and business outcomes (e.g. how many fewer shopping carts are abandoned after a tweak of the user experience). The 3.10 release also adds the open source Druid OLAP engine, which is optimized for real-time operation and is used to display KPIs that bridge real-time streaming and historical data. With the OLAP engine built in, customers don't need a separate cluster.
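To make the replay idea concrete, the following is a minimal sketch in plain Python of comparing two what-if models against the same recorded stream from a chosen rollback point. It does not use the DataTorrent or Apex APIs; the event log, the `replay` and `compare_models` helpers, and the two threshold "models" are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Iterator, List

Event = Dict[str, float]

# Hypothetical recorded stream: each event carries an offset so that a
# replay can start from a specific point in the stream.
EVENTS: List[Event] = [
    {"offset": 0, "amount": 20.0},
    {"offset": 1, "amount": 950.0},
    {"offset": 2, "amount": 40.0},
    {"offset": 3, "amount": 1200.0},
    {"offset": 4, "amount": 610.0},
]


def replay(log: List[Event], from_offset: int) -> Iterator[Event]:
    """Yield recorded events at or after from_offset, in original order."""
    for event in log:
        if event["offset"] >= from_offset:
            yield event


def compare_models(
    log: List[Event],
    from_offset: int,
    models: Dict[str, Callable[[Event], bool]],
) -> Dict[str, int]:
    """Run every model over the same replayed events and count flagged events."""
    flagged = {name: 0 for name in models}
    for event in replay(log, from_offset):
        for name, model in models.items():
            if model(event):
                flagged[name] += 1
    return flagged


# Two illustrative what-if models: flag payments above different thresholds.
MODELS: Dict[str, Callable[[Event], bool]] = {
    "threshold_500": lambda e: e["amount"] > 500,
    "threshold_1000": lambda e: e["amount"] > 1000,
}

result = compare_models(EVENTS, from_offset=1, models=MODELS)
print(result)
```

Because both models see exactly the same replayed events, any difference in their counts reflects the models rather than the data, which is the property that makes replay attractive for what-if comparisons.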
While there are numerous streaming engines on the market, DataTorrent’s prime rivals are cloud providers with their own managed services. Like any third-party streaming engine, DataTorrent offers cloud independence and support for hybrid deployments that span on-premises and cloud environments; unlike rivals, it is also seeking to go higher up the application stack. A key benchmark for success will be the degree to which customer implementations allow DataTorrent to enrich its application template portfolio.
On the Radar: DataTorrent, aiming to package streaming applications, IT0014-003325 (August 2017)
Apache Kafka: Enterprise Messaging in a Scale-Out World, IT0014-003276 (June 2017)
"Fast data analytics requires even faster governance," IT0014-003158 (October 2016)
Fast Data 2015–16: The Rebirth of Streaming Analytics, IT0014-003064 (October 2015)
Tony Baer, Principal Analyst, Information Management