Finding relevant information fast has always been a challenge, even more so in today’s growing “oceans” of data. This talk explores real-time analytics and anomaly detection (in particular, credit card fraud) using Apache Hadoop as the data platform, Apache Storm for real-time computation, data ingestion and orchestration, and Elasticsearch for advanced real-time search. The session focuses on the architectural challenges of bridging batch and real-time systems and how to overcome them, keeping a close eye on performance and scalability. We will cover architectural topics such as partition strategies, data locality, integration patterns and multi-tenancy.
Presented by Costin Leau at Hadoop Summit North America 2014