Apache Spark
• We use Apache Spark with Scala
• A fast and general engine for large-scale data processing (Big Data)
• API:
  – Functional (Scala-like)
    • map, flatMap, filter, sort
  – Relational (SQL-like)
    • select, where, groupBy, join
• Distributed
  – A Driver node submits work to Executor nodes
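
The functional operators named above take their names from Scala's own collections, so a rough feel for them can be had without a cluster. The sketch below runs on a plain Scala List and does not use Spark at all; the object name FunctionalOpsSketch and the sample lines are invented for illustration. On Scala collections, sorting is spelled sorted or sortBy rather than sort.

```scala
// Plain Scala collections, no Spark dependency: a sketch of the
// map / flatMap / filter / sort style that the functional API follows.
object FunctionalOpsSketch {
  // Hypothetical sample input, invented for this example.
  val lines: List[String] = List("spark is fast", "spark is general")

  // flatMap: split every line into words and flatten into one list.
  val words: List[String] = lines.flatMap(_.split(" ").toList)

  // filter: keep only the words longer than two characters.
  val longWords: List[String] = words.filter(_.length > 2)

  // map: transform each remaining word.
  val upper: List[String] = longWords.map(_.toUpperCase)

  // sort: distinct words in alphabetical order.
  val sortedDistinct: List[String] = upper.distinct.sorted

  def main(args: Array[String]): Unit =
    println(sortedDistinct.mkString(", ")) // prints "FAST, GENERAL, SPARK"
}
```

In Spark, the same chain of map, flatMap and filter calls is written against a distributed collection instead of a List. The Driver then hands the work to the Executors rather than running it in one JVM.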