Apache Spark™ is a fast, general-purpose engine for large-scale data processing. By keeping working data in memory, Spark can run some workloads up to 100 times faster than Hadoop MapReduce.

In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications.[1] Because user programs can load data into a cluster's memory and query it repeatedly, Spark is well suited to iterative workloads such as machine learning algorithms.[2]
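
The benefit of loading data once and querying it repeatedly can be illustrated with a small single-machine toy model. This is only a sketch in plain Python, not Spark's API: the `ToyDataset` class, its `cache()` method, the `loads` counter, and the `fit_mean` function are all hypothetical names introduced here. The loader function stands in for an expensive read from disk, and the iterative loop stands in for a machine learning algorithm that makes one pass over the data per step.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Hypothetical stand-in for a cluster dataset; not Spark's API."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader            # simulates an expensive disk read
        self._cache_enabled = False
        self._cached: Optional[List[float]] = None
        self.loads = 0                   # number of times the loader ran

    def cache(self) -> "ToyDataset":
        """Request that the data stay in memory after the first load."""
        self._cache_enabled = True
        return self

    def _rows(self) -> List[float]:
        # Serve from memory when a cached copy exists.
        if self._cached is not None:
            return self._cached
        self.loads += 1
        rows = self._loader()
        if self._cache_enabled:
            self._cached = rows
        return rows

    def gradient(self, guess: float) -> float:
        """Gradient of the mean squared error at `guess` (one full pass)."""
        rows = self._rows()
        return sum(2.0 * (guess - x) for x in rows) / len(rows)


def fit_mean(ds: ToyDataset, steps: int = 20, lr: float = 0.25) -> float:
    """Estimate the mean by gradient descent; each step queries the data once."""
    guess = 0.0
    for _ in range(steps):
        guess -= lr * ds.gradient(guess)
    return guess


if __name__ == "__main__":
    plain = ToyDataset(lambda: [1.0, 2.0, 3.0, 4.0])
    cached = ToyDataset(lambda: [1.0, 2.0, 3.0, 4.0]).cache()
    fit_mean(plain)
    fit_mean(cached)
    print("loads without cache:", plain.loads)
    print("loads with cache:", cached.loads)
```

With 20 steps, the uncached dataset runs its loader 20 times while the cached one runs it once, yet both converge to the same estimate. In this toy model the only cost difference is the repeated load, which is the cost that in-memory reuse is meant to avoid.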

Spark is written in Scala.