From Glitchdata
Revision as of 19:44, 10 July 2019 by Jasonchen (talk | contribs)

Apache Spark™ is a fast, general-purpose engine for large-scale data processing. By keeping working data in memory, Spark can run some workloads up to 100x faster than Hadoop MapReduce.

In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications.[1] Because user programs can load data into a cluster's memory and query it repeatedly, Spark is well suited to iterative workloads such as machine learning algorithms.[2]
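
The load-once, query-many pattern described above can be illustrated with a small sketch. This is a toy model in plain Python, not Spark's API: `ToyDataset`, `load_from_disk`, and the `loads` counter are hypothetical names introduced only for illustration, and the "disk read" is simulated by a function call. It shows only why repeated queries benefit when the data is kept in memory after the first read.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """Hypothetical stand-in for a distributed dataset; not Spark's API."""

    def __init__(self, loader: Callable[[], Iterable[int]]) -> None:
        self._loader = loader                    # simulates an expensive disk scan
        self._cached: Optional[List[int]] = None
        self.loads = 0                           # number of times the loader ran

    def _read(self) -> Iterable[int]:
        self.loads += 1
        return self._loader()

    def cache(self) -> "ToyDataset":
        """Materialize the data in memory once so later queries reuse it."""
        if self._cached is None:
            self._cached = list(self._read())
        return self

    def count(self, predicate: Callable[[int], bool]) -> int:
        """Count rows matching the predicate, re-reading unless cached."""
        rows = self._cached if self._cached is not None else self._read()
        return sum(1 for row in rows if predicate(row))


def load_from_disk() -> Iterable[int]:
    # Placeholder for a slow, disk-backed read (an assumption for this sketch).
    return range(1_000)


uncached = ToyDataset(load_from_disk)
cached = ToyDataset(load_from_disk).cache()

# Run the same three queries against both datasets.
for ds in (uncached, cached):
    ds.count(lambda x: x % 2 == 0)
    ds.count(lambda x: x > 900)
    ds.count(lambda x: x % 7 == 0)

print(uncached.loads)  # one simulated disk read per query
print(cached.loads)    # a single read, reused by every query
```

Iterative algorithms, which pass over the same data many times, are the case where avoiding the repeated read matters most.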

Spark is written in Scala.