
Apache Spark™ is a fast, general-purpose engine for large-scale data processing. By keeping working data in memory, it can run some workloads up to 100 times faster than Hadoop MapReduce.

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at the University of California, Berkeley. It was later donated to the Apache Software Foundation, which maintains it today. In contrast to Hadoop's two-stage disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications.[1] Because user programs can load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms.[2]
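The benefit of loading data once and querying it repeatedly can be illustrated with a toy, single-machine model. The sketch below is not Spark's API and does not run on a cluster. It uses only the Python standard library, and the names `DiskDataset`, `fit_mean`, and `run_demo` are hypothetical helpers invented for this illustration. It runs the same iterative algorithm twice, once re-reading the input file on every pass and once reusing a copy held in memory, and counts the disk reads in each case.

```python
import os
import tempfile


class DiskDataset:
    """Toy dataset (hypothetical) that counts how often its file is read."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def read(self):
        # Each call simulates one full scan of the data on disk.
        self.disk_reads += 1
        with open(self.path) as f:
            return [float(line) for line in f]


def fit_mean(load, iterations=10, learning_rate=0.5):
    """Estimate the mean by gradient descent, calling load() once per pass."""
    estimate = 0.0
    for _ in range(iterations):
        values = load()
        # Gradient of the mean squared error with respect to the estimate.
        gradient = sum(estimate - v for v in values) / len(values)
        estimate -= learning_rate * gradient
    return estimate


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.txt")
        with open(path, "w") as f:
            f.write("\n".join(str(v) for v in [1.0, 2.0, 3.0, 4.0]))

        # Disk-based pattern: every iteration goes back to storage.
        disk = DiskDataset(path)
        disk_estimate = fit_mean(disk.read)

        # In-memory pattern: read once, then reuse the cached list.
        cached_source = DiskDataset(path)
        cached = cached_source.read()
        memory_estimate = fit_mean(lambda: cached)

        return (disk_estimate, disk.disk_reads,
                memory_estimate, cached_source.disk_reads)


if __name__ == "__main__":
    print(run_demo())
```

Both runs produce the same estimate, but the first scans the file once per iteration while the second scans it once in total. Iterative machine learning algorithms make many such passes over the same data, which is why keeping it in memory helps. In Spark itself, the comparable step is marking a dataset for in-memory reuse, for example by calling `cache()` or `persist()` on an RDD or DataFrame.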

Spark is written in Scala.