Spark
Latest revision as of 19:44, 10 July 2019

Apache Spark™ is a fast, general-purpose engine for large-scale data processing. By leveraging in-memory processing, [[Spark]] can deliver up to a 100x improvement over [[MapReduce]] for certain workloads.

In contrast to Hadoop's two-stage, disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster for certain applications.[1] By allowing user programs to load data into a cluster's memory and query it repeatedly, Spark is well suited to machine learning algorithms.[2]
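The contrast above can be sketched in plain Python. This is an illustration of the principle only, not the Spark API: a MapReduce-style job re-reads its input from disk on every pass, while a Spark-style cached dataset is read from disk once and then queried repeatedly from memory. All file and class names here are hypothetical.

```python
import os
import tempfile

def make_dataset(path, n=1000):
    """Write n integers to disk, one per line (stands in for HDFS input)."""
    with open(path, "w") as f:
        for i in range(n):
            f.write(f"{i}\n")

def disk_based_sum(path):
    """MapReduce-style pass: every job re-reads the data from disk."""
    with open(path) as f:
        return sum(int(line) for line in f)

class CachedDataset:
    """Spark-style: load once into memory, then query repeatedly."""
    def __init__(self, path):
        with open(path) as f:
            # One disk read; subsequent queries hit this in-memory list.
            self.data = [int(line) for line in f]

    def total(self):
        return sum(self.data)

    def count_even(self):
        return sum(1 for x in self.data if x % 2 == 0)

path = os.path.join(tempfile.mkdtemp(), "input.txt")
make_dataset(path)

print(disk_based_sum(path))   # 499500 — re-reads the file on every call
ds = CachedDataset(path)
print(ds.total())             # 499500 — served from memory
print(ds.count_even())        # 500 — second query, no further disk I/O
```

An iterative algorithm (such as a machine learning training loop) issues many such queries over the same data, which is why avoiding the per-pass disk read dominates the speedup.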

Spark is written in [[Scala]].


Links

Related