Apache Spark - A unified analytics engine for large-scale data processing

Spark requires Scala 2.8. This version of Spark has been tested with
Scala 2.8.0RC3.

To build and run Spark, you will need to have Scala's bin directory on
your $PATH, or you will need to set the SCALA_HOME environment variable
to point to where you've installed Scala. Scala must be accessible
through one of these methods on the Nexus slave nodes as well as on the
master.
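
For example, either of the following in your shell profile will work
(the install path shown is only an illustration; use wherever Scala
actually lives on your machine):

    # Either put Scala's bin directory on your PATH...
    export PATH=$PATH:/usr/local/scala-2.8.0.RC3/bin

    # ...or point SCALA_HOME at the Scala installation directory
    export SCALA_HOME=/usr/local/scala-2.8.0.RC3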

To build Spark and the example programs, run make.

To run one of the examples, use ./run <class> <params>. For example,
./run SparkLR will run the Logistic Regression example. Each of the
example programs prints usage help if no params are given.
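
For instance, to discover what parameters an example expects (a sketch;
the usage text itself comes from each example program):

    # Print the usage message for the Logistic Regression example
    ./run SparkLR

    # Then re-run it with the parameters listed in the usage message
    ./run SparkLR <params>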

Tip: If you are building Spark and examples repeatedly, export USE_FSC=1
to have the Makefile use the fsc compiler daemon instead of scalac.
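
For example, a quick edit-compile loop might look like this
(illustrative; USE_FSC simply switches the Makefile from scalac to the
fsc daemon):

    export USE_FSC=1   # make the Makefile invoke the fsc compiler daemon
    make               # later rebuilds reuse the running daemon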