Commit graph

650 commits

Author SHA1 Message Date
Matei Zaharia 8f5ed51234 Delete Spark's temporary directories when the JVM exits. 2012-02-09 22:58:24 -08:00
Matei Zaharia c0a0df3285 Made the default cache BoundedMemoryCache, and reduced its default size 2012-02-09 22:32:02 -08:00
Matei Zaharia a766780f4c Added some tests for multithreaded access to Spark. 2012-02-09 22:27:53 -08:00
Matei Zaharia 0e93891d3d Replaced LocalFileShuffle with a non-singleton ShuffleManager class
and made DAGScheduler automatically set SparkEnv.
2012-02-09 22:14:56 -08:00
Matei Zaharia e02dc83a5b IO optimizations 2012-02-06 20:40:39 -08:00
Matei Zaharia c40e766368 Use java.util.HashMap in shuffles 2012-02-06 19:20:25 -08:00
Matei Zaharia d6ec664b48 Add dependency on fastutil and update Guava 2012-02-06 15:37:27 -08:00
Matei Zaharia b267175ab5 Synchronization fix in case SparkContext is used from multiple threads. 2012-02-06 14:28:18 -08:00
Matei Zaharia 43a3335090 Simplifying test 2012-02-05 22:46:51 -08:00
Matei Zaharia 7449ecfb7e Merge branch 'master' of github.com:mesos/spark 2012-01-31 00:33:24 -08:00
Matei Zaharia 100e800782 Some fixes to the examples (mostly to use functional API) 2012-01-31 00:33:18 -08:00
Matei Zaharia 72d2489b6d Merge pull request #108 from patelh/master
Added immutable map registration in kryo serializer
2012-01-30 16:31:12 -08:00
Hiral Patel b47952342e Add register immutable map to kryo serializer 2012-01-26 15:24:20 -08:00
Matei Zaharia fabcc82528 Merge pull request #103 from edisontung/master
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia fd5581a0d3 Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:17:27 -08:00
Matei Zaharia eb05154b7a Fixed a failure recovery bug and added some tests for fault recovery. 2012-01-13 19:08:25 -08:00
Edison Tung 1ecc221f84 Fixed bugs
I've fixed the bugs detailed in the diff. One of the bugs was already
fixed on the local file (forgot to commit).
2012-01-09 11:59:52 -08:00
Matei Zaharia e269f6f7ea Register RDDs with the MapOutputTracker even if they have no partitions.
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia 5fd101d79e Add dependency on Akka and Netty 2011-12-15 13:21:14 +01:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 6a650cbbdf Make Spark port default to 7077 so that it's not an ephemeral port that might be taken 2011-12-14 18:18:22 +01:00
Matei Zaharia 735843a049 Merge remote-tracking branch 'origin/charles-newhadoop' 2011-12-02 21:59:30 -08:00
Charles Reiss 66f05f383e Add new Hadoop API reading support. 2011-12-01 14:02:10 -08:00
Charles Reiss 02d43e6986 Add new Hadoop API writing support. 2011-12-01 14:01:28 -08:00
Matei Zaharia 72c4839c5f Fixed LocalFileLR to deal with a change in Scala IO sources
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung de01b6deaa Fixed bug in RDD
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Edison Tung e1c814be4c Renamed SparkLocalKMeans to SparkKMeans 2011-12-01 13:34:03 -08:00
Matei Zaharia 22b8fcf632 Added fold() and aggregate() operations that reuse an object to
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
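
A minimal sketch (not part of this history) of the aggregate() call shape this commit describes, assuming the aggregate(zeroValue)(seqOp, combOp) signature and the spark-package SparkContext of this era; both functions mutate and return the same buffer, so no new object is allocated per merged element:

    import spark.SparkContext
    import scala.collection.mutable.ArrayBuffer

    val sc = new SparkContext("local", "AggregateSketch")
    val nums = sc.parallelize(1 to 1000)
    // seqOp folds one element into the buffer; combOp merges two buffers in place.
    val evens = nums.aggregate(ArrayBuffer[Int]())(
      (buf, n) => { if (n % 2 == 0) buf += n; buf },
      (a, b) => a ++= b
    )
    println(evens.size)
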
Matei Zaharia 09dd58b3a7 Send SPARK_JAVA_OPTS to slave nodes. 2011-11-30 11:34:58 -08:00
Edison Tung a3bc012af8 added takeSamples method
takeSamples method takes a specified number of samples from the RDD and
outputs it in an array.
2011-11-21 16:38:44 -08:00
Edison Tung 3b9d9de583 Added KMeans examples
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave ad4ebff42c Deduplicate exceptions when printing them
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
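
A hypothetical way to adjust that reprint interval, assuming Spark reads it from a Java system property as it does for other spark.* settings of this era; the property name comes from the commit message, but the value's unit (milliseconds here) is an assumption:

    // Set before the executor starts logging; "10000" is assumed to mean 10 seconds.
    System.setProperty("spark.logging.exceptionPrintInterval", "10000")
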
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia 07532021fe Bug fix: reject offers that we didn't find any tasks for 2011-11-08 23:05:54 -08:00
Matei Zaharia 13f6900ee6 Merge branch 'master' of github.com:mesos/spark 2011-11-08 21:46:03 -08:00
Matei Zaharia c7d6f1a65c Really upgrade to SBT 0.11.1 (through build.properties and plugin changes) 2011-11-08 21:45:29 -08:00
Ankur Dave c5be7d2b22 Update Bagel unit tests to reflect API change 2011-11-08 19:56:44 +00:00
Matei Zaharia 9e4c79a4d3 Closure cleaner unit test 2011-11-08 00:40:15 -08:00
Matei Zaharia f346e64637 Updates to the closure cleaner to work better with closures in classes.
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.

Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
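
A hypothetical example (not from the repo) of the situation described above: the closure passed to map() captures the enclosing object, and the field is reached only indirectly through a method, which is exactly the access pattern that field-pruning clones of the $outer object would miss:

    class Scaler(factor: Int) extends Serializable {
      // The field is used only inside this method, never directly in the closure.
      private def scale(x: Int): Int = x * factor

      def run(sc: spark.SparkContext): Array[Int] =
        sc.parallelize(1 to 10).map(x => scale(x)).collect()  // closure captures `this`
    }
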
Matei Zaharia 7fd05cbb8f Update to SBT 0.11.1 2011-11-07 20:28:08 -08:00
Matei Zaharia 63da22c025 Update REPL code to use our own version of JLineReader, which fixes #89.
I'm not entirely sure why this broke in the jump from Scala 2.9.0.1 to
2.9.1 -- maybe something about name resolution changed?
2011-11-07 20:16:25 -08:00
Matei Zaharia 3fad5e580f Fix Scala version requirement in README 2011-11-03 22:42:36 -07:00
Matei Zaharia c2b7fd6899 Make parallelize() work efficiently for ranges of Long, Double, etc
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
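
A small sketch (not from the repo) of what this enables, assuming the local[N] master string of this era: the Long range below is handed to parallelize() as sub-ranges per split rather than being materialized as one large collection first:

    val sc = new spark.SparkContext("local[4]", "RangeSketch")
    val longs = sc.parallelize(1L to 10000000L, 8)  // 8 splits over a range of Longs
    println(longs.reduce(_ + _))                    // 50000005000000
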
Matei Zaharia d4c8e69dc7 K-means example 2011-11-01 19:25:58 -07:00
Matei Zaharia 157279e9eb Update Spark to work with the latest Mesos API 2011-10-30 14:10:56 -07:00
root 3a0e6c4363 Miscellaneous fixes:
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root 49505a0b0b Switched Jetty to version 7.5 because 8.0 was causing a conflict with the log4j and Jetty libraries in Hadoop. 2011-10-17 18:06:41 +00:00
root 62aa820084 Merge branch 'ankur-master' 2011-10-14 02:14:07 +00:00
Ankur Dave ab3889f627 Implement standalone WikipediaPageRank with custom serializer 2011-10-09 16:53:10 -07:00