833 commits

Author SHA1 Message Date
Matei Zaharia e269f6f7ea Register RDDs with the MapOutputTracker even if they have no partitions.
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia 5fd101d79e Add dependency on Akka and Netty 2011-12-15 13:21:14 +01:00
Matei Zaharia 3034fc0d91 Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16' 2011-12-14 18:19:43 +01:00
Matei Zaharia 6a650cbbdf Make Spark port default to 7077 so that it's not an ephemeral port that might be taken 2011-12-14 18:18:22 +01:00
Matei Zaharia 735843a049 Merge remote-tracking branch 'origin/charles-newhadoop' 2011-12-02 21:59:30 -08:00
Charles Reiss 66f05f383e Add new Hadoop API reading support. 2011-12-01 14:02:10 -08:00
Charles Reiss 02d43e6986 Add new Hadoop API writing support. 2011-12-01 14:01:28 -08:00
Matei Zaharia 72c4839c5f Fixed LocalFileLR to deal with a change in Scala IO sources
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung 42f8847a21 Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD 2011-12-01 13:43:25 -08:00
Edison Tung de01b6deaa Fixed bug in RDD
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Edison Tung e1c814be4c Renamed SparkLocalKMeans to SparkKMeans 2011-12-01 13:34:03 -08:00
Matei Zaharia 22b8fcf632 Added fold() and aggregate() operations that reuse an object to
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
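The idea in the commit above, merging results into a reused object instead of allocating a new one per element, can be sketched roughly as follows. This is a plain-Python illustration, not Spark's Scala API: `aggregate`, `add_to_stats`, and `merge_stats` are hypothetical names, and partitions are modeled as ordinary lists.

```python
from copy import deepcopy


def aggregate(partitions, zero, seq_op, comb_op):
    """Hypothetical helper (not Spark's API): fold each partition into one
    mutable accumulator, then merge the per-partition accumulators."""
    partials = []
    for part in partitions:
        acc = deepcopy(zero)      # one accumulator per partition, not per element
        for x in part:
            acc = seq_op(acc, x)  # seq_op may mutate acc in place and return it
        partials.append(acc)
    result = deepcopy(zero)
    for p in partials:
        result = comb_op(result, p)  # comb_op may likewise reuse its first argument
    return result


def add_to_stats(stats, x):
    # Update the running [sum, count] in place rather than building a new list.
    stats[0] += x
    stats[1] += 1
    return stats


def merge_stats(a, b):
    # Merge b into a in place and hand a back.
    a[0] += b[0]
    a[1] += b[1]
    return a


total, count = aggregate([[1, 2], [3], []], [0, 0], add_to_stats, merge_stats)
```

Because both functions return their first argument after mutating it, only one accumulator is allocated per partition plus one for the final merge.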
Matei Zaharia 09dd58b3a7 Send SPARK_JAVA_OPTS to slave nodes. 2011-11-30 11:34:58 -08:00
Edison Tung a3bc012af8 added takeSamples method
takeSamples method takes a specified number of samples from the RDD and
outputs it in an array.
2011-11-21 16:38:44 -08:00
Edison Tung 3b9d9de583 Added KMeans examples
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave ad4ebff42c Deduplicate exceptions when printing them
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
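The deduplication behavior described in the commit above might look roughly like this. It is a minimal plain-Python sketch, not the code from the commit: the class name `ExceptionPrinter`, the `" [duplicate]"` suffix, and the rule "reprint in full once the interval has elapsed since the last full print" are all assumptions. Only the 5-second default comes from the commit message.

```python
import time


class ExceptionPrinter:
    """Hypothetical sketch: format an exception in full the first time it is
    seen, abbreviate repeats, and reprint in full after `interval` seconds."""

    def __init__(self, interval=5.0, clock=time.monotonic):
        self.interval = interval  # seconds between full reprints
        self.clock = clock        # injectable so the behavior can be tested
        self.last_full = {}       # description -> time of the last full print

    def format(self, description, stack_trace):
        now = self.clock()
        last = self.last_full.get(description)
        if last is None or now - last >= self.interval:
            # First sighting, or the reprint interval has elapsed: full form.
            self.last_full[description] = now
            return description + "\n" + stack_trace
        # Seen recently: abbreviated form without the stack trace.
        return description + " [duplicate]"
```

Passing the clock in as a parameter keeps the sketch deterministic under test; a real logger would simply use the default.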
Ankur Dave 35b6358a7c Report errors in tasks to the driver via a Mesos status update
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.

Here's what the reporting currently looks like:

    # ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
    [...]
    11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
    11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
    [...]
    11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia 07532021fe Bug fix: reject offers that we didn't find any tasks for 2011-11-08 23:05:54 -08:00
Matei Zaharia 13f6900ee6 Merge branch 'master' of github.com:mesos/spark 2011-11-08 21:46:03 -08:00
Matei Zaharia c7d6f1a65c Really upgrade to SBT 0.11.1 (through build.properties and plugin changes) 2011-11-08 21:45:29 -08:00
Ankur Dave c5be7d2b22 Update Bagel unit tests to reflect API change 2011-11-08 19:56:44 +00:00
Matei Zaharia 9e4c79a4d3 Closure cleaner unit test 2011-11-08 00:40:15 -08:00
Matei Zaharia f346e64637 Updates to the closure cleaner to work better with closures in classes.
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.

Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
Matei Zaharia 7fd05cbb8f Update to SBT 0.11.1 2011-11-07 20:28:08 -08:00
Matei Zaharia 63da22c025 Update REPL code to use our own version of JLineReader, which fixes #89.
I'm not entirely sure why this broke in the jump from Scala 2.9.0.1 to
2.9.1 -- maybe something about name resolution changed?
2011-11-07 20:16:25 -08:00
Matei Zaharia 3fad5e580f Fix Scala version requirement in README 2011-11-03 22:42:36 -07:00
Matei Zaharia c2b7fd6899 Make parallelize() work efficiently for ranges of Long, Double, etc
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
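Splitting a range into sub-ranges, as the commit above describes, avoids materializing every element before slicing. A minimal plain-Python sketch of the idea follows. `split_range` is a hypothetical name, and Python's `range` covers only integers, so the Double case mentioned in the commit is not modeled here.

```python
def split_range(r, num_slices):
    """Hypothetical helper: split a range into num_slices contiguous
    sub-ranges without expanding it into a list of elements."""
    n = len(r)
    slices = []
    for i in range(num_slices):
        lo = (i * n) // num_slices
        hi = ((i + 1) * n) // num_slices
        slices.append(r[lo:hi])  # slicing a range yields another range
    return slices


parts = split_range(range(0, 10), 3)
```

Each slice stores only its start, stop, and step, so the cost of splitting is independent of the range's length.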
Matei Zaharia d4c8e69dc7 K-means example 2011-11-01 19:25:58 -07:00
Matei Zaharia 157279e9eb Update Spark to work with the latest Mesos API 2011-10-30 14:10:56 -07:00
root 3a0e6c4363 Miscellaneous fixes:
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root 49505a0b0b Switched Jetty to version 7.5 because 8.0 was causing a conflict with the log4j and Jetty libraries in Hadoop. 2011-10-17 18:06:41 +00:00
root 62aa820084 Merge branch 'ankur-master' 2011-10-14 02:14:07 +00:00
Ankur Dave ab3889f627 Implement standalone WikipediaPageRank with custom serializer 2011-10-09 16:53:10 -07:00
Ankur Dave cbdc01eecd Update WikipediaPageRank to reflect Bagel API changes 2011-10-09 16:19:34 -07:00
Ankur Dave 6d707f6b63 Remove ShortestPath for now 2011-10-09 16:19:34 -07:00
Ankur Dave 0028caf3a4 Simplify and genericize type parameters in Bagel 2011-10-09 15:58:39 -07:00
Ankur Dave 2d7057bf5d Implement PairRDDFunctions.partitionBy 2011-10-09 15:52:09 -07:00
Ankur Dave 06637cb69e Fix PairRDDFunctions.groupWith partitioning
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
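Why replacing map with mapValues preserves partitioning can be illustrated with a toy model. This is a plain-Python sketch under stated assumptions, not Spark's implementation: `PairDataset` and its methods are hypothetical, and the partitioner is represented by an opaque label.

```python
class PairDataset:
    """Hypothetical toy model of a keyed dataset that tracks its partitioner."""

    def __init__(self, pairs, partitioner=None):
        self.pairs = list(pairs)
        self.partitioner = partitioner

    def map(self, f):
        # f sees the whole (key, value) pair and may change the key, so the
        # existing partitioning can no longer be trusted and is dropped.
        return PairDataset([f(kv) for kv in self.pairs], partitioner=None)

    def map_values(self, f):
        # Keys are untouched, so every pair stays in its partition and the
        # partitioner can be carried over safely.
        return PairDataset([(k, f(v)) for k, v in self.pairs], self.partitioner)


data = PairDataset([("a", 1), ("b", 2)], partitioner="hash-4")
doubled = data.map_values(lambda v: v * 2)
remapped = data.map(lambda kv: (kv[0], kv[1] * 2))
```

Both calls produce the same pairs here, but only `map_values` can guarantee the keys are unchanged, which is why only it keeps the partitioner.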
Ankur Dave 2911a783d6 Add custom partitioner support to PairRDDFunctions.combineByKey 2011-10-09 15:47:20 -07:00
Ankur Dave 6c6e47e3cd Use BufferedOutputStream in ShuffleMapTask 2011-10-09 15:43:31 -07:00
Matei Zaharia 6483b41377 Merge pull request #83 from ijuma/sbt-0.11
Upgrade to SBT 0.11.0
2011-09-30 21:52:18 -07:00
Ismael Juma d76c0fc781 Upgrade to sbt-idea 0.11.0 final. 2011-09-27 23:13:38 +01:00
Ismael Juma 7e92ef9d19 Add workaround for bug in SBT (issue #206). 2011-09-27 00:04:59 +01:00
Ismael Juma 4019305afe Set SCALA_VERSION to 2.9.1 (from 2.9.1.final) to match expectation of SBT 0.11.0 2011-09-26 22:44:41 +01:00
Ismael Juma 3562db6374 Include "spark-" prefix in project name (used when artifact is published). 2011-09-26 22:41:07 +01:00
Ismael Juma 28b5d5a2af Upgrade compress-lzf to 0.8.4. 2011-09-26 22:32:05 +01:00
Ismael Juma 315e55fde3 Upgrade Jetty to 8.0.1. 2011-09-26 22:32:05 +01:00
Ismael Juma ee980439e2 Use scalatest and scalacheck compiled against Scala 2.9.1. 2011-09-26 22:32:05 +01:00
Ismael Juma bd774eb274 Use new layout for plugins definitions (recommended for SBT 0.11) 2011-09-26 22:32:05 +01:00
Ismael Juma e39edcce60 Upgrade to SBT 0.11.0. 2011-09-26 22:24:29 +01:00