Matei Zaharia
b267175ab5
Synchronization fix in case SparkContext is used from multiple threads.
2012-02-06 14:28:18 -08:00
haoyuan
b72d93a0da
Test commit
2012-02-06 09:58:06 -08:00
Matei Zaharia
43a3335090
Simplifying test
2012-02-05 22:46:51 -08:00
Matei Zaharia
7449ecfb7e
Merge branch 'master' of github.com:mesos/spark
2012-01-31 00:33:24 -08:00
Matei Zaharia
100e800782
Some fixes to the examples (mostly to use functional API)
2012-01-31 00:33:18 -08:00
Matei Zaharia
72d2489b6d
Merge pull request #108 from patelh/master
...
Added immutable map registration in kryo serializer
2012-01-30 16:31:12 -08:00
Hiral Patel
b47952342e
Add register immutable map to kryo serializer
2012-01-26 15:24:20 -08:00
Matei Zaharia
fabcc82528
Merge pull request #103 from edisontung/master
...
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Matei Zaharia
fd5581a0d3
Fixed a failure recovery bug and added some tests for fault recovery.
2012-01-13 19:17:27 -08:00
Matei Zaharia
eb05154b7a
Fixed a failure recovery bug and added some tests for fault recovery.
2012-01-13 19:08:25 -08:00
Edison Tung
1ecc221f84
Fixed bugs
...
I've fixed the bugs detailed in the diff. One of them was already
fixed in my local copy (I had forgotten to commit it).
2012-01-09 11:59:52 -08:00
Matei Zaharia
e269f6f7ea
Register RDDs with the MapOutputTracker even if they have no partitions.
...
Fixes #105.
2012-01-05 15:59:20 -05:00
Matei Zaharia
5fd101d79e
Add dependency on Akka and Netty
2011-12-15 13:21:14 +01:00
Matei Zaharia
3034fc0d91
Merge commit 'ad4ebff42c1b738746b2b9ecfbb041b6d06e3e16'
2011-12-14 18:19:43 +01:00
Matei Zaharia
6a650cbbdf
Make Spark port default to 7077 so that it's not an ephemeral port that might be taken
2011-12-14 18:18:22 +01:00
Matei Zaharia
735843a049
Merge remote-tracking branch 'origin/charles-newhadoop'
2011-12-02 21:59:30 -08:00
Charles Reiss
66f05f383e
Add new Hadoop API reading support.
2011-12-01 14:02:10 -08:00
Charles Reiss
02d43e6986
Add new Hadoop API writing support.
2011-12-01 14:01:28 -08:00
Matei Zaharia
72c4839c5f
Fixed LocalFileLR to deal with a change in Scala IO sources
...
(you can no longer iterate over a Source multiple times).
2011-12-01 13:52:12 -08:00
Edison Tung
42f8847a21
Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD
2011-12-01 13:43:25 -08:00
Edison Tung
de01b6deaa
Fixed bug in RDD
...
Math.min takes 2 args, not 1. This was not committed earlier for some
reason.
2011-12-01 13:34:37 -08:00
Edison Tung
e1c814be4c
Renamed SparkLocalKMeans to SparkKMeans
2011-12-01 13:34:03 -08:00
Matei Zaharia
22b8fcf632
Added fold() and aggregate() operations that reuse an object to
...
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
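The reuse-an-object idea behind this commit can be sketched outside Spark with a mutable accumulator; `SumCount` and `meanOf` below are illustrative names, not Spark's API:

```scala
// A mutable accumulator: the fold merges each element into the same object
// instead of allocating a fresh (sum, count) pair per element merged.
final class SumCount(var sum: Double = 0.0, var count: Long = 0L) {
  def add(x: Double): SumCount = { sum += x; count += 1; this }
  def merge(other: SumCount): SumCount = {
    sum += other.sum; count += other.count; this
  }
}

// Compute a mean by folding every element into one accumulator
// (a single "partition" here; combOp would merge per-partition results).
def meanOf(xs: Seq[Double]): Double = {
  val acc = xs.foldLeft(new SumCount())((a, x) => a.add(x))
  acc.sum / acc.count
}
```

In Spark's version the same accumulator-reuse applies per partition, with a combine step merging the per-partition objects.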
Matei Zaharia
09dd58b3a7
Send SPARK_JAVA_OPTS to slave nodes.
2011-11-30 11:34:58 -08:00
Edison Tung
a3bc012af8
added takeSamples method
...
The takeSample method takes a specified number of samples from the RDD
and returns them in an array.
2011-11-21 16:38:44 -08:00
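As a single-machine sketch of what a takeSample-style method must do (draw k elements uniformly without scanning twice), here is reservoir sampling; `takeSampleLocal` is a made-up name for illustration, not the RDD method:

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Reservoir sampling: keep the first k elements, then replace reservoir
// slots with decreasing probability so every input element is equally
// likely to end up in the final sample.
def takeSampleLocal[T](xs: Seq[T], k: Int, rng: Random): Seq[T] = {
  val reservoir = ArrayBuffer.from(xs.take(k))
  for ((x, i) <- xs.zipWithIndex.drop(k)) {
    val j = rng.nextInt(i + 1) // uniform in [0, i]
    if (j < k) reservoir(j) = x
  }
  reservoir.toSeq
}
```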
Edison Tung
3b9d9de583
Added KMeans examples
...
LocalKMeans runs locally with a randomly generated dataset.
SparkLocalKMeans takes an input file and runs KMeans on it.
2011-11-21 16:37:58 -08:00
Ankur Dave
ad4ebff42c
Deduplicate exceptions when printing them
...
The first time they appear, exceptions are printed in full, including
a stack trace. After that, they are printed in abbreviated form. They
are periodically reprinted in full; the reprint interval defaults to 5
seconds and is configurable using the property
spark.logging.exceptionPrintInterval.
2011-11-14 01:54:53 +00:00
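The dedup-with-periodic-reprint policy described above can be sketched as follows; the class and method names are illustrative, not the actual Spark logging code (only the property name `spark.logging.exceptionPrintInterval` comes from the commit message):

```scala
import scala.collection.mutable

// Interval-based exception deduplication: a full stack trace is printed
// the first time a given exception is seen, and again at most once per
// reprint interval; otherwise it would be printed in abbreviated form.
class ExceptionDedup(reprintIntervalMs: Long = 5000L) {
  private val lastFullPrint = mutable.Map.empty[String, Long]

  // True if this exception should be printed in full at time `now` (ms).
  def shouldPrintFull(e: Throwable, now: Long): Boolean = {
    val key = e.getClass.getName + ": " + e.getMessage
    lastFullPrint.get(key) match {
      case Some(last) if now - last < reprintIntervalMs => false
      case _ => lastFullPrint(key) = now; true
    }
  }
}
```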
Ankur Dave
35b6358a7c
Report errors in tasks to the driver via a Mesos status update
...
When a task throws an exception, the Spark executor previously just
logged it to a local file on the slave and exited. This commit causes
Spark to also report the exception back to the driver using a Mesos
status update, so the user doesn't have to look through a log file on
the slave.
Here's what the reporting currently looks like:
# ./run spark.examples.ExceptionHandlingTest master@203.0.113.1:5050
[...]
11/10/26 21:04:13 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
11/10/26 21:04:13 INFO spark.SimpleJob: Loss was due to java.lang.Exception: Testing exception handling
[...]
11/10/26 21:04:16 INFO spark.SparkContext: Job finished in 5.988547328 s
2011-11-14 01:54:53 +00:00
Matei Zaharia
07532021fe
Bug fix: reject offers that we didn't find any tasks for
2011-11-08 23:05:54 -08:00
Matei Zaharia
13f6900ee6
Merge branch 'master' of github.com:mesos/spark
2011-11-08 21:46:03 -08:00
Matei Zaharia
c7d6f1a65c
Really upgrade to SBT 0.11.1 (through build.properties and plugin changes)
2011-11-08 21:45:29 -08:00
Ankur Dave
c5be7d2b22
Update Bagel unit tests to reflect API change
2011-11-08 19:56:44 +00:00
Matei Zaharia
9e4c79a4d3
Closure cleaner unit test
2011-11-08 00:40:15 -08:00
Matei Zaharia
f346e64637
Updates to the closure cleaner to work better with closures in classes.
...
Before, the cleaner attempted to clone $outer objects that were classes
(as opposed to nested closures) and preserve only their used fields,
which was bad because it would miss fields that are accessed indirectly
by methods, and in general it would confuse user code. Now we keep a
reference to those objects without cloning them. This is not perfect
because the user still needs to be careful of what they'll carry along
into closures, but it works better in some cases that seemed confusing
before. We need to improve the documentation on what variables get
passed along with a closure and possibly add some debugging tools for it
as well.
Fixes #71 -- that code now works in the REPL.
2011-11-08 00:33:28 -08:00
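The pitfall this cleaner work addresses can be illustrated in plain Scala (the class and fields below are hypothetical, not Spark code): a closure that reads an enclosing class's field captures the whole enclosing instance, dragging every other field along with it.

```scala
class Multiplier(val factor: Int, val unrelatedBigField: Array[Byte]) {
  // This closure reads `factor` through `this`, so the entire Multiplier
  // instance (including unrelatedBigField) is carried into the closure.
  def scaleAll(xs: Seq[Int]): Seq[Int] = xs.map(x => x * factor)

  // Copying the field to a local first means only the Int is captured.
  def scaleAllLocal(xs: Seq[Int]): Seq[Int] = {
    val f = factor
    xs.map(x => x * f)
  }
}
```

Both methods compute the same result; they differ only in what the closure transitively references, which is what matters when closures are serialized and shipped to workers.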
Matei Zaharia
7fd05cbb8f
Update to SBT 0.11.1
2011-11-07 20:28:08 -08:00
Matei Zaharia
63da22c025
Update REPL code to use our own version of JLineReader, which fixes #89.
...
I'm not entirely sure why this broke in the jump from Scala 2.9.0.1 to
2.9.1 -- maybe something about name resolution changed?
2011-11-07 20:16:25 -08:00
Matei Zaharia
3fad5e580f
Fix Scala version requirement in README
2011-11-03 22:42:36 -07:00
Matei Zaharia
c2b7fd6899
Make parallelize() work efficiently for ranges of Long, Double, etc
...
(splitting them into sub-ranges). Fixes #87.
2011-11-02 15:16:02 -07:00
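The sub-range splitting described here can be sketched as below; `splitRange` is a hypothetical name, not Spark's actual code:

```scala
// Split [start, end) into numSlices contiguous half-open sub-ranges
// without materializing the elements, so parallelizing a huge Range
// stays O(numSlices) in memory rather than O(n).
def splitRange(start: Long, end: Long, numSlices: Int): Seq[(Long, Long)] = {
  val n = end - start
  (0 until numSlices).map { i =>
    val lo = start + i * n / numSlices
    val hi = start + (i + 1) * n / numSlices
    (lo, hi) // half-open [lo, hi)
  }
}
```

Using integer arithmetic this way makes the slice boundaries meet exactly, so the sub-ranges cover the original range with no gaps or overlaps.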
Matei Zaharia
d4c8e69dc7
K-means example
2011-11-01 19:25:58 -07:00
Matei Zaharia
157279e9eb
Update Spark to work with the latest Mesos API
2011-10-30 14:10:56 -07:00
root
3a0e6c4363
Miscellaneous fixes:
...
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
root
49505a0b0b
Switched Jetty to version 7.5 because 8.0 was causing a conflict with the log4j and Jetty libraries in Hadoop.
2011-10-17 18:06:41 +00:00
root
62aa820084
Merge branch 'ankur-master'
2011-10-14 02:14:07 +00:00
Ankur Dave
ab3889f627
Implement standalone WikipediaPageRank with custom serializer
2011-10-09 16:53:10 -07:00
Ankur Dave
cbdc01eecd
Update WikipediaPageRank to reflect Bagel API changes
2011-10-09 16:19:34 -07:00
Ankur Dave
6d707f6b63
Remove ShortestPath for now
2011-10-09 16:19:34 -07:00
Ankur Dave
0028caf3a4
Simplify and genericize type parameters in Bagel
2011-10-09 15:58:39 -07:00
Ankur Dave
2d7057bf5d
Implement PairRDDFunctions.partitionBy
2011-10-09 15:52:09 -07:00
Ankur Dave
06637cb69e
Fix PairRDDFunctions.groupWith partitioning
...
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
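The map-vs-mapValues distinction behind this fix shows up even on plain Scala maps (a loose analogy, since Spark's version is about the RDD partitioner rather than in-memory maps): map may rewrite keys, losing any key-based structure, while mapValues leaves keys untouched.

```scala
// A grouped result, keyed the way groupWith would key it.
val grouped = Map("a" -> Seq(1, 2), "b" -> Seq(3))

// mapValues-style transform: keys (and hence any key-based placement,
// like a partitioner in Spark's case) survive unchanged.
val sums: Map[String, Int] = grouped.view.mapValues(_.sum).toMap
```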
Ankur Dave
2911a783d6
Add custom partitioner support to PairRDDFunctions.combineByKey
2011-10-09 15:47:20 -07:00