Charles Reiss
d0588bd6d7
Catch/log errors deleting temp dirs
2013-02-19 13:04:06 -08:00
Charles Reiss
687581c3ec
Paranoid uncaught exception handling for exceptions during shutdown
2013-02-19 13:03:02 -08:00
Reynold Xin
19d3b059e3
Merge branch 'master' into graph
2013-02-19 12:44:05 -08:00
Reynold Xin
81c4d19c61
Maven and sbt build changes for SparkGraph.
2013-02-19 12:43:13 -08:00
Tathagata Das
b0565ae396
Merge pull request #481 from pwendell/stream-rdd-type-streaming
...
STREAMING-51: Add RDD type as a type parameter in JavaDStreamLike Edit (streaming/ version)
2013-02-19 10:25:36 -08:00
Patrick Wendell
041c19e5f0
Small changes that were missing in merge
2013-02-19 08:44:20 -08:00
Patrick Wendell
fed1122d74
Use RDD type for slice
operator in Java.
...
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell
35880de42e
Use RDD type for transform
operator in Java.
...
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions
1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD
whereas previously only (1) and (3) were possible.
Conflicts:
streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell
9d49a6b03f
Use RDD type for foreach
operator in Java.
2013-02-19 08:30:32 -08:00
Nick Pentreath
8a281399f9
Streaming example using Twitter Algebird's Count Min Sketch monoid
2013-02-19 17:56:02 +02:00
Nick Pentreath
d8ee184d95
Dependencies and refactoring for streaming HLL example, and using context.twitterStream method
2013-02-19 17:42:57 +02:00
Prashant Sharma
8d44480d84
example for demonstrating ZeroMQ stream
2013-02-19 19:42:14 +05:30
Prashant Sharma
f7d3e309cb
ZeroMQ stream as receiver
2013-02-19 19:32:52 +05:30
Nick Pentreath
315ea069e8
Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird
...
Conflicts:
project/SparkBuild.scala
2013-02-19 13:58:05 +02:00
Nick Pentreath
015893f0e8
Adding streaming HyperLogLog example using Algebird
2013-02-19 13:21:33 +02:00
Tathagata Das
8b9c673fce
Merge pull request #476 from tdas/streaming
...
Major modifications to fix driver fault-tolerance with file input stream
2013-02-19 03:07:10 -08:00
Tathagata Das
7e30c46aaf
Added comment to the KafkaWordCount, given by Sean McNamara.
2013-02-19 03:05:44 -08:00
Tathagata Das
7851b34e97
Merge branch 'mesos-streaming' into streaming
2013-02-19 03:01:15 -08:00
Tathagata Das
9e82be1503
Merge branch 'streaming' into ScrapCodes-streaming-actor
...
Conflicts:
docs/plugin-custom-receiver.md
streaming/src/main/scala/spark/streaming/StreamingContext.scala
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Matei Zaharia
03d847999e
Merge pull request #477 from shivaram/ganglia-port-change
...
Ganglia port change
2013-02-18 20:25:48 -08:00
haitao.yao
7c129388fb
Merge branch 'mesos'
2013-02-19 11:22:24 +08:00
Shivaram Venkataraman
6cba5a48b0
Print cluster url after setup completes
2013-02-18 18:30:36 -08:00
Shivaram Venkataraman
e7cdf7a6a4
Print ganglia url after setup
2013-02-18 17:15:22 -08:00
Shivaram Venkataraman
03f45a18d5
Use port 5080 for httpd/ganglia
2013-02-18 16:56:01 -08:00
Tathagata Das
12ea14c211
Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.
2013-02-18 15:18:34 -08:00
Tathagata Das
6a6e6bda57
Merge branch 'streaming' into ScrapCode-streaming
...
Conflicts:
streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das
8ad561dc7d
Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.
2013-02-18 02:12:41 -08:00
Matei Zaharia
7151e1e4c8
Rename "jobs" to "applications" in the standalone cluster
2013-02-17 23:23:08 -08:00
Matei Zaharia
06e5e6627f
Renamed "splits" to "partitions"
2013-02-17 22:13:26 -08:00
Matei Zaharia
455d015076
Clean up EC2 script options a bit
2013-02-17 16:53:12 -08:00
Tathagata Das
f98c7da23e
Many changes to ensure better 2nd recovery if 2nd failure happens while
...
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
ensures that a new checkpoint is written as soon as at least one batch
gets computed while recovering from a failure. This ensures that if
there is a 2nd failure while recovering from 1st failure, the system
start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Matei Zaharia
08e444df0e
Change EC2 script to use 0.6 AMIs by default, for now
2013-02-17 14:01:48 -08:00
Matei Zaharia
2a907dceb3
Merge pull request #421 from shivaram/spark-ec2-change
...
Switch spark_ec2.py to use the new spark-ec2 scripts.
2013-02-17 13:48:43 -08:00
Matei Zaharia
340cc54e47
Merge pull request #471 from stephenh/parallelrdd
...
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia
3260b6120e
Merge pull request #470 from stephenh/morek
...
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman
924f47dd11
Add RDD.subtract.
...
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman
e7713adb99
Move ParallelCollection into spark.rdd package.
2013-02-16 13:20:48 -06:00
Stephen Haberman
ae2234687d
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 13:10:31 -06:00
Matei Zaharia
9d979fb630
Merge pull request #469 from stephenh/samepartitionercombine
...
If combineByKey is using the same partitioner, skip the shuffle.
2013-02-16 10:07:42 -08:00
Stephen Haberman
4328873294
Add assertion about dependencies.
2013-02-16 01:16:40 -06:00
Stephen Haberman
c34b8ad2c5
Avoid a shuffle if combineByKey is passed the same partitioner.
2013-02-16 00:54:03 -06:00
Stephen Haberman
4281e579c2
Update more javadocs.
2013-02-16 00:45:03 -06:00
haitao.yao
858784459f
support customized java options for master, worker, executor, repl shell
2013-02-16 14:42:06 +08:00
Stephen Haberman
6a2d957843
Tweak test names.
2013-02-16 00:33:49 -06:00
Stephen Haberman
37397106ce
Remove fileServerSuite.txt.
2013-02-16 00:31:07 -06:00
Stephen Haberman
6cd68c31cb
Update default.parallelism docs, have StandaloneSchedulerBackend use it.
...
Only brand new RDDs (e.g. parallelize and makeRDD) now use default
parallelism, everything else uses their largest parent's partitioner
or partition size.
2013-02-16 00:29:11 -06:00
Matei Zaharia
beb7ab8708
Merge pull request #467 from squito/executor_job_id
...
include jobid in Executor commandline args
2013-02-15 22:09:24 -08:00
haitao.yao
a9cfac347a
Merge branch 'mesos'
2013-02-16 10:11:28 +08:00
Imran Rashid
bffee929ab
Merge branch 'master' into stageInfo
...
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/storage/BlockManager.scala
2013-02-15 10:35:04 -08:00
Tathagata Das
ddcb976b0d
Made MasterFailureTest more robust.
2013-02-15 06:54:47 +00:00