Commit graph

2385 commits

Author SHA1 Message Date
Matei Zaharia a3e86b2b1f Merge pull request #483 from rxin/splitpruningrdd2
Added a method to create PartitionPruningRDD.
2013-02-19 17:07:00 -08:00
Andy Konwinski ecd137a72d Fixes link to issue tracker in documentation page "Contributing to Spark". 2013-02-19 16:58:02 -08:00
Reynold Xin 130f704baf Added a method to create PartitionPruningRDD. 2013-02-19 16:03:52 -08:00
Charles Reiss d0588bd6d7 Catch/log errors deleting temp dirs 2013-02-19 13:04:06 -08:00
Charles Reiss 687581c3ec Paranoid uncaught exception handling for exceptions during shutdown 2013-02-19 13:03:02 -08:00
Tathagata Das b0565ae396 Merge pull request #481 from pwendell/stream-rdd-type-streaming
STREAMING-51: Add RDD type as a type parameter in JavaDStreamLike Edit (streaming/ version)
2013-02-19 10:25:36 -08:00
Patrick Wendell 041c19e5f0 Small changes that were missing in merge 2013-02-19 08:44:20 -08:00
Patrick Wendell fed1122d74 Use RDD type for slice operator in Java.
This commit uses the RDD type in `slice`, making it available to both normal
and pair RDD's in java. It also updates the signature for `slice` to match
changes in the Scala API.
2013-02-19 08:32:38 -08:00
Patrick Wendell 35880de42e Use RDD type for transform operator in Java.
This is an improved implementation of the `transform` operator in Java.
The main difference is that this allows all four possible types of
transform functions

1. JavaRDD -> JavaRDD
2. JavaRDD -> JavaPairRDD
3. JavaPairRDD -> JavaPairRDD
4. JavaPairRDD -> JavaRDD

whereas previously only (1) and (3) were possible.

Conflicts:

	streaming/src/test/java/spark/streaming/JavaAPISuite.java
2013-02-19 08:31:58 -08:00
Patrick Wendell 9d49a6b03f Use RDD type for foreach operator in Java. 2013-02-19 08:30:32 -08:00
Nick Pentreath 8a281399f9 Streaming example using Twitter Algebird's Count Min Sketch monoid 2013-02-19 17:56:02 +02:00
Nick Pentreath d8ee184d95 Dependencies and refactoring for streaming HLL example, and using context.twitterStream method 2013-02-19 17:42:57 +02:00
Prashant Sharma 8d44480d84 example for demonstrating ZeroMQ stream 2013-02-19 19:42:14 +05:30
Prashant Sharma f7d3e309cb ZeroMQ stream as receiver 2013-02-19 19:32:52 +05:30
Nick Pentreath 315ea069e8 Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird
Conflicts:
	project/SparkBuild.scala
2013-02-19 13:58:05 +02:00
Nick Pentreath 015893f0e8 Adding streaming HyperLogLog example using Algebird 2013-02-19 13:21:33 +02:00
Tathagata Das 8b9c673fce Merge pull request #476 from tdas/streaming
Major modifications to fix driver fault-tolerance with file input stream
2013-02-19 03:07:10 -08:00
Tathagata Das 7e30c46aaf Added comment to the KafkaWordCount, given by Sean McNamara. 2013-02-19 03:05:44 -08:00
Tathagata Das 7851b34e97 Merge branch 'mesos-streaming' into streaming 2013-02-19 03:01:15 -08:00
Tathagata Das 9e82be1503 Merge branch 'streaming' into ScrapCodes-streaming-actor
Conflicts:
	docs/plugin-custom-receiver.md
	streaming/src/main/scala/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala
	streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala
	streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala
2013-02-19 02:48:50 -08:00
Matei Zaharia 03d847999e Merge pull request #477 from shivaram/ganglia-port-change
Ganglia port change
2013-02-18 20:25:48 -08:00
haitao.yao 7c129388fb Merge branch 'mesos' 2013-02-19 11:22:24 +08:00
Shivaram Venkataraman 6cba5a48b0 Print cluster url after setup completes 2013-02-18 18:30:36 -08:00
Shivaram Venkataraman e7cdf7a6a4 Print ganglia url after setup 2013-02-18 17:15:22 -08:00
Shivaram Venkataraman 03f45a18d5 Use port 5080 for httpd/ganglia 2013-02-18 16:56:01 -08:00
Tathagata Das 12ea14c211 Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver. 2013-02-18 15:18:34 -08:00
Tathagata Das 6a6e6bda57 Merge branch 'streaming' into ScrapCode-streaming
Conflicts:
	streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala
	streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala
2013-02-18 13:26:12 -08:00
Tathagata Das 8ad561dc7d Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs. 2013-02-18 02:12:41 -08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Matei Zaharia 06e5e6627f Renamed "splits" to "partitions" 2013-02-17 22:13:26 -08:00
Matei Zaharia 455d015076 Clean up EC2 script options a bit 2013-02-17 16:53:12 -08:00
Tathagata Das f98c7da23e Many changes to ensure better 2nd recovery if 2nd failure happens while
recovering from 1st failure
- Made the scheduler to checkpoint after clearing old metadata which
  ensures that a new checkpoint is written as soon as at least one batch
  gets computed  while recovering from a failure. This ensures that if
  there is a 2nd failure while recovering from 1st failure, the system
  start 2nd recovery from a newer checkpoint.
- Modified Checkpoint writer to write checkpoint in a different thread.
- Added a check to make sure that compute for InputDStreams gets called
  only for strictly increasing times.
- Changed implementation of slice to call getOrCompute on parent DStream
  in time-increasing order.
- Added testcase to test slice.
- Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify
  results with expected output in an order-independent manner.
2013-02-17 15:06:41 -08:00
Matei Zaharia 08e444df0e Change EC2 script to use 0.6 AMIs by default, for now 2013-02-17 14:01:48 -08:00
Matei Zaharia 2a907dceb3 Merge pull request #421 from shivaram/spark-ec2-change
Switch spark_ec2.py to use the new spark-ec2 scripts.
2013-02-17 13:48:43 -08:00
Matei Zaharia 340cc54e47 Merge pull request #471 from stephenh/parallelrdd
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia 3260b6120e Merge pull request #470 from stephenh/morek
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman 924f47dd11 Add RDD.subtract.
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman e7713adb99 Move ParallelCollection into spark.rdd package. 2013-02-16 13:20:48 -06:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Matei Zaharia 9d979fb630 Merge pull request #469 from stephenh/samepartitionercombine
If combineByKey is using the same partitioner, skip the shuffle.
2013-02-16 10:07:42 -08:00
Stephen Haberman 4328873294 Add assertion about dependencies. 2013-02-16 01:16:40 -06:00
Stephen Haberman c34b8ad2c5 Avoid a shuffle if combineByKey is passed the same partitioner. 2013-02-16 00:54:03 -06:00
Stephen Haberman 4281e579c2 Update more javadocs. 2013-02-16 00:45:03 -06:00
haitao.yao 858784459f support customized java options for master, worker, executor, repl shell 2013-02-16 14:42:06 +08:00
Stephen Haberman 6a2d957843 Tweak test names. 2013-02-16 00:33:49 -06:00
Stephen Haberman 37397106ce Remove fileServerSuite.txt. 2013-02-16 00:31:07 -06:00
Stephen Haberman 6cd68c31cb Update default.parallelism docs, have StandaloneSchedulerBackend use it.
Only brand new RDDs (e.g. parallelize and makeRDD) now use default
parallelism, everything else uses their largest parent's partitioner
or partition size.
2013-02-16 00:29:11 -06:00
Matei Zaharia beb7ab8708 Merge pull request #467 from squito/executor_job_id
include jobid in Executor commandline args
2013-02-15 22:09:24 -08:00
haitao.yao a9cfac347a Merge branch 'mesos' 2013-02-16 10:11:28 +08:00
Tathagata Das ddcb976b0d Made MasterFailureTest more robust. 2013-02-15 06:54:47 +00:00