ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	a3e86b2b1f	Merge pull request #483 from rxin/splitpruningrdd2 Added a method to create PartitionPruningRDD.	2013-02-19 17:07:00 -08:00
Andy Konwinski	ecd137a72d	Fixes link to issue tracker in documentation page "Contributing to Spark".	2013-02-19 16:58:02 -08:00
Reynold Xin	130f704baf	Added a method to create PartitionPruningRDD.	2013-02-19 16:03:52 -08:00
Charles Reiss	d0588bd6d7	Catch/log errors deleting temp dirs	2013-02-19 13:04:06 -08:00
Charles Reiss	687581c3ec	Paranoid uncaught exception handling for exceptions during shutdown	2013-02-19 13:03:02 -08:00
Tathagata Das	b0565ae396	Merge pull request #481 from pwendell/stream-rdd-type-streaming STREAMING-51: Add RDD type as a type parameter in JavaDStreamLike Edit (streaming/ version)	2013-02-19 10:25:36 -08:00
Patrick Wendell	041c19e5f0	Small changes that were missing in merge	2013-02-19 08:44:20 -08:00
Patrick Wendell	fed1122d74	Use RDD type for `slice` operator in Java. This commit uses the RDD type in `slice`, making it available to both normal and pair RDD's in java. It also updates the signature for `slice` to match changes in the Scala API.	2013-02-19 08:32:38 -08:00
Patrick Wendell	35880de42e	Use RDD type for `transform` operator in Java. This is an improved implementation of the `transform` operator in Java. The main difference is that this allows all four possible types of transform functions 1. JavaRDD -> JavaRDD 2. JavaRDD -> JavaPairRDD 3. JavaPairRDD -> JavaPairRDD 4. JavaPairRDD -> JavaRDD whereas previously only (1) and (3) were possible. Conflicts: streaming/src/test/java/spark/streaming/JavaAPISuite.java	2013-02-19 08:31:58 -08:00
Patrick Wendell	9d49a6b03f	Use RDD type for `foreach` operator in Java.	2013-02-19 08:30:32 -08:00
Nick Pentreath	8a281399f9	Streaming example using Twitter Algebird's Count Min Sketch monoid	2013-02-19 17:56:02 +02:00
Nick Pentreath	d8ee184d95	Dependencies and refactoring for streaming HLL example, and using context.twitterStream method	2013-02-19 17:42:57 +02:00
Prashant Sharma	8d44480d84	example for demonstrating ZeroMQ stream	2013-02-19 19:42:14 +05:30
Prashant Sharma	f7d3e309cb	ZeroMQ stream as receiver	2013-02-19 19:32:52 +05:30
Nick Pentreath	315ea069e8	Merge remote-tracking branch 'upstream/streaming' into streaming-eg-algebird Conflicts: project/SparkBuild.scala	2013-02-19 13:58:05 +02:00
Nick Pentreath	015893f0e8	Adding streaming HyperLogLog example using Algebird	2013-02-19 13:21:33 +02:00
Tathagata Das	8b9c673fce	Merge pull request #476 from tdas/streaming Major modifications to fix driver fault-tolerance with file input stream	2013-02-19 03:07:10 -08:00
Tathagata Das	7e30c46aaf	Added comment to the KafkaWordCount, given by Sean McNamara.	2013-02-19 03:05:44 -08:00
Tathagata Das	7851b34e97	Merge branch 'mesos-streaming' into streaming	2013-02-19 03:01:15 -08:00
Tathagata Das	9e82be1503	Merge branch 'streaming' into ScrapCodes-streaming-actor Conflicts: docs/plugin-custom-receiver.md streaming/src/main/scala/spark/streaming/StreamingContext.scala streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/dstream/PluggableInputDStream.scala streaming/src/main/scala/spark/streaming/receivers/ActorReceiver.scala streaming/src/test/scala/spark/streaming/InputStreamsSuite.scala	2013-02-19 02:48:50 -08:00
Matei Zaharia	03d847999e	Merge pull request #477 from shivaram/ganglia-port-change Ganglia port change	2013-02-18 20:25:48 -08:00
haitao.yao	7c129388fb	Merge branch 'mesos'	2013-02-19 11:22:24 +08:00
Shivaram Venkataraman	6cba5a48b0	Print cluster url after setup completes	2013-02-18 18:30:36 -08:00
Shivaram Venkataraman	e7cdf7a6a4	Print ganglia url after setup	2013-02-18 17:15:22 -08:00
Shivaram Venkataraman	03f45a18d5	Use port 5080 for httpd/ganglia	2013-02-18 16:56:01 -08:00
Tathagata Das	12ea14c211	Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.	2013-02-18 15:18:34 -08:00
Tathagata Das	6a6e6bda57	Merge branch 'streaming' into ScrapCode-streaming Conflicts: streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala	2013-02-18 13:26:12 -08:00
Tathagata Das	8ad561dc7d	Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.	2013-02-18 02:12:41 -08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	455d015076	Clean up EC2 script options a bit	2013-02-17 16:53:12 -08:00
Tathagata Das	f98c7da23e	Many changes to ensure better 2nd recovery if 2nd failure happens while recovering from 1st failure - Made the scheduler to checkpoint after clearing old metadata which ensures that a new checkpoint is written as soon as at least one batch gets computed while recovering from a failure. This ensures that if there is a 2nd failure while recovering from 1st failure, the system start 2nd recovery from a newer checkpoint. - Modified Checkpoint writer to write checkpoint in a different thread. - Added a check to make sure that compute for InputDStreams gets called only for strictly increasing times. - Changed implementation of slice to call getOrCompute on parent DStream in time-increasing order. - Added testcase to test slice. - Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify results with expected output in an order-independent manner.	2013-02-17 15:06:41 -08:00
Matei Zaharia	08e444df0e	Change EC2 script to use 0.6 AMIs by default, for now	2013-02-17 14:01:48 -08:00
Matei Zaharia	2a907dceb3	Merge pull request #421 from shivaram/spark-ec2-change Switch spark_ec2.py to use the new spark-ec2 scripts.	2013-02-17 13:48:43 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Matei Zaharia	9d979fb630	Merge pull request #469 from stephenh/samepartitionercombine If combineByKey is using the same partitioner, skip the shuffle.	2013-02-16 10:07:42 -08:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
haitao.yao	858784459f	support customized java options for master, worker, executor, repl shell	2013-02-16 14:42:06 +08:00
Stephen Haberman	6a2d957843	Tweak test names.	2013-02-16 00:33:49 -06:00
Stephen Haberman	37397106ce	Remove fileServerSuite.txt.	2013-02-16 00:31:07 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
Matei Zaharia	beb7ab8708	Merge pull request #467 from squito/executor_job_id include jobid in Executor commandline args	2013-02-15 22:09:24 -08:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Tathagata Das	ddcb976b0d	Made MasterFailureTest more robust.	2013-02-15 06:54:47 +00:00

1 2 3 4 5 ...

2385 commits