ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Shivaram Venkataraman	03f45a18d5	Use port 5080 for httpd/ganglia	2013-02-18 16:56:01 -08:00
Tathagata Das	12ea14c211	Changed networkStream to socketStream and pluggableNetworkStream to become networkStream as a way to create streams from arbitrary network receiver.	2013-02-18 15:18:34 -08:00
Tathagata Das	6a6e6bda57	Merge branch 'streaming' into ScrapCode-streaming Conflicts: streaming/src/main/scala/spark/streaming/dstream/KafkaInputDStream.scala streaming/src/main/scala/spark/streaming/dstream/NetworkInputDStream.scala	2013-02-18 13:26:12 -08:00
Tathagata Das	8ad561dc7d	Added checkpointing and fault-tolerance semantics to the programming guide. Fixed default checkpoint interval to being a multiple of slide duration. Fixed visibility of some classes and objects to clean up docs.	2013-02-18 02:12:41 -08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	455d015076	Clean up EC2 script options a bit	2013-02-17 16:53:12 -08:00
Tathagata Das	f98c7da23e	Many changes to ensure better 2nd recovery if 2nd failure happens while recovering from 1st failure - Made the scheduler to checkpoint after clearing old metadata which ensures that a new checkpoint is written as soon as at least one batch gets computed while recovering from a failure. This ensures that if there is a 2nd failure while recovering from 1st failure, the system start 2nd recovery from a newer checkpoint. - Modified Checkpoint writer to write checkpoint in a different thread. - Added a check to make sure that compute for InputDStreams gets called only for strictly increasing times. - Changed implementation of slice to call getOrCompute on parent DStream in time-increasing order. - Added testcase to test slice. - Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify results with expected output in an order-independent manner.	2013-02-17 15:06:41 -08:00
Matei Zaharia	08e444df0e	Change EC2 script to use 0.6 AMIs by default, for now	2013-02-17 14:01:48 -08:00
Matei Zaharia	2a907dceb3	Merge pull request #421 from shivaram/spark-ec2-change Switch spark_ec2.py to use the new spark-ec2 scripts.	2013-02-17 13:48:43 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Matei Zaharia	9d979fb630	Merge pull request #469 from stephenh/samepartitionercombine If combineByKey is using the same partitioner, skip the shuffle.	2013-02-16 10:07:42 -08:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
haitao.yao	858784459f	support customized java options for master, worker, executor, repl shell	2013-02-16 14:42:06 +08:00
Stephen Haberman	6a2d957843	Tweak test names.	2013-02-16 00:33:49 -06:00
Stephen Haberman	37397106ce	Remove fileServerSuite.txt.	2013-02-16 00:31:07 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
Matei Zaharia	beb7ab8708	Merge pull request #467 from squito/executor_job_id include jobid in Executor commandline args	2013-02-15 22:09:24 -08:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Tathagata Das	ddcb976b0d	Made MasterFailureTest more robust.	2013-02-15 06:54:47 +00:00
Tathagata Das	3bcc6e5c03	Merge pull request #466 from pwendell/java-stream-transform STREAMING-50: Support transform workaround in JavaPairDStream	2013-02-14 21:30:55 -08:00
Tathagata Das	4b8402e900	Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.	2013-02-14 18:10:37 -08:00
Tathagata Das	def8126d77	Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.	2013-02-14 17:49:43 -08:00
Tathagata Das	2eacf22401	Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.	2013-02-14 12:21:47 -08:00
Tathagata Das	03e8dc6861	Changes functions comments to make them more consistent.	2013-02-13 20:59:29 -08:00
Tathagata Das	12b020b668	Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.	2013-02-13 20:53:50 -08:00
Imran Rashid	893bad9089	use appid instead of frameworkid; simplify stupid condition	2013-02-13 20:30:21 -08:00
Matei Zaharia	e8663e0fe5	Merge pull request #461 from JoshRosen/fix/issue-tracker-link Update issue tracker link in contributing guide	2013-02-13 18:42:17 -08:00
Imran Rashid	8f18e7e863	include jobid in Executor commandline args	2013-02-13 13:05:13 -08:00
Tathagata Das	39addd3803	Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.	2013-02-13 12:17:45 -08:00
Patrick Wendell	3f3e77f28b	STREAMING-50: Support transform workaround in JavaPairDStream This ports a useful workaround (the `transform` function) to JavaPairDStream. It is necessary to do things like sorting which are not supported yet in the core streaming API.	2013-02-12 14:02:32 -08:00
Matei Zaharia	fd7e414bd0	Merge pull request #464 from pwendell/java-type-fix SPARK-694: All references to [K, V] in JavaDStreamLike should be changed to [K2, V2]	2013-02-11 19:19:05 -08:00
Matei Zaharia	bfeed4725d	Merge pull request #465 from pwendell/java-sort-fix SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 18:23:12 -08:00
Patrick Wendell	21df6ffc13	SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 17:43:26 -08:00
Matei Zaharia	582d31dff9	Formatting fixes	2013-02-11 13:24:54 -08:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Josh Rosen	e9fb25426e	Remove hack workaround for SPARK-668. Renaming the type paramters solves this problem (see SPARK-694). I tried this fix earlier, but it didn't work because I didn't run `sbt/sbt clean` first.	2013-02-11 11:19:20 -08:00
Patrick Wendell	d09c36065c	Using tuple swap()	2013-02-11 10:45:45 -08:00
Patrick Wendell	04786d0739	small fix	2013-02-11 10:05:49 -08:00
Patrick Wendell	c65988bdc1	Fix for MapPartitions	2013-02-11 10:03:37 -08:00
Patrick Wendell	20cf770545	Fix for flatmap	2013-02-11 10:03:37 -08:00
Patrick Wendell	314d87a038	Indentation fix	2013-02-11 10:03:37 -08:00
Patrick Wendell	f0b68c623c	Initial cut at replacing K, V in Java files	2013-02-11 10:03:37 -08:00

2583 commits