ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	03d0b858c8	Made use of spark.executor.memory setting consistent and documented it Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-06-30 15:46:46 -07:00
Patrick Wendell	e721ff7e5a	Allowing details for failed stages	2013-06-29 11:26:30 -07:00
Patrick Wendell	473961d82e	Styling for progress bar	2013-06-29 08:38:04 -07:00
Patrick Wendell	249f0e54ba	Minor changes from Matei's review	2013-06-28 13:25:26 -07:00
Patrick Wendell	c537e869f3	Missing logo file	2013-06-27 22:02:03 -07:00
Patrick Wendell	62c2c6b856	Forcing Jetty to run as daemon	2013-06-27 21:47:22 -07:00
Patrick Wendell	a55190d314	Adding better tabs for UI headers.	2013-06-27 19:14:51 -07:00
Patrick Wendell	362d996c81	Handful of changes based on matei's review - Avoid exception when no tasks have finished for a stage - Adding DOCTYPE so css renders properly - Adding progress slider	2013-06-27 19:14:28 -07:00
Patrick Wendell	92a4c2a5f6	Fixing bug in local scheduler time recording	2013-06-27 12:33:06 -07:00
Stephen Haberman	d7011632d1	Wrap lines.	2013-06-26 12:35:57 -05:00
Patrick Wendell	ee692482a6	One more private class	2013-06-26 09:07:32 -07:00
Patrick Wendell	a59c15a37e	Adding config option for retained stages	2013-06-26 08:54:57 -07:00
Patrick Wendell	274193664a	Bumping timeouts	2013-06-26 08:51:28 -07:00
Patrick Wendell	b14ad509ba	Moving static ui package	2013-06-26 08:46:51 -07:00
Patrick Wendell	2cbaa0734b	Making all new classes package private	2013-06-26 08:44:55 -07:00
Stephen Haberman	d11025dc6a	Be cute with Option and getenv.	2013-06-26 09:53:35 -05:00
Matei Zaharia	6c8d1b2ca6	Fix computation of classpath when we launch java directly. The previous version assumed that a CLASSPATH environment variable was set by the "run" script when launching the process that starts the ExecutorRunner, but unfortunately this is not true in tests. Instead, we factor the classpath calculation into an external script and call that. NOTE: This includes a Windows version but hasn't yet been tested there.	2013-06-25 18:21:00 -04:00
Matei Zaharia	15b00914c5	Some fixes to the launch-java-directly change: - Split SPARK_JAVA_OPTS into multiple command-line arguments if it contains spaces; this splitting follows quoting rules in bash - Add the Scala JARs to the classpath if they're not in the CLASSPATH variable because the ExecutorRunner is launched with "scala" (this can happen when using local-cluster URLs in spark-shell)	2013-06-25 17:17:27 -04:00
Matei Zaharia	7e0191c6ea	Merge remote-tracking branch 'cgrothaus/SPARK-698' Conflicts: run	2013-06-25 15:47:40 -04:00
Patrick Wendell	d66bd6f885	Adding another unit test to Web UI suite	2013-06-24 17:12:55 -07:00
Patrick Wendell	f7389330c3	Allowing for requested port on construction	2013-06-24 16:51:52 -07:00
Patrick Wendell	42157027f2	A few bug fixes and a unit test	2013-06-24 16:25:05 -07:00
Patrick Wendell	a4248138b4	Minor style cleanup	2013-06-24 14:22:28 -07:00
Patrick Wendell	b5e6e8bcc8	Cleaning up some code for Job Progress	2013-06-24 14:13:24 -07:00
Patrick Wendell	93e8ed85aa	Work around for initialization issue	2013-06-24 13:11:18 -07:00
Patrick Wendell	f6e64b5cd6	Updating based on changes to JobLogger (and one small change to JobLogger)	2013-06-24 12:40:41 -07:00
Matei Zaharia	78ffe164b3	Clone the zero value for each key in foldByKey The old version reused the object within each task, leading to overwriting of the object when a mutable type is used, which is expected to be common in fold. Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2013-06-23 10:26:53 -07:00
Matei Zaharia	0e0f9d3069	Fix search path for REPL class loader to really find added JARs	2013-06-22 17:44:04 -07:00
Matei Zaharia	3e61beff7b	Merge pull request #648 from shivaram/netty-dbg Shuffle fixes and cleanup	2013-06-22 16:22:47 -07:00
Patrick Wendell	7e9f1ed0de	Some cleanup of styling	2013-06-22 10:31:37 -07:00
Patrick Wendell	3b7ebdeeb8	Handling entirely failed stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	be6107ce44	Some tweaking with shared page header	2013-06-22 10:31:37 -07:00
Patrick Wendell	9a24d1a2d0	Using scala in XML imports	2013-06-22 10:31:37 -07:00
Patrick Wendell	f91e1c4822	Linking RDD information when available in stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	a86bb459e2	Showing shuffle status and purging old stages	2013-06-22 10:31:37 -07:00
Patrick Wendell	3485e73376	Style cleanup	2013-06-22 10:31:37 -07:00
Patrick Wendell	dd696f3a3d	Some renaming and comments	2013-06-22 10:31:37 -07:00
Patrick Wendell	5c872e9ef5	Documentation and some refactoring	2013-06-22 10:31:37 -07:00
Patrick Wendell	17776323a6	More work on percentile data:	2013-06-22 10:31:37 -07:00
Patrick Wendell	dcf6a68177	Refactoring into different modules	2013-06-22 10:31:36 -07:00
Patrick Wendell	ce81c320ac	Adding helper function to make listing tables	2013-06-22 10:31:36 -07:00
Patrick Wendell	9fd5dc3ea9	Initial steps towards job progress UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	bc4a811c57	Stash	2013-06-22 10:31:36 -07:00
Patrick Wendell	77c53f7868	Refactoring UI packages	2013-06-22 10:31:36 -07:00
Patrick Wendell	8b5c7e71c4	Import cleanup	2013-06-22 10:31:36 -07:00
Patrick Wendell	32a45d01b1	Removing twirl files	2013-06-22 10:31:36 -07:00
Patrick Wendell	4e1f202481	Removing dead code	2013-06-22 10:31:36 -07:00
Patrick Wendell	d6fde4ffe4	Some JSON cleanup	2013-06-22 10:31:36 -07:00
Patrick Wendell	91ec5a1a04	Changing JSON protocol and removing spray code	2013-06-22 10:31:36 -07:00
Patrick Wendell	fc94576ece	Adding worker version of UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	ee73c09ac9	Some comments	2013-06-22 10:31:36 -07:00
Patrick Wendell	9161db5478	Cleaning up master web UI	2013-06-22 10:31:36 -07:00
Patrick Wendell	e55cf0245f	Adding WebUI file	2013-06-22 10:31:35 -07:00
Patrick Wendell	f85fd7a793	Commenting unfinished part	2013-06-22 10:31:35 -07:00
Patrick Wendell	2c36a514aa	Spray refactoring for master web UI	2013-06-22 10:31:35 -07:00
Patrick Wendell	7e6977b6c5	Fix in storage status page	2013-06-22 10:31:35 -07:00
Patrick Wendell	950f83535a	Adding deterministic port	2013-06-22 10:31:35 -07:00
Patrick Wendell	7cd70dc2c1	Minor cleanup	2013-06-22 10:31:35 -07:00
Patrick Wendell	e66f570194	Completely hacked version of block manager UI in jetty	2013-06-22 10:31:35 -07:00
Patrick Wendell	60fbf7e461	Partially working checkpoint	2013-06-22 10:31:35 -07:00
Matei Zaharia	1ef5d0d2c9	Merge pull request #644 from shimingfei/joblogger add Joblogger to Spark (on new Spark code)	2013-06-22 09:35:57 -07:00
Jey Kottalam	1ba3c17303	use parens when calling method with side-effects	2013-06-21 12:14:16 -04:00
Jey Kottalam	edb18ca928	Rename PythonWorker to PythonWorkerFactory	2013-06-21 12:14:16 -04:00
Jey Kottalam	62c4781400	Add tests and fixes for Python daemon shutdown	2013-06-21 12:14:16 -04:00
Jey Kottalam	c79a6078c3	Prefork Python worker processes	2013-06-21 12:14:16 -04:00
Jey Kottalam	40afe0d2a5	Add Python timing instrumentation	2013-06-21 12:14:16 -04:00
Mingfei	2fc794a6c7	small modify in DAGScheduler	2013-06-21 18:21:35 +08:00
Mingfei	4b9862ac9c	small format modification	2013-06-21 17:55:32 +08:00
Mingfei	aa7aa587be	some format modification	2013-06-21 17:48:41 +08:00
Mingfei	5240795154	edit according to comments	2013-06-21 17:38:23 +08:00
Matei Zaharia	71030ba3eb	Merge pull request #654 from lyogavin/enhance_pipe fix typo and coding style in #638	2013-06-19 15:21:03 -07:00
Thomas Graves	bad51c7cb4	upmerge with latest mesos/spark master and fix hbase compile with hadoop2-yarn profile	2013-06-19 14:39:13 -05:00
Thomas Graves	75d78c7ac9	Add support for Spark on Yarn on a secure Hadoop cluster	2013-06-19 11:18:42 -05:00
Matei Zaharia	7902baddc7	Update ASM to version 4.0	2013-06-19 13:34:30 +02:00
Gavin Li	0a2a9bce1e	fix typo and coding style	2013-06-18 21:30:13 +00:00
jerryshao	1e9269c3ee	reduce ZippedPartitionsRDD's getPreferredLocations complexity	2013-06-18 09:49:06 +08:00
Matei Zaharia	db42451a52	Merge pull request #643 from adatao/master Bug fix: Zero-length partitions result in NaN for overall mean & variance	2013-06-17 15:26:36 -07:00
Matei Zaharia	e82a2ffcc9	Merge pull request #653 from rxin/logging SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory."	2013-06-17 15:13:15 -07:00
Matei Zaharia	ec193c7d89	Merge remote-tracking branch 'xiajunluan/xiajunluan' Conflicts: core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala	2013-06-18 00:11:50 +02:00
Reynold Xin	be3c406edf	Fixed the typo pointed out by Matei.	2013-06-17 17:07:51 -04:00
Reynold Xin	1450296797	SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory".	2013-06-17 16:58:23 -04:00
Gavin Li	4508089fc3	refine comments and add sc.clean	2013-06-17 05:23:46 +00:00
Gavin Li	e6ae049283	Merge remote-tracking branch 'upstream1/master' into enhance_pipe	2013-06-16 22:53:39 +00:00
Gavin Li	fb6d733fa8	update according to comments	2013-06-16 22:32:55 +00:00
Matei Zaharia	f961aac8b2	Merge pull request #649 from ryanlecompte/master Add top K method to RDD using a bounded priority queue	2013-06-15 00:53:41 -07:00
ryanlecompte	e8801d4490	use delegation for BoundedPriorityQueue, add Java API	2013-06-14 23:39:05 -07:00
Reynold Xin	2cc188fd54	SPARK-774: cogroup should also disable map side combine by default	2013-06-14 00:10:54 -07:00
Reynold Xin	6738178d0d	SPARK-772: groupByKey should disable map side combine.	2013-06-13 23:59:42 -07:00
ryanlecompte	93b3f5e535	drop unneeded ClassManifest implicit	2013-06-13 16:26:35 -07:00
ryanlecompte	44b8dbaede	use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs	2013-06-13 16:23:15 -07:00
Shivaram Venkataraman	1d9f0df065	Fix some comments and style	2013-06-13 14:46:25 -07:00
Mingfei	967a6a699d	modify sparklister function interface according to comments	2013-06-13 14:36:07 +08:00
Shivaram Venkataraman	5da4287b1d	Merge branch 'netty-dbg' of github.com:shivaram/spark into netty-dbg	2013-06-12 16:38:37 -07:00
Shivaram Venkataraman	5e9a9317c5	Merge branch 'master' of git://github.com/mesos/spark into netty-dbg	2013-06-12 16:38:01 -07:00
ryanlecompte	db5bca08ff	add a new top K method to RDD using a bounded priority queue	2013-06-12 10:54:16 -07:00
Andrew xia	190ec61799	change code style and debug info	2013-06-10 15:27:02 +08:00
Patrick Wendell	ef14dc2e77	Adding Java-API version of compression codec	2013-06-09 18:09:46 -07:00
Patrick Wendell	df592192e7	Monads FTW	2013-06-09 18:09:24 -07:00
Patrick Wendell	d1bbcebae5	Adding compression to Hadoop save functions	2013-06-09 11:39:35 -07:00
Mingfei	ade822011d	not check return value of eventQueue.take	2013-06-08 16:26:45 +08:00
Mingfei	4fd86e0e10	delete test code for joblogger in SparkContext	2013-06-08 15:45:47 +08:00
Mingfei	362f0f93ac	Merge branch 'master' of https://github.com/mesos/spark	2013-06-08 15:20:13 +08:00
Mingfei	1a4d93c025	modify to pass job annotation by localProperties and use daemon thread to do joblogger's work	2013-06-08 14:23:39 +08:00
Matei Zaharia	b58a29295b	Small formatting and style fixes	2013-06-07 22:51:28 -07:00
Matei Zaharia	c8fc423bc2	Merge pull request #631 from jerryshao/master Fix block manager UI display issue when enable spark.cleaner.ttl	2013-06-07 22:43:18 -07:00
Matei Zaharia	c9ca0a4a58	Small code style fix to SchedulingAlgorithm.scala	2013-06-07 22:40:44 -07:00
Matei Zaharia	1ae60bcb36	Merge pull request #634 from xiajunluan/master [Spark-753] Fix ClusterSchedulSuite unit test failed	2013-06-07 22:39:06 -07:00
Shivaram Venkataraman	ac480fd977	Clean up variables and counters in BlockFetcherIterator	2013-06-06 16:34:27 -07:00
Gavin Li	e179ff8a32	update according to comments	2013-06-05 22:41:05 +00:00
Shivaram Venkataraman	cb2f5046ee	Pass in bufferSize to BufferedOutputStream	2013-06-05 15:09:02 -07:00
Shivaram Venkataraman	c851957fe4	Don't write zero block files with java serializer	2013-06-05 14:28:38 -07:00
Christopher Nguyen	9d35904357	In the current code, when both partitions happen to have zero length, the returned mean will be NaN. Consequently, the result of mean after reducing over all partitions will also be NaN, which is not correct if there are partitions with non-zero length. This patch fixes this issue.	2013-06-04 22:12:47 -07:00
Matei Zaharia	fff3728552	Merge pull request #640 from pwendell/timeout-update Fixing bug in BlockManager timeout	2013-06-04 16:09:50 -07:00
Patrick Wendell	061fd3ae36	Fixing bug in BlockManager timeout	2013-06-04 19:02:44 -04:00
Matei Zaharia	f420d4f228	Merge pull request #639 from pwendell/timeout-update Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 15:25:58 -07:00
Patrick Wendell	8bd4e12104	Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 18:14:24 -04:00
Shivaram Venkataraman	96943a1cc0	var to val	2013-06-03 12:29:38 -07:00
Shivaram Venkataraman	cd347f547a	Reuse the file object as it is valid after delete	2013-06-03 12:27:51 -07:00
Shivaram Venkataraman	a058b0acf3	Delete a file for a block if it already exists.	2013-06-03 12:10:00 -07:00
Andrew xia	606bb1b450	Fix schedulingAlgorithm bugs for unit test	2013-06-03 10:29:23 +08:00
Shivaram Venkataraman	038cfc1a9a	Make connect timeout configurable	2013-05-31 23:32:18 -07:00
Shivaram Venkataraman	91aca92249	Another round of Netty fixes. 1. Avoid race condition between stop and copier completion 2. Handle socket exceptions by reporting them and filling in a failed FetchResult	2013-05-31 23:21:38 -07:00
Gavin Li	9f84315c05	enhance pipe to support what we can do in hadoop streaming	2013-06-01 00:26:10 +00:00
Reynold Xin	de1167bf2c	Incorporated Charles' feedback to put rdd metadata removal in BlockManagerMasterActor.	2013-05-31 15:54:57 -07:00
Reynold Xin	ba5e544461	More block manager cleanup. Implemented a removeRdd method in BlockManager, and used that to implement RDD.unpersist. Previously, unpersist needed to send B akka messages, where B = number of blocks. Now unpersist only needs to send W akka messages, where W = the number of workers.	2013-05-31 01:48:16 -07:00
jerryshao	926f41cc52	fix block manager UI display issue when enable spark.cleaner.ttl	2013-05-31 09:32:52 +08:00
Reynold Xin	bed1b08169	Do not create symlink for local add file. Instead, copy the file. This prevents Spark from changing the original file's permission, and also allow add file to work on non-posix operating systems.	2013-05-30 16:21:49 -07:00
Shivaram Venkataraman	3b0cd17343	Merge branch 'master' of git://github.com/mesos/spark Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2013-05-30 14:36:24 -07:00
Andrew xia	c3db3ea554	1. Add unit test for local scheduler 2. Move localTaskSetManager to a new file	2013-05-30 20:49:40 +08:00
Andrew xia	ecceb101d3	implement FIFO and fair scheduler for spark local mode	2013-05-30 10:43:01 +08:00
Shivaram Venkataraman	19fd6d54c0	Also flush serializer in revertPartialWrites	2013-05-29 17:29:34 -07:00
Shivaram Venkataraman	618c8cae1e	Skip fetching zero-sized blocks in OIO. Also unify splitLocalRemoteBlocks for netty/nio and add a test case	2013-05-29 13:18:54 -07:00
Matei Zaharia	6ed71390d9	Merge pull request #626 from stephenh/remove-add-if-no-port Remove unused addIfNoPort.	2013-05-29 10:14:22 -07:00
Shivaram Venkataraman	b79b10a6d6	Flush serializer to fix zero-size kryo blocks bug. Also convert the local-cluster test case to check for non-zero block sizes	2013-05-29 00:52:55 -07:00
Matei Zaharia	41d230ccb0	Merge pull request #611 from squito/classloader Use default classloaders for akka & deserializing task results	2013-05-28 23:35:24 -07:00
Shivaram Venkataraman	fbc1ab3468	Couple of Netty fixes a. Fix the port number by reading it from the bound channel b. Fix the shutdown sequence to make sure we actually block on the channel c. Fix the unit test to use two JVMs.	2013-05-28 16:27:16 -07:00
Stephen Haberman	4fe1fbdd51	Remove unused addIfNoPort.	2013-05-28 16:26:32 -05:00
Matei Zaharia	3db1e17baa	Merge pull request #620 from jerryshao/master Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations	2013-05-27 21:31:43 -07:00
Matei Zaharia	e8d4b6c296	Merge pull request #529 from xiajunluan/master [SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler	2013-05-25 21:09:03 -07:00
Reynold Xin	26962c9340	Automatically configure Netty port. This makes unit tests using local-cluster pass. Previously they were failing because Netty was trying to bind to the same port for all processes. Pair programmed with @shivaram.	2013-05-24 16:39:33 -07:00
Reynold Xin	6ea085169d	Fixed the bug that shuffle serializer is ignored by the new shuffle block iterators for local blocks. Also added a unit test for that.	2013-05-24 14:08:37 -07:00
jerryshao	bd3ea8f2a6	fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException	2013-05-24 14:26:19 +08:00
Charles Reiss	f350f14084	Use ARRAY_SAMPLE_SIZE constant instead of 100.0	2013-05-21 18:11:33 -07:00
Andrew xia	ecd6d75c6a	fix bug of unit tests	2013-05-21 06:49:23 +08:00
Reynold Xin	5912cc4967	Merge pull request #610 from JoshRosen/spark-747 Throw exception if TaskResult exceeds Akka frame size	2013-05-17 19:58:40 -07:00
Reynold Xin	8d78c5f89f	Changed the logging level from info to warning when addJar(null) is called.	2013-05-17 18:51:35 -07:00
Andrew xia	3d4672eaa9	Merge branch 'master' into xiajunluan Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala	2013-05-18 07:28:03 +08:00
Andrew xia	d19753b9c7	expose TaskSetManager type to resourceOffer function in ClusterScheduler	2013-05-18 06:45:19 +08:00
Andrew xia	c6e2770bfe	Fix ClusterScheduler bug to avoid allocating tasks to same slave	2013-05-17 05:10:38 +08:00
Mridul Muralidharan	f0881f8d48	Hope this does not turn into a bike shed change	2013-05-17 01:58:50 +05:30
Mridul Muralidharan	feddd2530d	Filter out nulls - prevent NPE	2013-05-16 17:49:14 +05:30
Josh Rosen	b8e46b6074	Abort job if result exceeds Akka frame size; add test.	2013-05-16 01:57:57 -07:00
Matei Zaharia	2f576aba8f	Merge pull request #602 from rxin/shufflemerge Manual merge & cleanup of Shane's Shuffle Performance Optimization	2013-05-15 18:06:24 -07:00
Reynold Xin	203d7b7c14	Merge pull request #593 from squito/driver_ui_link Master UI has link to Application UI	2013-05-15 00:47:20 -07:00
Reynold Xin	f3491cb89b	Merge branch 'master' of github.com:mesos/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/test/scala/spark/DistributedSuite.scala project/SparkBuild.scala	2013-05-15 00:31:52 -07:00
Reynold Xin	f9d40a5848	Added a comment in JdbcRDD for example usage.	2013-05-14 23:29:57 -07:00
Reynold Xin	81ad2fa331	Merge branch 'jdbc' of github.com:koeninger/spark Conflicts: project/SparkBuild.scala	2013-05-14 23:12:00 -07:00
Imran Rashid	38d4b97c6d	use threads classloader when deserializing task results; classnotfoundexception includes classloader	2013-05-14 22:32:14 -07:00
Imran Rashid	d7d1da79d3	when akka starts, use akkas default classloader (current thread)	2013-05-14 22:32:09 -07:00
Matei Zaharia	016ac86830	Merge pull request #601 from rxin/emptyrdd-master EmptyRDD (master branch 0.8)	2013-05-13 21:45:36 -07:00
Matei Zaharia	4b354e0a08	Merge pull request #589 from mridulm/master Add support for instance local scheduling	2013-05-13 17:39:19 -07:00
Patrick Wendell	7f0833647b	Capturing class name	2013-05-12 07:54:03 -07:00
Patrick Wendell	72b9c4cb6e	Small fix	2013-05-11 23:53:50 -07:00
Patrick Wendell	1c15b85051	Removing import	2013-05-11 23:52:53 -07:00
Patrick Wendell	059ab88754	Changing technique to use same code path in all cases	2013-05-11 23:50:54 -07:00
Cody Koeninger	3da2305ed0	code cleanup per rxin comments	2013-05-11 23:59:07 -05:00
Josh Rosen	440719109e	Throw exception if task result exceeds Akka frame size. This partially addresses SPARK-747.	2013-05-11 19:17:13 -07:00
Patrick Wendell	0345954530	SPARK-738: Spark should detect and squash nonserializable exceptions	2013-05-11 14:17:09 -07:00
Mark Hamstra	6e6b3e0d7e	Actually use the cleaned closure in foreachPartition	2013-05-10 13:02:34 -07:00
Imran Rashid	0ab818d508	fix linebreak	2013-05-09 00:38:59 -07:00
Reynold Xin	5d70ee4663	Cleaned up connection manager (moved many classes to their own files).	2013-05-07 22:42:15 -07:00
Reynold Xin	8388e8dd7a	Minor style fix in DiskStore...	2013-05-07 18:40:35 -07:00
Reynold Xin	547dcbe494	Cleaned up Scala files in network/netty from Shane's PR.	2013-05-07 18:39:33 -07:00
Reynold Xin	9e64396ca4	Cleaned up the Java files from Shane's PR.	2013-05-07 18:30:54 -07:00
Reynold Xin	0e5cc30868	Cleaned up BlockManager and BlockFetcherIterator from Shane's PR.	2013-05-07 18:18:24 -07:00
Reynold Xin	8b79485171	Moved BlockFetcherIterator to its own file.	2013-05-07 17:02:32 -07:00
Reynold Xin	90577ada69	Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/DiskStore.scala project/SparkBuild.scala	2013-05-07 15:56:19 -07:00
Reynold Xin	0fd84965f6	Added EmptyRDD.	2013-05-06 15:40:34 -07:00
Imran Rashid	22a5063ae4	switch from separating appUI host & port to combining into just appUiUrl	2013-05-05 12:19:11 -07:00
Matei Zaharia	7af92f248b	Merge pull request #597 from JoshRosen/webui-fixes Two minor bug fixes for Spark Web UI	2013-05-04 22:29:17 -07:00
Josh Rosen	42b1953c53	Fix SPARK-630: app details page shows finished executors as running.	2013-05-04 18:34:47 -07:00
Josh Rosen	c0688451a6	Fix wrong closing tags in web UI HTML.	2013-05-04 18:34:46 -07:00
Josh Rosen	d48e9fde01	Fix SPARK-629: weird number of cores in job details page.	2013-05-04 18:34:45 -07:00
Mridul Muralidharan	25198d7e9e	Merge branch 'master' of github.com:mridulm/spark	2013-05-04 20:45:56 +05:30
Mridul Muralidharan	5b011d18d7	Merge from master	2013-05-04 20:41:27 +05:30
Mridul Muralidharan	edb57c8331	Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach	2013-05-04 19:47:45 +05:30
Matei Zaharia	3bf2c868c3	Merge pull request #594 from shivaram/master Add zip partitions to Java API	2013-05-03 18:27:30 -07:00
Shivaram Venkataraman	bb8a434f9d	Add zipPartitions to Java API.	2013-05-03 15:14:02 -07:00
Imran Rashid	6fae936088	applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui	2013-05-03 12:59:10 -07:00
Mridul Muralidharan	ea2a6f91d3	pull from master	2013-05-04 00:35:59 +05:30
Reynold Xin	93091f6936	Merge branch 'master' of github.com:mesos/spark into blockmanager	2013-05-03 01:02:32 -07:00
Reynold Xin	2bc895a829	Updated according to Matei's code review comment.	2013-05-03 01:02:16 -07:00
Mridul Muralidharan	11589c39d9	Fix ZippedRDD as part Matei's suggestion	2013-05-03 12:23:30 +05:30
Matei Zaharia	6fe9d4e61e	Merge pull request #592 from woggling/localdir-fix Don't accept generated local directory names that can't be created	2013-05-02 21:33:56 -07:00
Matei Zaharia	538ee755b4	Merge pull request #581 from jerryshao/master fix [SPARK-740] block manage UI throws exception when enabling Spark Streaming	2013-05-02 09:01:42 -07:00
Charles Reiss	c847dd3da2	Don't accept generated temp directory names that can't be created successfully.	2013-05-01 23:19:10 -07:00
Reynold Xin	4a31877408	Added the unpersist api to JavaRDD.	2013-05-01 20:31:54 -07:00
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Mridul Muralidharan	dfde9ce9dd	comment out debug versions of checkHost, etc from Utils - which were used to test	2013-05-02 07:41:33 +05:30
Mridul Muralidharan	1b5aaeadc7	Integrate review comments 2	2013-05-02 07:30:06 +05:30
jerryshao	c047f0e3ad	filter out Spark streaming block RDD and sort RDDInfo with id	2013-05-02 09:48:32 +08:00
Mridul Muralidharan	609a817f52	Integrate review comments on pull request	2013-05-02 06:44:33 +05:30
Reynold Xin	204eb32e14	Changed the type of the persistentRdds hashmap back to TimeStampedHashMap.	2013-05-01 16:14:58 -07:00
Reynold Xin	34637b97ec	Added SparkContext.cleanup back. Not sure why it was removed before ...	2013-05-01 16:12:37 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
harshars	8481562731	Merged Ram's commit on removing RDDs. Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-05-01 14:42:17 -07:00
Mridul Muralidharan	27764a00f4	Fix some npe introduced accidentally	2013-05-01 20:56:05 +05:30
Mridul Muralidharan	d960e7e0f8	a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling. b) Add some fixes to test code to ensure it passes (and fixes some other issues). c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.	2013-05-01 20:24:00 +05:30
Matei Zaharia	aa8fe1a209	Merge pull request #586 from mridulm/master Pull request to address issues Reynold Xin reported	2013-04-30 22:30:18 -07:00
Reynold Xin	dd7bef3147	Two minor fixes according to Ryan LeCompte's review.	2013-04-30 15:02:32 -07:00
Reynold Xin	cea6174573	Merge branch 'master' of github.com:mesos/spark into blockmanager Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala	2013-04-30 13:28:35 -07:00
Mridul Muralidharan	60cabb35cb	Add addition catch block for exception too	2013-05-01 01:17:14 +05:30
Mridul Muralidharan	3b748ced22	Be more aggressive and defensive in all uses of SelectionKey in select loop	2013-05-01 00:30:30 +05:30
Mridul Muralidharan	0f45477be1	Change indentation	2013-05-01 00:10:02 +05:30
Mridul Muralidharan	538614acfe	Be more aggressive and defensive in select also	2013-05-01 00:05:32 +05:30
Mridul Muralidharan	48854e1dbf	If key is not valid, close connection	2013-04-30 23:59:33 +05:30
Matei Zaharia	f708dda81e	Merge pull request #585 from pwendell/listener-perf [Fix SPARK-742] Task Metrics should not employ per-record timing by default	2013-04-30 07:51:40 -07:00
Mridul Muralidharan	e46d547ccd	Fix issues reported by Reynold	2013-04-30 16:15:56 +05:30
Reynold Xin	1055785a83	Allow specifying the shuffle write file buffer size. The default buffer size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.	2013-04-29 23:33:56 -07:00
Reynold Xin	7007201201	Added a shuffle block manager so it is easier in the future to consolidate shuffle output files.	2013-04-29 23:07:03 -07:00
Reynold Xin	d3586ef438	Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager Conflicts: core/src/main/scala/spark/storage/DiskStore.scala	2013-04-29 15:44:18 -07:00
Patrick Wendell	016ce1fa9c	Using full package name for util	2013-04-29 12:02:27 -07:00
Patrick Wendell	540be6b154	Modified version of the fix which just removes all per-record tracking.	2013-04-29 11:32:07 -07:00
Patrick Wendell	224fbac061	Spark-742: TaskMetrics should not employ per-record timing. This patch does three things: 1. Makes TimedIterator a trait with two implementations (one a no-op) 2. Makes the default behavior to use the no-op implementation 3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the trait doesn't really reduce complexity in any way. In the future we can add other implementations, e.g. ones which perform sampling.	2013-04-29 11:13:43 -07:00
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	6e84635ab9	Rename classes from MapZipped* to Zipped*	2013-04-28 15:58:40 -07:00
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	6e6b5204ea	Create an empty directory when checkpointing a 0-partition RDD (fixes a test failure on Hadoop 2.0)	2013-04-25 00:42:37 -07:00
Reynold Xin	ba6ffa6a5f	Allow the specification of a shuffle serializer in the read path (for local block reads).	2013-04-24 17:38:07 -07:00
Reynold Xin	aa618ed2a2	Allow changing the serializer on a per shuffle basis.	2013-04-24 14:52:49 -07:00
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Reynold Xin	31ce6c66d6	Added a BlockObjectWriter interface in block manager so ShuffleMapTask doesn't need to build up an array buffer for each shuffle bucket.	2013-04-23 17:48:59 -07:00
koeninger	dfac0aa5c2	prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.	2013-04-22 21:12:52 -05:00
Mridul Muralidharan	7acab3ab45	Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo	2013-04-22 08:01:13 +05:30
koeninger	b2a3f24dde	first attempt at an RDD to pull data from JDBC sources	2013-04-21 00:29:37 -05:00
Andrew xia	8436bd5d4a	remove TaskSetQueueManager and update code style	2013-04-19 02:17:22 +08:00
Andrew xia	e0603d7e8b	refactor the Schedulable interface and add unit test for SchedulingAlgorithm	2013-04-18 13:13:54 +08:00
Mridul Muralidharan	5ee2f5c483	Cache pattern, add (commented out) alternatives for check* apis	2013-04-17 23:13:34 +05:30
Mridul Muralidharan	f07961060d	Add a small note on spark.tasks.schedule.aggression	2013-04-17 23:13:02 +05:30
Mridul Muralidharan	02dffd2eb0	Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained	2013-04-17 05:52:57 +05:30
Mridul Muralidharan	ad80f68eb5	remove spurious debug statements	2013-04-16 22:15:34 +05:30
Mridul Muralidharan	f7969f72ee	Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example)	2013-04-16 21:51:38 +05:30
Mridul Muralidharan	323ab8ff3b	Scala does not prevent variable shadowing ! Sick error due to it ...	2013-04-16 17:05:10 +05:30
shane-huang	b493f55a4f	fix a bug in netty Block Fetcher Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-16 10:01:01 +08:00
Mridul Muralidharan	59c380d69a	Fix npe	2013-04-16 03:29:38 +05:30
Mridul Muralidharan	dd2b64ec97	Fix bug with atomic update	2013-04-16 03:19:24 +05:30
Mridul Muralidharan	5540ab8243	Use hostname instead of hostport for executor, fix creation of workdir	2013-04-16 02:57:43 +05:30
Mridul Muralidharan	eb7e95e833	Commit job to persist files	2013-04-16 02:56:36 +05:30
Matei Zaharia	a64c107449	Make ShuffledRDD.prev transient	2013-04-15 16:41:51 -04:00
Mridul Muralidharan	19652a44be	Fix issue with FileSuite failing	2013-04-15 19:16:36 +05:30
Mridul Muralidharan	54b3d45b81	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:26:50 +05:30
Mridul Muralidharan	d90d2af103	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:12:11 +05:30
Matei Zaharia	c35d530bcf	Fix compile error	2013-04-13 12:43:12 -04:00
Andrew Ash	29d3440efb	Add details when BlockManager heartbeats time out. Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs. Before: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats". After: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms".	2013-04-11 01:54:02 -03:00
Andrew xia	2f883c515f	Continue to update codes for scala code style 1.refactor braces for "class" "if" "while" "for" "match" 2.make code lines less than 100 3.refactor class parameter and extends definition	2013-04-09 13:02:50 +08:00
Matei Zaharia	054feb6448	Fixed a bug with zip	2013-04-07 21:15:21 -04:00
Matei Zaharia	b5900d47b1	Fix compile warning	2013-04-07 20:55:42 -04:00
Matei Zaharia	6962d40b44	Fix deprecated warning	2013-04-07 20:27:33 -04:00
Mridul Muralidharan	6798a09df8	Add support for building against hadoop2-yarn : adding new maven profile for it	2013-04-07 17:47:38 +05:30
shane-huang	df47b40b76	Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages change reference from io.Source to scala.io.Source to avoid looking into io.netty package Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-07 14:37:12 +08:00
Andrew xia	2b373dd07a	add properties default value null to fix sbt/sbt test errors	2013-04-02 12:11:14 +08:00
Mark Hamstra	e215f67923	Correct sense of 'filter out' in comment.	2013-03-31 08:00:13 -07:00
Mark Hamstra	8bcdc64005	Fixed broken filter in getWritableClass[T]	2013-03-30 22:09:52 -07:00
Matei Zaharia	9831bc1a09	Merge pull request #539 from cgrothaus/fix-webui-workdirpath Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-29 22:16:22 -07:00
Matei Zaharia	3cc8ab6e29	Merge pull request #541 from stephenh/shufflecoalesce Add a shuffle parameter to coalesce.	2013-03-29 22:14:07 -07:00
Andrew xia	1a28f92711	change some typo and some spacing	2013-03-29 08:34:28 +08:00
Andrew xia	def3d1c84a	1.remove redundant spacing in source code 2.replace get/set functions with val and var definition	2013-03-29 08:20:35 +08:00
Holden Karau	f5df729b12	Explicitly catch all throwables (warning in 2.10)	2013-03-24 16:15:32 -07:00
Stephen Haberman	dd854d5b9f	Use Boolean in the Java API, and != for assert.	2013-03-23 11:49:45 -05:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Matei Zaharia	b8949cab88	Merge pull request #505 from stephenh/volatile Make Executor fields volatile since they're read from the thread pool.	2013-03-23 07:19:34 -07:00
Matei Zaharia	fd53f2fc7b	Merge pull request #510 from markhamstra/WithThing mapWith, flatMapWith and filterWith	2013-03-23 07:13:21 -07:00
Andrew xia	d1d9bdaabe	Just update typo and comments	2013-03-23 07:25:30 +08:00
Stephen Haberman	00170eb0b9	Fix are/our typo.	2013-03-22 12:59:08 -05:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still up the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Christoph Grothaus	445f387ef4	Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-22 11:08:40 +01:00
Matei Zaharia	35588490cb	Merge pull request #538 from rxin/cogroup Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 19:27:47 -07:00
Stephen Haberman	4f4215311a	Merge branch 'master' into volatile	2013-03-20 15:37:10 -05:00
Matei Zaharia	b812e6b7bb	Merge pull request #526 from markhamstra/foldByKey Add foldByKey	2013-03-20 11:21:02 -07:00
Reynold Xin	d48ee7e55e	Merge branch 'master' of github.com:mesos/spark into cogroup	2013-03-20 14:00:28 +08:00
Reynold Xin	00a11304fd	Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 13:49:51 +08:00
Matei Zaharia	945d1e720e	Merge pull request #536 from sasurfer/master CoalescedRDD for many partitions	2013-03-19 21:59:06 -07:00
Matei Zaharia	1cbbe94ac1	Merge pull request #534 from stephenh/removetrycatch Remove try/catch block that can't be hit.	2013-03-19 21:34:34 -07:00
Andrey Kouznetsov	bd167f83b0	call setConf from input format if it is Configurable	2013-03-19 17:15:15 +04:00
Giovanni Delussu	aceae029f7	CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD. The limitation was in the range that Scala.Int can represent.	2013-03-19 11:25:45 +01:00
Stephen Haberman	fb34967815	Remove try/catch block that can't be hit.	2013-03-18 01:55:50 -05:00
Mark Hamstra	ab33e27cc9	constructorOfA -> constructA in doc comments	2013-03-16 15:29:15 -07:00
Mark Hamstra	9784fc1fcd	fix wayward comma in doc comment	2013-03-16 15:25:02 -07:00
Mark Hamstra	32979b5e7d	whitespace	2013-03-16 13:36:46 -07:00
Mark Hamstra	ca9f81e8fc	refactor foldByKey to use combineByKey	2013-03-16 13:31:01 -07:00
Mark Hamstra	1fb192ef40	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-16 12:17:13 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Matei Zaharia	c1e9cdc49f	Merge pull request #525 from stephenh/subtractByKey Add PairRDDFunctions.subtractByKey.	2013-03-16 11:47:45 -07:00
Mark Hamstra	ef75be3bf7	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-15 21:41:24 -07:00
Andrew xia	5892393140	refactor fair scheduler implementation 1.Change "pool" properties to be the member of ActiveJob 2.Abstract the Schedulable of Pool and TaskSetManager 3.Abstract the FIFO and FS comparator algorithm 4.Miscellaneous changing of class define and construction	2013-03-16 11:13:38 +08:00
Matei Zaharia	cdbfd1e196	Merge pull request #516 from squito/fix_local_metrics Fix local metrics	2013-03-15 15:13:28 -07:00
Mark Hamstra	857010392b	Fuller implementation of foldByKey	2013-03-15 10:56:05 -07:00
Mark Hamstra	16a4ca4537	restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test	2013-03-14 13:58:37 -07:00
Mark Hamstra	b1422cbdd5	added foldByKey	2013-03-14 12:59:58 -07:00
Stephen Haberman	7786881f47	Fix tabs that snuck in.	2013-03-14 14:57:12 -05:00
Stephen Haberman	7d8bb4df3a	Allow subtractByKey's other argument to have a different value type.	2013-03-14 14:44:15 -05:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Matei Zaharia	4032beba49	Merge pull request #521 from stephenh/earlyclose Close the reader in HadoopRDD as soon as iteration end.	2013-03-13 19:29:46 -07:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation from subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	cd5b947cf6	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-13 13:16:14 -07:00
Stephen Haberman	1a175d13b9	Add NextIterator.closeIfNeeded.	2013-03-13 10:17:39 -05:00
Stephen Haberman	8f00d23598	Remove NextIterator.close default implementation.	2013-03-12 12:30:10 -05:00
Harold Lim	0b64e5f1ac	Removed some commented code	2013-03-12 13:31:27 +08:00
Harold Lim	f5b1fecb9f	Cleaned up the code	2013-03-12 13:31:27 +08:00
Harold Lim	b5325182a3	Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies	2013-03-12 13:31:27 +08:00
Harold Lim	54ed7c4af4	Changed the name of the system property to set the allocation xml	2013-03-12 13:31:27 +08:00
Harold Lim	c07087364b	Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler	2013-03-12 13:31:27 +08:00
Stephen Haberman	9e68f48625	More quickly call close in HadoopRDD. This also refactors out the common "gotNext" iterator pattern into a shared utility class.	2013-03-11 23:59:17 -05:00
Charles Reiss	769d399674	Send block sizes as longs.	2013-03-11 14:17:05 -07:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	b57df1f5e3	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-10 16:56:31 -07:00
Matei Zaharia	91a9d093bd	Merge pull request #512 from patelh/fix-kryo-serializer Fix reference bug in Kryo serializer, add test, update version	2013-03-10 15:48:23 -07:00
Matei Zaharia	557cfd0f4d	Merge pull request #515 from woggling/deploy-app-death Notify standalone deploy client of application death.	2013-03-10 15:44:57 -07:00
Matei Zaharia	a59cc6060f	Merge remote-tracking branch 'stephenh/nomocks' Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 13:39:10 -07:00
Imran Rashid	20f01a0a1b	enable task metrics in local mode, add tests	2013-03-09 21:17:31 -08:00
Imran Rashid	ec30188a2a	rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches	2013-03-09 21:16:53 -08:00
Charles Reiss	b0983c5762	Notify standalone deploy client of application death. Usually, this isn't necessary since the application will be removed as a result of the deploy client disconnecting, but occasionally, the standalone deploy master removes an application otherwise. Also mark applications as FAILED instead of FINISHED when they are killed as a result of their executors failing too many times.	2013-03-09 11:29:45 -08:00
Hiral Patel	664e5fd24b	Fix reference bug in Kryo serializer, add test, update version	2013-03-07 22:16:11 -08:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Matei Zaharia	9f0dc829cb	Fix TaskMetrics not being serializable	2013-03-04 12:08:31 -08:00
Matei Zaharia	04fb81ffe5	Merge pull request #506 from rxin/spark-706 Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-03-03 17:20:07 -08:00
Imran Rashid	0bd1d00c2a	minor cleanup based on feedback in review request	2013-03-03 16:46:45 -08:00
Imran Rashid	f1006b99ff	change CleanupIterator to CompletionIterator	2013-03-03 16:39:05 -08:00
Imran Rashid	8fef5b9c5f	refactoring of TaskMetrics	2013-03-03 16:34:04 -08:00
Imran Rashid	d36abdb053	Merge branch 'master' into stageInfo	2013-03-03 15:20:46 -08:00
Reynold Xin	44134e12bb	Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-02-28 15:14:59 -08:00
Stephen Haberman	6415c2bb60	Don't create the Executor until we have everything it needs.	2013-02-28 12:38:09 -06:00
Stephen Haberman	80eecd2cb1	Make Executor fields volatile since they're read from the thread pool.	2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury	4ab387bcdb	Fixed master datastructure updates after removing an application; and a typo.	2013-02-27 13:52:44 -08:00
Matei Zaharia	ece3edfffa	Fix a problem with no hosts being counted as alive in the first job	2013-02-26 12:11:03 -08:00
Matei Zaharia	73697e2891	Fix overly large thread names in PySpark	2013-02-26 12:07:59 -08:00
Stephen Haberman	a65aa549ff	Override DAGScheduler.runLocally so we can remove the Thread.sleep.	2013-02-25 23:49:32 -06:00
Stephen Haberman	a4adeb255c	Merge branch 'master' into nomocks Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-02-25 23:48:52 -06:00
Tathagata Das	c02e064938	Fixed replication bug in BlockManager	2013-02-25 17:27:46 -08:00
Matei Zaharia	490f056cdd	Allow passing sparkHome and JARs to StreamingContext constructor Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.	2013-02-25 15:13:30 -08:00
Matei Zaharia	568bdaf8ae	Set spark.deploy.spreadOut to true by default in 0.7 (improves locality)	2013-02-25 14:34:55 -08:00
Matei Zaharia	1ef58dadcc	Add a config property for Akka lifecycle event logging	2013-02-25 14:01:24 -08:00
Matei Zaharia	ceaec4a675	Merge pull request #498 from pwendell/shutup-akka Disable remote lifecycle logging from Akka.	2013-02-25 12:31:24 -08:00
Patrick Wendell	85a85646d9	Disable remote lifecycle logging from Akka. This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.	2013-02-25 12:25:43 -08:00
Imran Rashid	8f17387d97	remove bogus comment	2013-02-25 10:31:06 -08:00
Matei Zaharia	6ae9a22c3e	Get spark.default.paralellism on each call to defaultPartitioner, instead of only once, in case the user changes it across Spark uses	2013-02-25 10:28:08 -08:00
Matei Zaharia	d6e6abece3	Merge pull request #459 from stephenh/bettersplits Change defaultPartitioner to use upstream split size.	2013-02-25 09:22:04 -08:00
Stephen Haberman	c44ccf2862	Use default parallelism if its set.	2013-02-24 23:54:03 -06:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Christoph Grothaus	f39f2b7636	Incorporate feedback from mateiz: - we do not need getEnvOrEmpty - Instead of saving SPARK_NONDAEMON_JAVA_OPTS, it would be better to modify the scripts to use a different variable name for the JAVA_OPTS they do eventually use	2013-02-24 21:24:30 +01:00
Tathagata Das	dff53d1b94	Merge branch 'mesos-master' into streaming	2013-02-24 12:17:22 -08:00
Matei Zaharia	3b9f929467	Merge pull request #468 from haitaoyao/master support customized java options for Master, Worker, Executor, and Repl	2013-02-23 23:38:15 -08:00
Stephen Haberman	37c7a71f9c	Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD.	2013-02-24 00:27:53 -06:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Patrick Wendell	931f439be9	Responding to code review	2013-02-23 15:40:41 -08:00
Patrick Wendell	f51b0f93f2	Adding Java-accessible methods to Vector.scala This is needed for the Strata machine learning tutorial (and also is generally helpful).	2013-02-23 13:26:59 -08:00
Matei Zaharia	d942d39072	Handle exceptions in RecordReader.close() better (suggested by Jim Donahue)	2013-02-23 11:19:07 -08:00
Matei Zaharia	c89824046a	Merge pull request #490 from woggling/conn-death Detect when SendingConnections disconnect even if we aren't sending to them	2013-02-22 22:58:19 -08:00
Charles Reiss	c8a7886921	Detect when SendingConnections drop by trying to read them. Comment fix	2013-02-22 16:11:52 -08:00
Matei Zaharia	d4d7993bf5	Several fixes to the work to log when no resources can be used by a job. Fixed some of the messages as well as code style.	2013-02-22 15:51:37 -08:00
Matei Zaharia	f33662c133	Merge remote-tracking branch 'pwendell/starvation-check' Also fixed a bug where master was offering executors on dead workers Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-02-22 15:27:41 -08:00
Matei Zaharia	7341de0d48	Merge pull request #475 from JoshRosen/spark-668 Remove hack workaround for SPARK-668	2013-02-22 14:56:18 -08:00
Patrick Wendell	f8c3a03d55	SPARK-702: Replace Function --> JFunction in JavaAPI Suite. In a few places the Scala (rather than Java) function class is used.	2013-02-22 12:54:15 -08:00
Imran Rashid	0f37b43b40	make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD	2013-02-21 16:56:28 -08:00
Imran Rashid	9230617f23	add cleanup iterator	2013-02-21 16:55:14 -08:00
Imran Rashid	81bd07da26	sparkListeners should be a val	2013-02-21 15:21:45 -08:00
Imran Rashid	796e934d31	add some docs & some cleanup	2013-02-21 15:19:34 -08:00
Imran Rashid	394d3acc3e	store taskInfo & metrics together in a tuple	2013-02-21 15:19:34 -08:00
Imran Rashid	7960927cf4	get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo	2013-02-21 15:19:34 -08:00
Imran Rashid	d0bfac3eed	taskInfo tracks if a task is run on a preferred host	2013-02-21 15:19:34 -08:00
Imran Rashid	6f62a57858	add runtime breakdowns	2013-02-21 15:19:34 -08:00
Imran Rashid	176cb20703	add task result size; better formatting for time interval distributions; cleanup distribution formatting	2013-02-21 15:19:33 -08:00
Imran Rashid	f2fcabf2ea	add timing around parts of executor & track result size	2013-02-21 15:19:33 -08:00
Imran Rashid	ff127cfcd3	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-21 15:16:21 -08:00
Imran Rashid	baab23abdf	TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task	2013-02-21 14:13:01 -08:00
haitao.yao	8215b95547	Merge branch 'mesos'	2013-02-21 10:07:24 +08:00
Christoph Grothaus	85a35c6840	Fix SPARK-698. From ExecutorRunner, launch java directly instead via the run scripts.	2013-02-20 21:42:11 +01:00
Tathagata Das	334ab92441	Fixed bug in CheckpointSuite	2013-02-20 10:26:36 -08:00
Tathagata Das	1cb725e417	Merge branch 'mesos-master' into streaming	2013-02-20 09:55:35 -08:00
Tathagata Das	fb9956256d	Merge branch 'mesos-master' into streaming Conflicts: core/src/main/scala/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala	2013-02-20 09:01:29 -08:00
Matei Zaharia	05bc02e80b	Merge pull request #482 from woggling/shutdown-exceptions Don't call System.exit over uncaught exceptions from shutdown hooks	2013-02-19 20:56:15 -08:00
haitao.yao	6a3d44c673	Merge branch 'mesos'	2013-02-20 10:23:58 +08:00
Charles Reiss	092c631fa8	Pull detection of being in a shutdown hook into utility function.	2013-02-19 17:49:55 -08:00
Reynold Xin	130f704baf	Added a method to create PartitionPruningRDD.	2013-02-19 16:03:52 -08:00
Charles Reiss	d0588bd6d7	Catch/log errors deleting temp dirs	2013-02-19 13:04:06 -08:00
Charles Reiss	687581c3ec	Paranoid uncaught exception handling for exceptions during shutdown	2013-02-19 13:03:02 -08:00
haitao.yao	7c129388fb	Merge branch 'mesos'	2013-02-19 11:22:24 +08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Imran Rashid	893bad9089	use appid instead of frameworkid; simplify stupid condition	2013-02-13 20:30:21 -08:00
Imran Rashid	8f18e7e863	include jobid in Executor commandline args	2013-02-13 13:05:13 -08:00
Matei Zaharia	bfeed4725d	Merge pull request #465 from pwendell/java-sort-fix SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 18:23:12 -08:00
Patrick Wendell	21df6ffc13	SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 17:43:26 -08:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Josh Rosen	e9fb25426e	Remove hack workaround for SPARK-668. Renaming the type parameters solves this problem (see SPARK-694). I tried this fix earlier, but it didn't work because I didn't run `sbt/sbt clean` first.	2013-02-11 11:19:20 -08:00
Imran Rashid	e9f53ec0ea	undo change to onCompleteCallbacks	2013-02-11 09:36:49 -08:00
Matei Zaharia	da8afbc77e	Some bug and formatting fixes to FT Conflicts: core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:43:38 -08:00
root	1b47fa2752	Detect hard crashes of workers using a heartbeat mechanism. Also fixes some issues in the rest of the code with detecting workers this way. Conflicts: core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:28:28 -08:00
Matei Zaharia	8c66c49962	Tweak web UI so that people don't get confused about master URL format Conflicts: core/src/main/twirl/spark/deploy/master/index.scala.html core/src/main/twirl/spark/deploy/worker/index.scala.html	2013-02-10 21:58:34 -08:00
Imran Rashid	d9461b15d3	cleanup a bunch of imports	2013-02-10 21:41:40 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Imran Rashid	383af599bb	SparkContext.addSparkListener; "std" listener in StatsReportListener	2013-02-10 14:19:37 -08:00
Imran Rashid	b7d9e24394	use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver	2013-02-10 14:18:52 -08:00
Stephen Haberman	680f42e6cd	Change defaultPartitioner to use upstream split size. Previously it used the SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, as if we have a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
Patrick Wendell	2ed791fd7f	Minor fixes	2013-02-09 22:00:38 -08:00
Patrick Wendell	1859c9f93c	Changing to use Timer based on code review	2013-02-09 21:55:17 -08:00
Matei Zaharia	ccb1ca4a23	Merge pull request #448 from squito/fetch_maxBytesInFlight add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-09 18:15:18 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	4619ee0787	Move JavaRDDLike.coalesce into the right places.	2013-02-09 20:05:42 -06:00
Stephen Haberman	921be76533	Use stubs instead of mocks for DAGSchedulerSuite.	2013-02-09 16:42:18 -06:00
Stephen Haberman	fb7599870f	Fix JavaRDDLike.coalesce return type.	2013-02-09 16:10:52 -06:00
Stephen Haberman	2a18cd826c	Add back return types.	2013-02-09 10:12:04 -06:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00
Imran Rashid	04e828f7c1	general fixes to Distribution, plus some tests	2013-02-08 19:07:36 -08:00
Mark Hamstra	b8863a79d3	Merge branch 'master' of https://github.com/mesos/spark into commutative Conflicts: core/src/main/scala/spark/RDD.scala	2013-02-08 18:26:00 -08:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f4d43cb43e	Remove unneeded zipWithIndex. Also rename r->rdd and remove unneeded extra type info.	2013-02-05 21:26:45 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Stephen Haberman	67df7f2fa2	Add private, minor formatting.	2013-02-05 21:08:21 -06:00
Imran Rashid	379564c7e0	setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place	2013-02-05 18:30:21 -08:00
Matei Zaharia	9cfa068379	Merge pull request #450 from stephenh/inlinemergepair Inline mergePair to look more like the narrow dep branch.	2013-02-05 18:28:44 -08:00
Stephen Haberman	870b2aaf5d	Merge branch 'master' into fixdeathpactexception Conflicts: core/src/main/scala/spark/deploy/worker/Worker.scala	2013-02-05 20:27:09 -06:00
Stephen Haberman	0e19093fd8	Handle Terminated to avoid endless DeathPactExceptions. Credit to Roland Kuhn, Akka's tech lead, for pointing out this very obvious fix, but StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operations in preStart are async. So, the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle, it turned into DeathPactException, which started a postRestart/preStart infinite loop.	2013-02-05 18:58:00 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Imran Rashid	1704b124d8	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:33:52 -08:00
Imran Rashid	cfab1a3528	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:31:46 -08:00
Imran Rashid	696e4b2167	track remoteFetchTime	2013-02-05 14:29:16 -08:00
Imran Rashid	b29f9cc978	BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance	2013-02-05 14:00:44 -08:00
Imran Rashid	e319ac74c1	cogrouped RDD stores the amount of time taken to read shuffle data in each task	2013-02-05 10:18:16 -08:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Imran Rashid	9df7e2ae55	Shuffle Fetchers use a timed iterator	2013-02-05 10:18:16 -08:00

1818 commits