ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Shivaram Venkataraman	5e9a9317c5	Merge branch 'master' of git://github.com/mesos/spark into netty-dbg	2013-06-12 16:38:01 -07:00
ryanlecompte	db5bca08ff	add a new top K method to RDD using a bounded priority queue	2013-06-12 10:54:16 -07:00
Patrick Wendell	fd6148c8b2	Removing print statement	2013-06-10 10:27:25 -07:00
Andrew xia	190ec61799	change code style and debug info	2013-06-10 15:27:02 +08:00
Patrick Wendell	ef14dc2e77	Adding Java-API version of compression codec	2013-06-09 18:09:46 -07:00
Patrick Wendell	df592192e7	Monads FTW	2013-06-09 18:09:24 -07:00
Patrick Wendell	083a3485ab	Clean extra whitespace	2013-06-09 11:49:33 -07:00
Patrick Wendell	d1bbcebae5	Adding compression to Hadoop save functions	2013-06-09 11:39:35 -07:00
Mingfei	ade822011d	not check return value of eventQueue.take	2013-06-08 16:26:45 +08:00
Mingfei	4fd86e0e10	delete test code for joblogger in SparkContext	2013-06-08 15:45:47 +08:00
Mingfei	362f0f93ac	Merge branch 'master' of https://github.com/mesos/spark	2013-06-08 15:20:13 +08:00
Mingfei	1a4d93c025	modify to pass job annotation by localProperties and use daeamon thread to do joblogger's work	2013-06-08 14:23:39 +08:00
Matei Zaharia	b58a29295b	Small formatting and style fixes	2013-06-07 22:51:28 -07:00
Matei Zaharia	c8fc423bc2	Merge pull request #631 from jerryshao/master Fix block manager UI display issue when enable spark.cleaner.ttl	2013-06-07 22:43:18 -07:00
Matei Zaharia	c9ca0a4a58	Small code style fix to SchedulingAlgorithm.scala	2013-06-07 22:40:44 -07:00
Matei Zaharia	1ae60bcb36	Merge pull request #634 from xiajunluan/master [Spark-753] Fix ClusterSchedulSuite unit test failed	2013-06-07 22:39:06 -07:00
Shivaram Venkataraman	ac480fd977	Clean up variables and counters in BlockFetcherIterator	2013-06-06 16:34:27 -07:00
Gavin Li	e179ff8a32	update according to comments	2013-06-05 22:41:05 +00:00
Shivaram Venkataraman	cb2f5046ee	Pass in bufferSize to BufferedOutputStream	2013-06-05 15:09:02 -07:00
Shivaram Venkataraman	c851957fe4	Don't write zero block files with java serializer	2013-06-05 14:28:38 -07:00
Christopher Nguyen	9d35904357	In the current code, when both partitions happen to have zero-length, the return mean will be NaN. Consequently, the result of mean after reducing over all partitions will also be NaN, which is not correct if there are partitions with non-zero length. This patch fixes this issue.	2013-06-04 22:12:47 -07:00
Matei Zaharia	fff3728552	Merge pull request #640 from pwendell/timeout-update Fixing bug in BlockManager timeout	2013-06-04 16:09:50 -07:00
Patrick Wendell	061fd3ae36	Fixing bug in BlockManager timeout	2013-06-04 19:02:44 -04:00
Matei Zaharia	f420d4f228	Merge pull request #639 from pwendell/timeout-update Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 15:25:58 -07:00
Patrick Wendell	8bd4e12104	Bump akka and blockmanager timeouts to 60 seconds	2013-06-04 18:14:24 -04:00
Shivaram Venkataraman	96943a1cc0	var to val	2013-06-03 12:29:38 -07:00
Shivaram Venkataraman	cd347f547a	Reuse the file object as it is valid after delete	2013-06-03 12:27:51 -07:00
Shivaram Venkataraman	a058b0acf3	Delete a file for a block if it already exists.	2013-06-03 12:10:00 -07:00
Andrew xia	606bb1b450	Fix schedulingAlgorithm bugs for unit test	2013-06-03 10:29:23 +08:00
Gavin Li	4a9913d66a	add ut for pipe enhancement	2013-06-02 23:21:09 +00:00
Shivaram Venkataraman	038cfc1a9a	Make connect timeout configurable	2013-05-31 23:32:18 -07:00
Shivaram Venkataraman	91aca92249	Another round of Netty fixes. 1. Avoid race condition between stop and copier completion 2. Handle socket exceptions by reporting them and filling in a failed FetchResult	2013-05-31 23:21:38 -07:00
Gavin Li	9f84315c05	enhance pipe to support what we can do in hadoop streaming	2013-06-01 00:26:10 +00:00
Reynold Xin	de1167bf2c	Incorporated Charles' feedback to put rdd metadata removal in BlockManagerMasterActor.	2013-05-31 15:54:57 -07:00
Reynold Xin	ba5e544461	More block manager cleanup. Implemented a removeRdd method in BlockManager, and use that to implement RDD.unpersist. Previously, unpersist needs to send B akka messages, where B = number of blocks. Now unpersist only needs to send W akka messages, where W = the number of workers.	2013-05-31 01:48:16 -07:00
jerryshao	926f41cc52	fix block manager UI display issue when enable spark.cleaner.ttl	2013-05-31 09:32:52 +08:00
Reynold Xin	f6ad3781b1	Fixed the flaky unpersist test in RDDSuite.	2013-05-30 16:28:08 -07:00
Reynold Xin	bed1b08169	Do not create symlink for local add file. Instead, copy the file. This prevents Spark from changing the original file's permission, and also allow add file to work on non-posix operating systems.	2013-05-30 16:21:49 -07:00
Shivaram Venkataraman	3b0cd17343	Merge branch 'master' of git://github.com/mesos/spark Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2013-05-30 14:36:24 -07:00
Andrew xia	c3db3ea554	1. Add unit test for local scheduler 2. Move localTaskSetManager to a new file	2013-05-30 20:49:40 +08:00
Andrew xia	ecceb101d3	implement FIFO and fair scheduler for spark local mode	2013-05-30 10:43:01 +08:00
Shivaram Venkataraman	19fd6d54c0	Also flush serializer in revertPartialWrites	2013-05-29 17:29:34 -07:00
Shivaram Venkataraman	618c8cae1e	Skip fetching zero-sized blocks in OIO. Also unify splitLocalRemoteBlocks for netty/nio and add a test case	2013-05-29 13:18:54 -07:00
Matei Zaharia	6ed71390d9	Merge pull request #626 from stephenh/remove-add-if-no-port Remove unused addIfNoPort.	2013-05-29 10:14:22 -07:00
Shivaram Venkataraman	b79b10a6d6	Flush serializer to fix zero-size kryo blocks bug. Also convert the local-cluster test case to check for non-zero block sizes	2013-05-29 00:52:55 -07:00
Matei Zaharia	41d230ccb0	Merge pull request #611 from squito/classloader Use default classloaders for akka & deserializing task results	2013-05-28 23:35:24 -07:00
Shivaram Venkataraman	fbc1ab3468	Couple of Netty fixes a. Fix the port number by reading it from the bound channel b. Fix the shutdown sequence to make sure we actually block on the channel c. Fix the unit test to use two JVMs.	2013-05-28 16:27:16 -07:00
Stephen Haberman	4fe1fbdd51	Remove unused addIfNoPort.	2013-05-28 16:26:32 -05:00
Matei Zaharia	3db1e17baa	Merge pull request #620 from jerryshao/master Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations	2013-05-27 21:31:43 -07:00
Matei Zaharia	e8d4b6c296	Merge pull request #529 from xiajunluan/master [SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler	2013-05-25 21:09:03 -07:00
Reynold Xin	6bbbe01287	Fixed a stupid mistake that NonJavaSerializableClass was made Java serializable.	2013-05-24 16:51:45 -07:00
Reynold Xin	26962c9340	Automatically configure Netty port. This makes unit tests using local-cluster pass. Previously they were failing because Netty was trying to bind to the same port for all processes. Pair programmed with @shivaram.	2013-05-24 16:39:33 -07:00
Reynold Xin	6ea085169d	Fixed the bug that shuffle serializer is ignored by the new shuffle block iterators for local blocks. Also added a unit test for that.	2013-05-24 14:08:37 -07:00
jerryshao	bd3ea8f2a6	fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException	2013-05-24 14:26:19 +08:00
Matei Zaharia	a2b0a7975c	Merge pull request #619 from woggling/adjust-sampling Use ARRAY_SAMPLE_SIZE constant instead of hard-coded 100.0 in SizeEstimator	2013-05-21 18:16:20 -07:00
Charles Reiss	f350f14084	Use ARRAY_SAMPLE_SIZE constant instead of 100.0	2013-05-21 18:11:33 -07:00
Charles Reiss	786c97b87c	DistributedSuite: remove dead test code	2013-05-21 11:35:49 -07:00
Andrew xia	ecd6d75c6a	fix bug of unit tests	2013-05-21 06:49:23 +08:00
Reynold Xin	5912cc4967	Merge pull request #610 from JoshRosen/spark-747 Throw exception if TaskResult exceeds Akka frame size	2013-05-17 19:58:40 -07:00
Reynold Xin	8d78c5f89f	Changed the logging level from info to warning when addJar(null) is called.	2013-05-17 18:51:35 -07:00
Andrew xia	3d4672eaa9	Merge branch 'master' into xiajunluan Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala	2013-05-18 07:28:03 +08:00
Andrew xia	d19753b9c7	expose TaskSetManager type to resourceOffer function in ClusterScheduler	2013-05-18 06:45:19 +08:00
Andrew xia	c6e2770bfe	Fix ClusterScheduler bug to avoid allocating tasks to same slave	2013-05-17 05:10:38 +08:00
Mridul Muralidharan	f0881f8d48	Hope this does not turn into a bike shed change	2013-05-17 01:58:50 +05:30
Mridul Muralidharan	feddd2530d	Filter out nulls - prevent NPE	2013-05-16 17:49:14 +05:30
Josh Rosen	b8e46b6074	Abort job if result exceeds Akka frame size; add test.	2013-05-16 01:57:57 -07:00
Matei Zaharia	2f576aba8f	Merge pull request #602 from rxin/shufflemerge Manual merge & cleanup of Shane's Shuffle Performance Optimization	2013-05-15 18:06:24 -07:00
Reynold Xin	203d7b7c14	Merge pull request #593 from squito/driver_ui_link Master UI has link to Application UI	2013-05-15 00:47:20 -07:00
Reynold Xin	f3491cb89b	Merge branch 'master' of github.com:mesos/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/test/scala/spark/DistributedSuite.scala project/SparkBuild.scala	2013-05-15 00:31:52 -07:00
Reynold Xin	f9d40a5848	Added a comment in JdbcRDD for example usage.	2013-05-14 23:29:57 -07:00
Reynold Xin	81ad2fa331	Merge branch 'jdbc' of github.com:koeninger/spark Conflicts: project/SparkBuild.scala	2013-05-14 23:12:00 -07:00
Imran Rashid	38d4b97c6d	use threads classloader when deserializing task results; classnotfoundexception includes classloader	2013-05-14 22:32:14 -07:00
Imran Rashid	d7d1da79d3	when akka starts, use akkas default classloader (current thread)	2013-05-14 22:32:09 -07:00
Cody Koeninger	b16c4896f6	add test for JdbcRDD using embedded derby, per rxin suggestion	2013-05-14 23:44:04 -05:00
Matei Zaharia	016ac86830	Merge pull request #601 from rxin/emptyrdd-master EmptyRDD (master branch 0.8)	2013-05-13 21:45:36 -07:00
Matei Zaharia	4b354e0a08	Merge pull request #589 from mridulm/master Add support for instance local scheduling	2013-05-13 17:39:19 -07:00
Patrick Wendell	7f0833647b	Capturing class name	2013-05-12 07:54:03 -07:00
Patrick Wendell	72b9c4cb6e	Small fix	2013-05-11 23:53:50 -07:00
Patrick Wendell	1c15b85051	Removing import	2013-05-11 23:52:53 -07:00
Patrick Wendell	059ab88754	Changing technique to use same code path in all cases	2013-05-11 23:50:54 -07:00
Cody Koeninger	3da2305ed0	code cleanup per rxin comments	2013-05-11 23:59:07 -05:00
Josh Rosen	440719109e	Throw exception if task result exceeds Akka frame size. This partially addresses SPARK-747.	2013-05-11 19:17:13 -07:00
Patrick Wendell	a5c28bb888	Removing unnecessary map	2013-05-11 14:20:39 -07:00
Patrick Wendell	0345954530	SPARK-738: Spark should detect and squash nonserializable exceptions	2013-05-11 14:17:09 -07:00
Mark Hamstra	6e6b3e0d7e	Actually use the cleaned closure in foreachPartition	2013-05-10 13:02:34 -07:00
Mridul Muralidharan	b05c9d22d7	Remove explicit hardcoding of yarn-standalone as args(0) if it is missing.	2013-05-09 18:49:12 +05:30
Imran Rashid	0ab818d508	fix linebreak	2013-05-09 00:38:59 -07:00
Reynold Xin	9cafacf32d	Added test for Netty suite.	2013-05-07 22:42:37 -07:00
Reynold Xin	5d70ee4663	Cleaned up connection manager (moved many classes to their own files).	2013-05-07 22:42:15 -07:00
Reynold Xin	8388e8dd7a	Minor style fix in DiskStore...	2013-05-07 18:40:35 -07:00
Reynold Xin	547dcbe494	Cleaned up Scala files in network/netty from Shane's PR.	2013-05-07 18:39:33 -07:00
Reynold Xin	9e64396ca4	Cleaned up the Java files from Shane's PR.	2013-05-07 18:30:54 -07:00
Reynold Xin	0e5cc30868	Cleaned up BlockManager and BlockFetcherIterator from Shane's PR.	2013-05-07 18:18:24 -07:00
Reynold Xin	8b79485171	Moved BlockFetcherIterator to its own file.	2013-05-07 17:02:32 -07:00
Reynold Xin	90577ada69	Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge Conflicts: core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/DiskStore.scala project/SparkBuild.scala	2013-05-07 15:56:19 -07:00
Reynold Xin	64d4d2b036	Added tests for joins, cogroups, and unions for EmptyRDD.	2013-05-06 16:30:46 -07:00
Reynold Xin	0fd84965f6	Added EmptyRDD.	2013-05-06 15:40:34 -07:00
Imran Rashid	22a5063ae4	switch from separating appUI host & port to combining into just appUiUrl	2013-05-05 12:19:11 -07:00
Matei Zaharia	7af92f248b	Merge pull request #597 from JoshRosen/webui-fixes Two minor bug fixes for Spark Web UI	2013-05-04 22:29:17 -07:00
Reynold Xin	0a2bed356b	Fixed flaky unpersist test in DistributedSuite.	2013-05-04 21:50:08 -07:00
Reynold Xin	62a077cd08	Merge branch 'unpersist-test' of github.com:shivaram/spark into blockmanager	2013-05-04 21:49:50 -07:00
Josh Rosen	42b1953c53	Fix SPARK-630: app details page shows finished executors as running.	2013-05-04 18:34:47 -07:00
Josh Rosen	c0688451a6	Fix wrong closing tags in web UI HTML.	2013-05-04 18:34:46 -07:00
Josh Rosen	d48e9fde01	Fix SPARK-629: weird number of cores in job details page.	2013-05-04 18:34:45 -07:00
Mridul Muralidharan	25198d7e9e	Merge branch 'master' of github.com:mridulm/spark	2013-05-04 20:45:56 +05:30
Mridul Muralidharan	5b011d18d7	Merge from master	2013-05-04 20:41:27 +05:30
Mridul Muralidharan	edb57c8331	Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach	2013-05-04 19:47:45 +05:30
Matei Zaharia	3bf2c868c3	Merge pull request #594 from shivaram/master Add zip partitions to Java API	2013-05-03 18:27:30 -07:00
Shivaram Venkataraman	2274ad0786	Fix flaky test by changing catch and adding sleep	2013-05-03 16:35:35 -07:00
Shivaram Venkataraman	bb8a434f9d	Add zipPartitions to Java API.	2013-05-03 15:14:02 -07:00
Imran Rashid	6fae936088	applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui	2013-05-03 12:59:10 -07:00
Mridul Muralidharan	ea2a6f91d3	pull from master	2013-05-04 00:35:59 +05:30
Reynold Xin	93091f6936	Merge branch 'master' of github.com:mesos/spark into blockmanager	2013-05-03 01:02:32 -07:00
Reynold Xin	2bc895a829	Updated according to Matei's code review comment.	2013-05-03 01:02:16 -07:00
Mridul Muralidharan	11589c39d9	Fix ZippedRDD as part Matei's suggestion	2013-05-03 12:23:30 +05:30
Matei Zaharia	6fe9d4e61e	Merge pull request #592 from woggling/localdir-fix Don't accept generated local directory names that can't be created	2013-05-02 21:33:56 -07:00
Matei Zaharia	538ee755b4	Merge pull request #581 from jerryshao/master fix [SPARK-740] block manage UI throws exception when enabling Spark Streaming	2013-05-02 09:01:42 -07:00
Charles Reiss	c847dd3da2	Don't accept generated temp directory names that can't be created successfully.	2013-05-01 23:19:10 -07:00
Reynold Xin	4a31877408	Added the unpersist api to JavaRDD.	2013-05-01 20:31:54 -07:00
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Mridul Muralidharan	dfde9ce9dd	comment out debug versions of checkHost, etc from Utils - which were used to test	2013-05-02 07:41:33 +05:30
Mridul Muralidharan	1b5aaeadc7	Integrate review comments 2	2013-05-02 07:30:06 +05:30
jerryshao	c047f0e3ad	filter out Spark streaming block RDD and sort RDDInfo with id	2013-05-02 09:48:32 +08:00
Mridul Muralidharan	609a817f52	Integrate review comments on pull request	2013-05-02 06:44:33 +05:30
Reynold Xin	204eb32e14	Changed the type of the persistentRdds hashmap back to TimeStampedHashMap.	2013-05-01 16:14:58 -07:00
Reynold Xin	34637b97ec	Added SparkContext.cleanup back. Not sure why it was removed before ...	2013-05-01 16:12:37 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
harshars	8481562731	Merged Ram's commit on removing RDDs. Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-05-01 14:42:17 -07:00
Mridul Muralidharan	27764a00f4	Fix some npe introduced accidentally	2013-05-01 20:56:05 +05:30
Mridul Muralidharan	d960e7e0f8	a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling. b) Add some fixes to test code to ensure it passes (and fixes some other issues). c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.	2013-05-01 20:24:00 +05:30
Matei Zaharia	aa8fe1a209	Merge pull request #586 from mridulm/master Pull request to address issues Reynold Xin reported	2013-04-30 22:30:18 -07:00
Reynold Xin	dd7bef3147	Two minor fixes according to Ryan LeCompte's review.	2013-04-30 15:02:32 -07:00
Reynold Xin	cea6174573	Merge branch 'master' of github.com:mesos/spark into blockmanager Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala	2013-04-30 13:28:35 -07:00
Mridul Muralidharan	60cabb35cb	Add addition catch block for exception too	2013-05-01 01:17:14 +05:30
Mridul Muralidharan	3b748ced22	Be more aggressive and defensive in all uses of SelectionKey in select loop	2013-05-01 00:30:30 +05:30
Mridul Muralidharan	0f45477be1	Change indentation	2013-05-01 00:10:02 +05:30
Mridul Muralidharan	538614acfe	Be more aggressive and defensive in select also	2013-05-01 00:05:32 +05:30
Mridul Muralidharan	48854e1dbf	If key is not valid, close connection	2013-04-30 23:59:33 +05:30
Matei Zaharia	f708dda81e	Merge pull request #585 from pwendell/listener-perf [Fix SPARK-742] Task Metrics should not employ per-record timing by default	2013-04-30 07:51:40 -07:00
Mridul Muralidharan	e46d547ccd	Fix issues reported by Reynold	2013-04-30 16:15:56 +05:30
Reynold Xin	1055785a83	Allow specifying the shuffle write file buffer size. The default buffer size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.	2013-04-29 23:33:56 -07:00
Reynold Xin	7007201201	Added a shuffle block manager so it is easier in the future to consolidate shuffle output files.	2013-04-29 23:07:03 -07:00
Reynold Xin	d3586ef438	Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager Conflicts: core/src/main/scala/spark/storage/DiskStore.scala	2013-04-29 15:44:18 -07:00
Patrick Wendell	016ce1fa9c	Using full package name for util	2013-04-29 12:02:27 -07:00
Patrick Wendell	540be6b154	Modified version of the fix which just removes all per-record tracking.	2013-04-29 11:32:07 -07:00
Patrick Wendell	224fbac061	Spark-742: TaskMetrics should not employ per-record timing. This patch does three things: 1. Makes TimedIterator a trait with two implementations (one a no-op) 2. Makes the default behavior to use the no-op implementation 3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the triat doesn't really reduce complexity in any way. In the future we can add other implementations, e.g. ones which perform sampling.	2013-04-29 11:13:43 -07:00
Matei Zaharia	0f45347c7b	More unit test fixes	2013-04-28 22:29:27 -07:00
Matei Zaharia	bce4089f22	Fix BlockManagerSuite to deal with clearing spark.hostPort	2013-04-28 22:23:48 -07:00
Matei Zaharia	68c07ea198	Merge pull request #582 from shivaram/master Add zip partitions interface	2013-04-28 20:19:33 -07:00
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	6e84635ab9	Rename classes from MapZipped* to Zipped*	2013-04-28 15:58:40 -07:00
Mridul Muralidharan	afee902443	Attempt to fix streaming test failures after yarn branch merge	2013-04-28 22:26:45 +05:30
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	6e6b5204ea	Create an empty directory when checkpointing a 0-partition RDD (fixes a test failure on Hadoop 2.0)	2013-04-25 00:42:37 -07:00
Reynold Xin	ba6ffa6a5f	Allow the specification of a shuffle serializer in the read path (for local block reads).	2013-04-24 17:38:07 -07:00
Reynold Xin	aa618ed2a2	Allow changing the serializer on a per shuffle basis.	2013-04-24 14:52:49 -07:00
Mridul Muralidharan	dd515ca3ee	Attempt at fixing merge conflict	2013-04-24 09:24:17 +05:30
Reynold Xin	31ce6c66d6	Added a BlockObjectWriter interface in block manager so ShuffleMapTask doesn't need to build up an array buffer for each shuffle bucket.	2013-04-23 17:48:59 -07:00
Mridul Muralidharan	8faf5c51c3	Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch	2013-04-24 02:31:57 +05:30
koeninger	dfac0aa5c2	prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement.	2013-04-22 21:12:52 -05:00
Mridul Muralidharan	7acab3ab45	Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo	2013-04-22 08:01:13 +05:30
koeninger	b2a3f24dde	first attempt at an RDD to pull data from JDBC sources	2013-04-21 00:29:37 -05:00
Mridul Muralidharan	ac2e8e8720	Add some basic documentation	2013-04-19 00:13:19 +05:30
Andrew xia	8436bd5d4a	remove TaskSetQueueManager and update code style	2013-04-19 02:17:22 +08:00
Andrew xia	e0603d7e8b	refactor the Schedulable interface and add unit test for SchedulingAlgorithm	2013-04-18 13:13:54 +08:00
Mridul Muralidharan	5ee2f5c483	Cache pattern, add (commented out) alternatives for check* apis	2013-04-17 23:13:34 +05:30
Mridul Muralidharan	f07961060d	Add a small note on spark.tasks.schedule.aggression	2013-04-17 23:13:02 +05:30
Mridul Muralidharan	02dffd2eb0	Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained	2013-04-17 05:52:57 +05:30
Mridul Muralidharan	a402b23bcd	Fudge order of classpath - so that our jars take precedence over what is in CLASSPATH variable. Sounds logical, hope there is no issue cos of it	2013-04-17 05:52:00 +05:30
Mridul Muralidharan	bcdde331c3	Move from master to driver	2013-04-17 04:12:18 +05:30
Mridul Muralidharan	ad80f68eb5	remove spurious debug statements	2013-04-16 22:15:34 +05:30
Mridul Muralidharan	f7969f72ee	Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example)	2013-04-16 21:51:38 +05:30
Mridul Muralidharan	323ab8ff3b	Scala does not prevent variable shadowing ! Sick error due to it ...	2013-04-16 17:05:10 +05:30
shane-huang	b493f55a4f	fix a bug in netty Block Fetcher Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-16 10:01:01 +08:00
Mridul Muralidharan	59c380d69a	Fix npe	2013-04-16 03:29:38 +05:30
Mridul Muralidharan	dd2b64ec97	Fix bug with atomic update	2013-04-16 03:19:24 +05:30
Mridul Muralidharan	5540ab8243	Use hostname instead of hostport for executor, fix creation of workdir	2013-04-16 02:57:43 +05:30
Mridul Muralidharan	eb7e95e833	Commit job to persist files	2013-04-16 02:56:36 +05:30
Matei Zaharia	a64c107449	Make ShuffledRDD.prev transient	2013-04-15 16:41:51 -04:00
Mridul Muralidharan	19652a44be	Fix issue with FileSuite failing	2013-04-15 19:16:36 +05:30
Mridul Muralidharan	54b3d45b81	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:26:50 +05:30
Mridul Muralidharan	d90d2af103	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:12:11 +05:30
Matei Zaharia	c35d530bcf	Fix compile error	2013-04-13 12:43:12 -04:00
Andrew Ash	29d3440efb	Add details when BlockManager heartbeats time out Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs Before: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats After: WARN "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms	2013-04-11 01:54:02 -03:00
Andrew xia	2f883c515f	Contiue to update codes for scala code style 1.refactor braces for "class" "if" "while" "for" "match" 2.make code lines less than 100 3.refactor class parameter and extends defination	2013-04-09 13:02:50 +08:00
Matei Zaharia	054feb6448	Fixed a bug with zip	2013-04-07 21:15:21 -04:00
Matei Zaharia	b5900d47b1	Fix compile warning	2013-04-07 20:55:42 -04:00
Matei Zaharia	6962d40b44	Fix deprecated warning	2013-04-07 20:27:33 -04:00
Mridul Muralidharan	6798a09df8	Add support for building against hadoop2-yarn : adding new maven profile for it	2013-04-07 17:47:38 +05:30
shane-huang	df47b40b76	Shuffle Performance fix: Use netty embeded OIO file server instead of ConnectionManager Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages change reference from io.Source to scala.io.Source to avoid looking into io.netty package Signed-off-by: shane-huang <shengsheng.huang@intel.com>	2013-04-07 14:37:12 +08:00
Andrew xia	2b373dd07a	add properties default value null to fix sbt/sbt test errors	2013-04-02 12:11:14 +08:00
Mark Hamstra	e215f67923	Correct sense of 'filter out' in comment.	2013-03-31 08:00:13 -07:00
Mark Hamstra	8bcdc64005	Fixed broken filter in getWritableClass[T]	2013-03-30 22:09:52 -07:00
Matei Zaharia	9831bc1a09	Merge pull request #539 from cgrothaus/fix-webui-workdirpath Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-29 22:16:22 -07:00
Matei Zaharia	3cc8ab6e29	Merge pull request #541 from stephenh/shufflecoalesce Add a shuffle parameter to coalesce.	2013-03-29 22:14:07 -07:00
Andrew xia	1a28f92711	change some typo and some spacing	2013-03-29 08:34:28 +08:00
Andrew xia	def3d1c84a	1.remove redundant spacing in source code 2.replace get/set functions with val and var defination	2013-03-29 08:20:35 +08:00
Holden Karau	f5df729b12	Explicitly catch all throwables (warning in 2.10)	2013-03-24 16:15:32 -07:00
Stephen Haberman	dd854d5b9f	Use Boolean in the Java API, and != for assert.	2013-03-23 11:49:45 -05:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Matei Zaharia	b8949cab88	Merge pull request #505 from stephenh/volatile Make Executor fields volatile since they're read from the thread pool.	2013-03-23 07:19:34 -07:00
Matei Zaharia	fd53f2fc7b	Merge pull request #510 from markhamstra/WithThing mapWith, flatMapWith and filterWith	2013-03-23 07:13:21 -07:00
Andrew xia	d1d9bdaabe	Just update typo and comments	2013-03-23 07:25:30 +08:00
Stephen Haberman	00170eb0b9	Fix are/our typo.	2013-03-22 12:59:08 -05:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still up the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Christoph Grothaus	445f387ef4	Bugfix: WorkerWebUI must respect workDirPath from Worker	2013-03-22 11:08:40 +01:00
Matei Zaharia	35588490cb	Merge pull request #538 from rxin/cogroup Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 19:27:47 -07:00
Stephen Haberman	4f4215311a	Merge branch 'master' into volatile	2013-03-20 15:37:10 -05:00
Matei Zaharia	b812e6b7bb	Merge pull request #526 from markhamstra/foldByKey Add foldByKey	2013-03-20 11:21:02 -07:00
Reynold Xin	d48ee7e55e	Merge branch 'master' of github.com:mesos/spark into cogroup	2013-03-20 14:00:28 +08:00
Reynold Xin	00a11304fd	Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 13:49:51 +08:00
Matei Zaharia	945d1e720e	Merge pull request #536 from sasurfer/master CoalescedRDD for many partitions	2013-03-19 21:59:06 -07:00
Matei Zaharia	1cbbe94ac1	Merge pull request #534 from stephenh/removetrycatch Remove try/catch block that can't be hit.	2013-03-19 21:34:34 -07:00
Andrey Kouznetsov	bd167f83b0	call setConf from input format if it is Configurable	2013-03-19 17:15:15 +04:00
Giovanni Delussu	aceae029f7	CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD. The limitation was in the range that Scala.Int can represent.	2013-03-19 11:25:45 +01:00
Stephen Haberman	fb34967815	Remove try/catch block that can't be hit.	2013-03-18 01:55:50 -05:00
Mark Hamstra	ab33e27cc9	constructorOfA -> constructA in doc comments	2013-03-16 15:29:15 -07:00
Mark Hamstra	9784fc1fcd	fix wayward comma in doc comment	2013-03-16 15:25:02 -07:00
Mark Hamstra	32979b5e7d	whitespace	2013-03-16 13:36:46 -07:00
Mark Hamstra	ca9f81e8fc	refactor foldByKey to use combineByKey	2013-03-16 13:31:01 -07:00
Mark Hamstra	1fb192ef40	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-16 12:17:13 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Matei Zaharia	c1e9cdc49f	Merge pull request #525 from stephenh/subtractByKey Add PairRDDFunctions.subtractByKey.	2013-03-16 11:47:45 -07:00
Mark Hamstra	ef75be3bf7	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-15 21:41:24 -07:00
Andrew xia	5892393140	refactor fair scheduler implementation 1.Chage "pool" properties to be the memeber of ActiveJob 2.Abstract the Schedulable of Pool and TaskSetManager 3.Abstract the FIFO and FS comparator algorithm 4.Miscellaneous changing of class define and construction	2013-03-16 11:13:38 +08:00
Matei Zaharia	cdbfd1e196	Merge pull request #516 from squito/fix_local_metrics Fix local metrics	2013-03-15 15:13:28 -07:00
Mark Hamstra	1a4070477d	whitespace cleanup	2013-03-15 11:28:28 -07:00
Mark Hamstra	857010392b	Fuller implementation of foldByKey	2013-03-15 10:56:05 -07:00
Mark Hamstra	16a4ca4537	restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test	2013-03-14 13:58:37 -07:00
Mark Hamstra	b1422cbdd5	added foldByKey	2013-03-14 12:59:58 -07:00
Stephen Haberman	7786881f47	Fix tabs that snuck in.	2013-03-14 14:57:12 -05:00
Stephen Haberman	7d8bb4df3a	Allow subtractByKey's other argument to have a different value type.	2013-03-14 14:44:15 -05:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Matei Zaharia	4032beba49	Merge pull request #521 from stephenh/earlyclose Close the reader in HadoopRDD as soon as iteration end.	2013-03-13 19:29:46 -07:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation from subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	cd5b947cf6	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-13 13:16:14 -07:00
Stephen Haberman	e7f1a69c6b	Add a test for NextIterator.	2013-03-13 10:46:33 -05:00
Stephen Haberman	1a175d13b9	Add NextIterator.closeIfNeeded.	2013-03-13 10:17:39 -05:00
Stephen Haberman	8f00d23598	Remove NextIterator.close default implementation.	2013-03-12 12:30:10 -05:00
Harold Lim	0b64e5f1ac	Removed some commented code	2013-03-12 13:31:27 +08:00
Harold Lim	f5b1fecb9f	Cleaned up the code	2013-03-12 13:31:27 +08:00
Harold Lim	b5325182a3	Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies	2013-03-12 13:31:27 +08:00
Harold Lim	54ed7c4af4	Changed the name of the system property to set the allocation xml	2013-03-12 13:31:27 +08:00
Harold Lim	c07087364b	Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler	2013-03-12 13:31:27 +08:00
Stephen Haberman	9e68f48625	More quickly call close in HadoopRDD. This also refactors out the common "gotNext" iterator pattern into a shared utility class.	2013-03-11 23:59:17 -05:00
Charles Reiss	769d399674	Send block sizes as longs.	2013-03-11 14:17:05 -07:00
Mark Hamstra	562893bea3	deleted excess curly braces	2013-03-10 22:43:08 -07:00

... 3 4 5 6 7 ...

1628 commits