Commit graph

1490 commits

Author SHA1 Message Date
Matei Zaharia 3e61beff7b Merge pull request #648 from shivaram/netty-dbg
Shuffle fixes and cleanup
2013-06-22 16:22:47 -07:00
Patrick Wendell 7e9f1ed0de Some cleanup of styling 2013-06-22 10:31:37 -07:00
Patrick Wendell 3b7ebdeeb8 Handling entirely failed stages 2013-06-22 10:31:37 -07:00
Patrick Wendell be6107ce44 Some tweaking with shared page header 2013-06-22 10:31:37 -07:00
Patrick Wendell 9a24d1a2d0 Using scala in XML imports 2013-06-22 10:31:37 -07:00
Patrick Wendell f91e1c4822 Linking RDD information when available in stages 2013-06-22 10:31:37 -07:00
Patrick Wendell a86bb459e2 Showing shuffle status and purging old stages 2013-06-22 10:31:37 -07:00
Patrick Wendell 3485e73376 Style cleanup 2013-06-22 10:31:37 -07:00
Patrick Wendell dd696f3a3d Some renaming and comments 2013-06-22 10:31:37 -07:00
Patrick Wendell 5c872e9ef5 Documentation and some refactoring 2013-06-22 10:31:37 -07:00
Patrick Wendell 17776323a6 More work on percentile data: 2013-06-22 10:31:37 -07:00
Patrick Wendell dcf6a68177 Refactoring into different modules 2013-06-22 10:31:36 -07:00
Patrick Wendell ce81c320ac Adding helper function to make listing tables 2013-06-22 10:31:36 -07:00
Patrick Wendell 9fd5dc3ea9 Initial steps towards job progress UI 2013-06-22 10:31:36 -07:00
Patrick Wendell bc4a811c57 Stash 2013-06-22 10:31:36 -07:00
Patrick Wendell 77c53f7868 Refactoring UI packages 2013-06-22 10:31:36 -07:00
Patrick Wendell 8b5c7e71c4 Import cleanup 2013-06-22 10:31:36 -07:00
Patrick Wendell 32a45d01b1 Removing twirl files 2013-06-22 10:31:36 -07:00
Patrick Wendell 4e1f202481 Removing dead code 2013-06-22 10:31:36 -07:00
Patrick Wendell d6fde4ffe4 Some JSON cleanup 2013-06-22 10:31:36 -07:00
Patrick Wendell 91ec5a1a04 Changing JSON protocol and removing spray code 2013-06-22 10:31:36 -07:00
Patrick Wendell fc94576ece Adding worker version of UI 2013-06-22 10:31:36 -07:00
Patrick Wendell ee73c09ac9 Some comments 2013-06-22 10:31:36 -07:00
Patrick Wendell 9161db5478 Cleaning up master web UI 2013-06-22 10:31:36 -07:00
Patrick Wendell e55cf0245f Adding WebUI file 2013-06-22 10:31:35 -07:00
Patrick Wendell f85fd7a793 Commenting unfinished part 2013-06-22 10:31:35 -07:00
Patrick Wendell 2c36a514aa Spray refactoring for master web UI 2013-06-22 10:31:35 -07:00
Patrick Wendell 7e6977b6c5 Fix in storage status page 2013-06-22 10:31:35 -07:00
Patrick Wendell 950f83535a Adding deterministic port 2013-06-22 10:31:35 -07:00
Patrick Wendell 7cd70dc2c1 Minor cleanup 2013-06-22 10:31:35 -07:00
Patrick Wendell e66f570194 Completely hacked version of block manager UI in jetty 2013-06-22 10:31:35 -07:00
Patrick Wendell 60fbf7e461 Partially working checkpoint 2013-06-22 10:31:35 -07:00
Matei Zaharia 1ef5d0d2c9 Merge pull request #644 from shimingfei/joblogger
add Joblogger to Spark (on new Spark code)
2013-06-22 09:35:57 -07:00
Jey Kottalam 1ba3c17303 use parens when calling method with side-effects 2013-06-21 12:14:16 -04:00
Jey Kottalam edb18ca928 Rename PythonWorker to PythonWorkerFactory 2013-06-21 12:14:16 -04:00
Jey Kottalam 62c4781400 Add tests and fixes for Python daemon shutdown 2013-06-21 12:14:16 -04:00
Jey Kottalam c79a6078c3 Prefork Python worker processes 2013-06-21 12:14:16 -04:00
Jey Kottalam 40afe0d2a5 Add Python timing instrumentation 2013-06-21 12:14:16 -04:00
Mingfei 2fc794a6c7 small modify in DAGScheduler 2013-06-21 18:21:35 +08:00
Mingfei 4b9862ac9c small format modification 2013-06-21 17:55:32 +08:00
Mingfei aa7aa587be some format modification 2013-06-21 17:48:41 +08:00
Mingfei 5240795154 edit according to comments 2013-06-21 17:38:23 +08:00
Matei Zaharia 71030ba3eb Merge pull request #654 from lyogavin/enhance_pipe
fix typo and coding style in #638
2013-06-19 15:21:03 -07:00
Thomas Graves bad51c7cb4 upmerge with latest mesos/spark master and fix hbase compile with hadoop2-yarn profile 2013-06-19 14:39:13 -05:00
Thomas Graves 75d78c7ac9 Add support for Spark on Yarn on a secure Hadoop cluster 2013-06-19 11:18:42 -05:00
Matei Zaharia 7902baddc7 Update ASM to version 4.0 2013-06-19 13:34:30 +02:00
Gavin Li 0a2a9bce1e fix typo and coding style 2013-06-18 21:30:13 +00:00
jerryshao 1e9269c3ee reduce ZippedPartitionsRDD's getPreferredLocations complexity 2013-06-18 09:49:06 +08:00
Matei Zaharia db42451a52 Merge pull request #643 from adatao/master
Bug fix: Zero-length partitions result in NaN for overall mean & variance
2013-06-17 15:26:36 -07:00
Matei Zaharia e82a2ffcc9 Merge pull request #653 from rxin/logging
SPARK-781: Log the temp directory path when Spark says "Failed to create temp directory."
2013-06-17 15:13:15 -07:00
Matei Zaharia ec193c7d89 Merge remote-tracking branch 'xiajunluan/xiajunluan'
Conflicts:
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-06-18 00:11:50 +02:00
Reynold Xin be3c406edf Fixed the typo pointed out by Matei. 2013-06-17 17:07:51 -04:00
Reynold Xin 1450296797 SPARK-781: Log the temp directory path when Spark says "Failed to create
temp directory".
2013-06-17 16:58:23 -04:00
Gavin Li 4508089fc3 refine comments and add sc.clean 2013-06-17 05:23:46 +00:00
Gavin Li e6ae049283 Merge remote-tracking branch 'upstream1/master' into enhance_pipe 2013-06-16 22:53:39 +00:00
Gavin Li fb6d733fa8 update according to comments 2013-06-16 22:32:55 +00:00
Matei Zaharia f961aac8b2 Merge pull request #649 from ryanlecompte/master
Add top K method to RDD using a bounded priority queue
2013-06-15 00:53:41 -07:00
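The top K method merged above relies on a bounded priority queue. The idea can be sketched as follows; this is an illustration in Python, not the code from the pull request, and the names `bounded_add`, `top_k_in_partition`, `merge_heaps`, and `top_k` are hypothetical. Each partition keeps a min-heap of at most k items, and the per-partition heaps are then merged under the same bound.

```python
import heapq
from functools import reduce


def bounded_add(heap, x, k):
    """Insert x into a min-heap that never holds more than k items."""
    if len(heap) < k:
        heapq.heappush(heap, x)
    elif heap and x > heap[0]:
        # x beats the smallest retained item, so swap it in.
        heapq.heapreplace(heap, x)
    return heap


def top_k_in_partition(values, k):
    """Return a min-heap holding the k largest values of one partition."""
    heap = []
    for x in values:
        bounded_add(heap, x, k)
    return heap


def merge_heaps(a, b, k):
    """Combine two bounded heaps into a new heap of at most k items."""
    merged = list(a)  # a copy of a valid heap is still a valid heap
    for x in b:
        bounded_add(merged, x, k)
    return merged


def top_k(partitions, k):
    """Return the k largest values across all partitions, largest first."""
    heaps = (top_k_in_partition(p, k) for p in partitions)
    combined = reduce(lambda a, b: merge_heaps(a, b, k), heaps, [])
    return sorted(combined, reverse=True)
```

Because no heap ever grows beyond k items, memory per partition stays proportional to k rather than to the partition size.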
ryanlecompte e8801d4490 use delegation for BoundedPriorityQueue, add Java API 2013-06-14 23:39:05 -07:00
Reynold Xin 2cc188fd54 SPARK-774: cogroup should also disable map side combine by default 2013-06-14 00:10:54 -07:00
Reynold Xin 6738178d0d SPARK-772: groupByKey should disable map side combine. 2013-06-13 23:59:42 -07:00
ryanlecompte 93b3f5e535 drop unneeded ClassManifest implicit 2013-06-13 16:26:35 -07:00
ryanlecompte 44b8dbaede use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs 2013-06-13 16:23:15 -07:00
Shivaram Venkataraman 1d9f0df065 Fix some comments and style 2013-06-13 14:46:25 -07:00
Mingfei 967a6a699d modify sparklister function interface according to comments 2013-06-13 14:36:07 +08:00
Shivaram Venkataraman 5da4287b1d Merge branch 'netty-dbg' of github.com:shivaram/spark into netty-dbg 2013-06-12 16:38:37 -07:00
Shivaram Venkataraman 5e9a9317c5 Merge branch 'master' of git://github.com/mesos/spark into netty-dbg 2013-06-12 16:38:01 -07:00
ryanlecompte db5bca08ff add a new top K method to RDD using a bounded priority queue 2013-06-12 10:54:16 -07:00
Andrew xia 190ec61799 change code style and debug info 2013-06-10 15:27:02 +08:00
Patrick Wendell ef14dc2e77 Adding Java-API version of compression codec 2013-06-09 18:09:46 -07:00
Patrick Wendell df592192e7 Monads FTW 2013-06-09 18:09:24 -07:00
Patrick Wendell d1bbcebae5 Adding compression to Hadoop save functions 2013-06-09 11:39:35 -07:00
Mingfei ade822011d not check return value of eventQueue.take 2013-06-08 16:26:45 +08:00
Mingfei 4fd86e0e10 delete test code for joblogger in SparkContext 2013-06-08 15:45:47 +08:00
Mingfei 362f0f93ac Merge branch 'master' of https://github.com/mesos/spark 2013-06-08 15:20:13 +08:00
Mingfei 1a4d93c025 modify to pass job annotation by localProperties and use daemon thread to do joblogger's work 2013-06-08 14:23:39 +08:00
Matei Zaharia b58a29295b Small formatting and style fixes 2013-06-07 22:51:28 -07:00
Matei Zaharia c8fc423bc2 Merge pull request #631 from jerryshao/master
Fix block manager UI display issue when enable spark.cleaner.ttl
2013-06-07 22:43:18 -07:00
Matei Zaharia c9ca0a4a58 Small code style fix to SchedulingAlgorithm.scala 2013-06-07 22:40:44 -07:00
Matei Zaharia 1ae60bcb36 Merge pull request #634 from xiajunluan/master
[Spark-753] Fix ClusterSchedulSuite unit test failed
2013-06-07 22:39:06 -07:00
Shivaram Venkataraman ac480fd977 Clean up variables and counters in BlockFetcherIterator 2013-06-06 16:34:27 -07:00
Gavin Li e179ff8a32 update according to comments 2013-06-05 22:41:05 +00:00
Shivaram Venkataraman cb2f5046ee Pass in bufferSize to BufferedOutputStream 2013-06-05 15:09:02 -07:00
Shivaram Venkataraman c851957fe4 Don't write zero block files with java serializer 2013-06-05 14:28:38 -07:00
Christopher Nguyen 9d35904357 In the current code, when both partitions happen to have zero-length, the return mean will be NaN.
Consequently, the result of mean after reducing over all partitions will also be NaN,
which is not correct if there are partitions with non-zero length. This patch fixes this issue.
2013-06-04 22:12:47 -07:00
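The problem described in commit 9d35904357 can be illustrated with a minimal sketch of merging per-partition statistics. This is not the patched Spark code: the `Stats` class and the helper names are hypothetical, and the merge formula is the standard pairwise update for count, mean, and sum of squared deviations. With IEEE doubles, merging two empty partitions computes 0/0 and yields NaN (plain Python would raise `ZeroDivisionError` instead), so the sketch returns the non-empty side unchanged whenever one side has no elements.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Stats:
    n: int = 0         # number of values seen
    mean: float = 0.0  # running mean
    m2: float = 0.0    # sum of squared deviations from the mean


def from_values(values):
    """Summarize one partition with a single pass over its values."""
    s = Stats()
    for x in values:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a, b):
    """Combine two partition summaries without dividing by a zero count."""
    if b.n == 0:
        return Stats(a.n, a.mean, a.m2)
    if a.n == 0:
        return Stats(b.n, b.mean, b.m2)
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Stats(n, mean, m2)


def variance(s):
    """Population variance; NaN only when no values were seen at all."""
    return s.m2 / s.n if s.n > 0 else float("nan")
```

Reducing over a mix of empty and non-empty partitions, for example `reduce(merge, map(from_values, [[], [], [1.0, 2.0, 3.0]]), Stats())`, then gives the mean of the non-empty data rather than NaN.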
Matei Zaharia fff3728552 Merge pull request #640 from pwendell/timeout-update
Fixing bug in BlockManager timeout
2013-06-04 16:09:50 -07:00
Patrick Wendell 061fd3ae36 Fixing bug in BlockManager timeout 2013-06-04 19:02:44 -04:00
Matei Zaharia f420d4f228 Merge pull request #639 from pwendell/timeout-update
Bump akka and blockmanager timeouts to 60 seconds
2013-06-04 15:25:58 -07:00
Patrick Wendell 8bd4e12104 Bump akka and blockmanager timeouts to 60 seconds 2013-06-04 18:14:24 -04:00
Shivaram Venkataraman 96943a1cc0 var to val 2013-06-03 12:29:38 -07:00
Shivaram Venkataraman cd347f547a Reuse the file object as it is valid after delete 2013-06-03 12:27:51 -07:00
Shivaram Venkataraman a058b0acf3 Delete a file for a block if it already exists. 2013-06-03 12:10:00 -07:00
Andrew xia 606bb1b450 Fix schedulingAlgorithm bugs for unit test 2013-06-03 10:29:23 +08:00
Shivaram Venkataraman 038cfc1a9a Make connect timeout configurable 2013-05-31 23:32:18 -07:00
Shivaram Venkataraman 91aca92249 Another round of Netty fixes.
1. Avoid race condition between stop and copier completion
2. Handle socket exceptions by reporting them and filling in a failed
FetchResult
2013-05-31 23:21:38 -07:00
Gavin Li 9f84315c05 enhance pipe to support what we can do in hadoop streaming 2013-06-01 00:26:10 +00:00
Reynold Xin de1167bf2c Incorporated Charles' feedback to put rdd metadata removal in
BlockManagerMasterActor.
2013-05-31 15:54:57 -07:00
Reynold Xin ba5e544461 More block manager cleanup.
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needs to send B akka
messages, where B = number of blocks. Now unpersist only needs to send W
akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
jerryshao 926f41cc52 fix block manager UI display issue when enable spark.cleaner.ttl 2013-05-31 09:32:52 +08:00
Reynold Xin bed1b08169 Do not create symlink for local add file. Instead, copy the file.
This prevents Spark from changing the original file's permission, and
also allow add file to work on non-posix operating systems.
2013-05-30 16:21:49 -07:00
Shivaram Venkataraman 3b0cd17343 Merge branch 'master' of git://github.com/mesos/spark
Conflicts:
	core/src/test/scala/spark/ShuffleSuite.scala
2013-05-30 14:36:24 -07:00
Andrew xia c3db3ea554 1. Add unit test for local scheduler
2. Move localTaskSetManager to a new file
2013-05-30 20:49:40 +08:00
Andrew xia ecceb101d3 implement FIFO and fair scheduler for spark local mode 2013-05-30 10:43:01 +08:00
Shivaram Venkataraman 19fd6d54c0 Also flush serializer in revertPartialWrites 2013-05-29 17:29:34 -07:00
Shivaram Venkataraman 618c8cae1e Skip fetching zero-sized blocks in OIO.
Also unify splitLocalRemoteBlocks for netty/nio and add a test case
2013-05-29 13:18:54 -07:00
Matei Zaharia 6ed71390d9 Merge pull request #626 from stephenh/remove-add-if-no-port
Remove unused addIfNoPort.
2013-05-29 10:14:22 -07:00
Shivaram Venkataraman b79b10a6d6 Flush serializer to fix zero-size kryo blocks bug.
Also convert the local-cluster test case to check for non-zero block sizes
2013-05-29 00:52:55 -07:00
Matei Zaharia 41d230ccb0 Merge pull request #611 from squito/classloader
Use default classloaders for akka & deserializing task results
2013-05-28 23:35:24 -07:00
Shivaram Venkataraman fbc1ab3468 Couple of Netty fixes
a. Fix the port number by reading it from the bound channel
b. Fix the shutdown sequence to make sure we actually block on the channel
c. Fix the unit test to use two JVMs.
2013-05-28 16:27:16 -07:00
Stephen Haberman 4fe1fbdd51 Remove unused addIfNoPort. 2013-05-28 16:26:32 -05:00
Matei Zaharia 3db1e17baa Merge pull request #620 from jerryshao/master
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia e8d4b6c296 Merge pull request #529 from xiajunluan/master
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin 26962c9340 Automatically configure Netty port. This makes unit tests using
local-cluster pass. Previously they were failing because Netty was
trying to bind to the same port for all processes.

Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin 6ea085169d Fixed the bug that shuffle serializer is ignored by the new shuffle
block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao bd3ea8f2a6 fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException 2013-05-24 14:26:19 +08:00
Charles Reiss f350f14084 Use ARRAY_SAMPLE_SIZE constant instead of 100.0 2013-05-21 18:11:33 -07:00
Andrew xia ecd6d75c6a fix bug of unit tests 2013-05-21 06:49:23 +08:00
Reynold Xin 5912cc4967 Merge pull request #610 from JoshRosen/spark-747
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin 8d78c5f89f Changed the logging level from info to warning when addJar(null) is
called.
2013-05-17 18:51:35 -07:00
Andrew xia 3d4672eaa9 Merge branch 'master' into xiajunluan
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia d19753b9c7 expose TaskSetManager type to resourceOffer function in ClusterScheduler 2013-05-18 06:45:19 +08:00
Andrew xia c6e2770bfe Fix ClusterScheduler bug to avoid allocating tasks to same slave 2013-05-17 05:10:38 +08:00
Mridul Muralidharan f0881f8d48 Hope this does not turn into a bike shed change 2013-05-17 01:58:50 +05:30
Mridul Muralidharan feddd2530d Filter out nulls - prevent NPE 2013-05-16 17:49:14 +05:30
Josh Rosen b8e46b6074 Abort job if result exceeds Akka frame size; add test. 2013-05-16 01:57:57 -07:00
Matei Zaharia 2f576aba8f Merge pull request #602 from rxin/shufflemerge
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin 203d7b7c14 Merge pull request #593 from squito/driver_ui_link
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin f3491cb89b Merge branch 'master' of github.com:mesos/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/test/scala/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin f9d40a5848 Added a comment in JdbcRDD for example usage. 2013-05-14 23:29:57 -07:00
Reynold Xin 81ad2fa331 Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
	project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid 38d4b97c6d use threads classloader when deserializing task results; classnotfoundexception includes classloader 2013-05-14 22:32:14 -07:00
Imran Rashid d7d1da79d3 when akka starts, use akkas default classloader (current thread) 2013-05-14 22:32:09 -07:00
Matei Zaharia 016ac86830 Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia 4b354e0a08 Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell 7f0833647b Capturing class name 2013-05-12 07:54:03 -07:00
Patrick Wendell 72b9c4cb6e Small fix 2013-05-11 23:53:50 -07:00
Patrick Wendell 1c15b85051 Removing import 2013-05-11 23:52:53 -07:00
Patrick Wendell 059ab88754 Changing technique to use same code path in all cases 2013-05-11 23:50:54 -07:00
Cody Koeninger 3da2305ed0 code cleanup per rxin comments 2013-05-11 23:59:07 -05:00
Josh Rosen 440719109e Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell 0345954530 SPARK-738: Spark should detect and squash nonserializable exceptions 2013-05-11 14:17:09 -07:00
Mark Hamstra 6e6b3e0d7e Actually use the cleaned closure in foreachPartition 2013-05-10 13:02:34 -07:00
Imran Rashid 0ab818d508 fix linebreak 2013-05-09 00:38:59 -07:00
Reynold Xin 5d70ee4663 Cleaned up connection manager (moved many classes to their own files). 2013-05-07 22:42:15 -07:00
Reynold Xin 8388e8dd7a Minor style fix in DiskStore... 2013-05-07 18:40:35 -07:00
Reynold Xin 547dcbe494 Cleaned up Scala files in network/netty from Shane's PR. 2013-05-07 18:39:33 -07:00
Reynold Xin 9e64396ca4 Cleaned up the Java files from Shane's PR. 2013-05-07 18:30:54 -07:00
Reynold Xin 0e5cc30868 Cleaned up BlockManager and BlockFetcherIterator from Shane's PR. 2013-05-07 18:18:24 -07:00
Reynold Xin 8b79485171 Moved BlockFetcherIterator to its own file. 2013-05-07 17:02:32 -07:00
Reynold Xin 90577ada69 Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/DiskStore.scala
	project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Reynold Xin 0fd84965f6 Added EmptyRDD. 2013-05-06 15:40:34 -07:00