Commit graph

1516 commits

Author SHA1 Message Date
Reynold Xin 1450296797 SPARK-781: Log the temp directory path when Spark says "Failed to create
temp directory".
2013-06-17 16:58:23 -04:00
Gavin Li 4508089fc3 refine comments and add sc.clean 2013-06-17 05:23:46 +00:00
Gavin Li e6ae049283 Merge remote-tracking branch 'upstream1/master' into enhance_pipe 2013-06-16 22:53:39 +00:00
Gavin Li fb6d733fa8 update according to comments 2013-06-16 22:32:55 +00:00
Christopher Nguyen f91195cc15 Import just scala.math.abs rather than scala.math._ 2013-06-16 01:29:53 -07:00
Christopher Nguyen 5c886194e4 Move zero-length partition testing from JavaAPISuite.java to PartitioningSuite.scala 2013-06-16 01:23:48 -07:00
Christopher Nguyen 479442a9b9 Add zeroLengthPartitions() test to make sure, e.g., StatCounter.scala can handle empty partitions without incorrectly returning NaN 2013-06-15 17:35:55 -07:00
Matei Zaharia f961aac8b2 Merge pull request #649 from ryanlecompte/master
Add top K method to RDD using a bounded priority queue
2013-06-15 00:53:41 -07:00
ryanlecompte e8801d4490 use delegation for BoundedPriorityQueue, add Java API 2013-06-14 23:39:05 -07:00
Andrew xia 53add598f2 Update LocalSchedulerSuite to avoid using sleep for task launch 2013-06-15 01:46:13 +08:00
Reynold Xin 2cc188fd54 SPARK-774: cogroup should also disable map side combine by default 2013-06-14 00:10:54 -07:00
Reynold Xin 6738178d0d SPARK-772: groupByKey should disable map side combine. 2013-06-13 23:59:42 -07:00
ryanlecompte 93b3f5e535 drop unneeded ClassManifest implicit 2013-06-13 16:26:35 -07:00
ryanlecompte 44b8dbaede use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs 2013-06-13 16:23:15 -07:00
Mingfei 967a6a699d modify sparklister function interface according to comments 2013-06-13 14:36:07 +08:00
ryanlecompte db5bca08ff add a new top K method to RDD using a bounded priority queue 2013-06-12 10:54:16 -07:00
Patrick Wendell fd6148c8b2 Removing print statement 2013-06-10 10:27:25 -07:00
Andrew xia 190ec61799 change code style and debug info 2013-06-10 15:27:02 +08:00
Patrick Wendell ef14dc2e77 Adding Java-API version of compression codec 2013-06-09 18:09:46 -07:00
Patrick Wendell df592192e7 Monads FTW 2013-06-09 18:09:24 -07:00
Patrick Wendell 083a3485ab Clean extra whitespace 2013-06-09 11:49:33 -07:00
Patrick Wendell d1bbcebae5 Adding compression to Hadoop save functions 2013-06-09 11:39:35 -07:00
Mingfei ade822011d not check return value of eventQueue.take 2013-06-08 16:26:45 +08:00
Mingfei 4fd86e0e10 delete test code for joblogger in SparkContext 2013-06-08 15:45:47 +08:00
Mingfei 362f0f93ac Merge branch 'master' of https://github.com/mesos/spark 2013-06-08 15:20:13 +08:00
Mingfei 1a4d93c025 modify to pass job annotation by localProperties and use daeamon thread to do joblogger's work 2013-06-08 14:23:39 +08:00
Matei Zaharia b58a29295b Small formatting and style fixes 2013-06-07 22:51:28 -07:00
Matei Zaharia c8fc423bc2 Merge pull request #631 from jerryshao/master
Fix block manager UI display issue when enable spark.cleaner.ttl
2013-06-07 22:43:18 -07:00
Matei Zaharia c9ca0a4a58 Small code style fix to SchedulingAlgorithm.scala 2013-06-07 22:40:44 -07:00
Matei Zaharia 1ae60bcb36 Merge pull request #634 from xiajunluan/master
[Spark-753] Fix ClusterSchedulSuite unit test failed
2013-06-07 22:39:06 -07:00
Gavin Li e179ff8a32 update according to comments 2013-06-05 22:41:05 +00:00
Christopher Nguyen 9d35904357 In the current code, when both partitions happen to have zero-length, the return mean will be NaN.
Consequently, the result of mean after reducing over all partitions will also be NaN,
which is not correct if there are partitions with non-zero length. This patch fixes this issue.
2013-06-04 22:12:47 -07:00
Matei Zaharia fff3728552 Merge pull request #640 from pwendell/timeout-update
Fixing bug in BlockManager timeout
2013-06-04 16:09:50 -07:00
Patrick Wendell 061fd3ae36 Fixing bug in BlockManager timeout 2013-06-04 19:02:44 -04:00
Matei Zaharia f420d4f228 Merge pull request #639 from pwendell/timeout-update
Bump akka and blockmanager timeouts to 60 seconds
2013-06-04 15:25:58 -07:00
Patrick Wendell 8bd4e12104 Bump akka and blockmanager timeouts to 60 seconds 2013-06-04 18:14:24 -04:00
Andrew xia 606bb1b450 Fix schedulingAlgorithm bugs for unit test 2013-06-03 10:29:23 +08:00
Gavin Li 4a9913d66a add ut for pipe enhancement 2013-06-02 23:21:09 +00:00
Gavin Li 9f84315c05 enhance pipe to support what we can do in hadoop streaming 2013-06-01 00:26:10 +00:00
Reynold Xin de1167bf2c Incorporated Charles' feedback to put rdd metadata removal in
BlockManagerMasterActor.
2013-05-31 15:54:57 -07:00
Reynold Xin ba5e544461 More block manager cleanup.
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needs to send B akka
messages, where B = number of blocks. Now unpersist only needs to send W
akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
jerryshao 926f41cc52 fix block manager UI display issue when enable spark.cleaner.ttl 2013-05-31 09:32:52 +08:00
Reynold Xin f6ad3781b1 Fixed the flaky unpersist test in RDDSuite. 2013-05-30 16:28:08 -07:00
Reynold Xin bed1b08169 Do not create symlink for local add file. Instead, copy the file.
This prevents Spark from changing the original file's permission, and
also allow add file to work on non-posix operating systems.
2013-05-30 16:21:49 -07:00
Shivaram Venkataraman 3b0cd17343 Merge branch 'master' of git://github.com/mesos/spark
Conflicts:
	core/src/test/scala/spark/ShuffleSuite.scala
2013-05-30 14:36:24 -07:00
Andrew xia c3db3ea554 1. Add unit test for local scheduler
2. Move localTaskSetManager to a new file
2013-05-30 20:49:40 +08:00
Andrew xia ecceb101d3 implement FIFO and fair scheduler for spark local mode 2013-05-30 10:43:01 +08:00
Shivaram Venkataraman 19fd6d54c0 Also flush serializer in revertPartialWrites 2013-05-29 17:29:34 -07:00
Shivaram Venkataraman 618c8cae1e Skip fetching zero-sized blocks in OIO.
Also unify splitLocalRemoteBlocks for netty/nio and add a test case
2013-05-29 13:18:54 -07:00
Matei Zaharia 6ed71390d9 Merge pull request #626 from stephenh/remove-add-if-no-port
Remove unused addIfNoPort.
2013-05-29 10:14:22 -07:00
Shivaram Venkataraman b79b10a6d6 Flush serializer to fix zero-size kryo blocks bug.
Also convert the local-cluster test case to check for non-zero block sizes
2013-05-29 00:52:55 -07:00
Matei Zaharia 41d230ccb0 Merge pull request #611 from squito/classloader
Use default classloaders for akka & deserializing task results
2013-05-28 23:35:24 -07:00
Shivaram Venkataraman fbc1ab3468 Couple of Netty fixes
a. Fix the port number by reading it from the bound channel
b. Fix the shutdown sequence to make sure we actually block on the channel
c. Fix the unit test to use two JVMs.
2013-05-28 16:27:16 -07:00
Stephen Haberman 4fe1fbdd51 Remove unused addIfNoPort. 2013-05-28 16:26:32 -05:00
Matei Zaharia 3db1e17baa Merge pull request #620 from jerryshao/master
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia e8d4b6c296 Merge pull request #529 from xiajunluan/master
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin 6bbbe01287 Fixed a stupid mistake that NonJavaSerializableClass was made Java
serializable.
2013-05-24 16:51:45 -07:00
Reynold Xin 26962c9340 Automatically configure Netty port. This makes unit tests using
local-cluster pass. Previously they were failing because Netty was
trying to bind to the same port for all processes.

Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin 6ea085169d Fixed the bug that shuffle serializer is ignored by the new shuffle
block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao bd3ea8f2a6 fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException 2013-05-24 14:26:19 +08:00
Matei Zaharia a2b0a7975c Merge pull request #619 from woggling/adjust-sampling
Use ARRAY_SAMPLE_SIZE constant instead of hard-coded 100.0 in SizeEstimator
2013-05-21 18:16:20 -07:00
Charles Reiss f350f14084 Use ARRAY_SAMPLE_SIZE constant instead of 100.0 2013-05-21 18:11:33 -07:00
Charles Reiss 786c97b87c DistributedSuite: remove dead test code 2013-05-21 11:35:49 -07:00
Andrew xia ecd6d75c6a fix bug of unit tests 2013-05-21 06:49:23 +08:00
Reynold Xin 5912cc4967 Merge pull request #610 from JoshRosen/spark-747
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin 8d78c5f89f Changed the logging level from info to warning when addJar(null) is
called.
2013-05-17 18:51:35 -07:00
Reynold Xin 6729c2ead8 Merge branch 'master' of github.com:mesos/spark 2013-05-17 17:58:06 -07:00
Andrew xia 3d4672eaa9 Merge branch 'master' into xiajunluan
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia d19753b9c7 expose TaskSetManager type to resourceOffer function in ClusterScheduler 2013-05-18 06:45:19 +08:00
Reynold Xin 61cf176238 Added dependency on netty-all in Maven. 2013-05-16 14:31:26 -07:00
Andrew xia c6e2770bfe Fix ClusterScheduler bug to avoid allocating tasks to same slave 2013-05-17 05:10:38 +08:00
Mridul Muralidharan f0881f8d48 Hope this does not turn into a bike shed change 2013-05-17 01:58:50 +05:30
Mridul Muralidharan feddd2530d Filter out nulls - prevent NPE 2013-05-16 17:49:14 +05:30
Josh Rosen b8e46b6074 Abort job if result exceeds Akka frame size; add test. 2013-05-16 01:57:57 -07:00
Matei Zaharia 2f576aba8f Merge pull request #602 from rxin/shufflemerge
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin 203d7b7c14 Merge pull request #593 from squito/driver_ui_link
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin f3491cb89b Merge branch 'master' of github.com:mesos/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/test/scala/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin f9d40a5848 Added a comment in JdbcRDD for example usage. 2013-05-14 23:29:57 -07:00
Reynold Xin 404f9ff617 Added derby dependency to Maven pom files for the JDBC Java test. 2013-05-14 23:28:34 -07:00
Reynold Xin 81ad2fa331 Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
	project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid 38d4b97c6d use threads classloader when deserializing task results; classnotfoundexception includes classloader 2013-05-14 22:32:14 -07:00
Imran Rashid d7d1da79d3 when akka starts, use akkas default classloader (current thread) 2013-05-14 22:32:09 -07:00
Cody Koeninger b16c4896f6 add test for JdbcRDD using embedded derby, per rxin suggestion 2013-05-14 23:44:04 -05:00
Matei Zaharia 016ac86830 Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia 4b354e0a08 Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell 7f0833647b Capturing class name 2013-05-12 07:54:03 -07:00
Patrick Wendell 72b9c4cb6e Small fix 2013-05-11 23:53:50 -07:00
Patrick Wendell 1c15b85051 Removing import 2013-05-11 23:52:53 -07:00
Patrick Wendell 059ab88754 Changing technique to use same code path in all cases 2013-05-11 23:50:54 -07:00
Cody Koeninger 3da2305ed0 code cleanup per rxin comments 2013-05-11 23:59:07 -05:00
Josh Rosen 440719109e Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell a5c28bb888 Removing unnecessary map 2013-05-11 14:20:39 -07:00
Patrick Wendell 0345954530 SPARK-738: Spark should detect and squash nonserializable exceptions 2013-05-11 14:17:09 -07:00
Mark Hamstra 6e6b3e0d7e Actually use the cleaned closure in foreachPartition 2013-05-10 13:02:34 -07:00
Mridul Muralidharan b05c9d22d7 Remove explicit hardcoding of yarn-standalone as args(0) if it is missing. 2013-05-09 18:49:12 +05:30
Imran Rashid 0ab818d508 fix linebreak 2013-05-09 00:38:59 -07:00
Reynold Xin 9cafacf32d Added test for Netty suite. 2013-05-07 22:42:37 -07:00
Reynold Xin 5d70ee4663 Cleaned up connection manager (moved many classes to their own files). 2013-05-07 22:42:15 -07:00
Reynold Xin 8388e8dd7a Minor style fix in DiskStore... 2013-05-07 18:40:35 -07:00
Reynold Xin 547dcbe494 Cleaned up Scala files in network/netty from Shane's PR. 2013-05-07 18:39:33 -07:00