Commit graph

2030 commits

Author SHA1 Message Date
Gavin Li 9f84315c05 enhance pipe to support what we can do in hadoop streaming 2013-06-01 00:26:10 +00:00
Reynold Xin de1167bf2c Incorporated Charles' feedback to put rdd metadata removal in
BlockManagerMasterActor.
2013-05-31 15:54:57 -07:00
Reynold Xin ba5e544461 More block manager cleanup.
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needs to send B akka
messages, where B = number of blocks. Now unpersist only needs to send W
akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
jerryshao 926f41cc52 fix block manager UI display issue when enable spark.cleaner.ttl 2013-05-31 09:32:52 +08:00
Reynold Xin f6ad3781b1 Fixed the flaky unpersist test in RDDSuite. 2013-05-30 16:28:08 -07:00
Reynold Xin bed1b08169 Do not create symlink for local add file. Instead, copy the file.
This prevents Spark from changing the original file's permission, and
also allow add file to work on non-posix operating systems.
2013-05-30 16:21:49 -07:00
Shivaram Venkataraman 3b0cd17343 Merge branch 'master' of git://github.com/mesos/spark
Conflicts:
	core/src/test/scala/spark/ShuffleSuite.scala
2013-05-30 14:36:24 -07:00
Andrew xia c3db3ea554 1. Add unit test for local scheduler
2. Move localTaskSetManager to a new file
2013-05-30 20:49:40 +08:00
Andrew xia ecceb101d3 implement FIFO and fair scheduler for spark local mode 2013-05-30 10:43:01 +08:00
Shivaram Venkataraman 19fd6d54c0 Also flush serializer in revertPartialWrites 2013-05-29 17:29:34 -07:00
Shivaram Venkataraman 618c8cae1e Skip fetching zero-sized blocks in OIO.
Also unify splitLocalRemoteBlocks for netty/nio and add a test case
2013-05-29 13:18:54 -07:00
Matei Zaharia 6ed71390d9 Merge pull request #626 from stephenh/remove-add-if-no-port
Remove unused addIfNoPort.
2013-05-29 10:14:22 -07:00
Shivaram Venkataraman b79b10a6d6 Flush serializer to fix zero-size kryo blocks bug.
Also convert the local-cluster test case to check for non-zero block sizes
2013-05-29 00:52:55 -07:00
Matei Zaharia 41d230ccb0 Merge pull request #611 from squito/classloader
Use default classloaders for akka & deserializing task results
2013-05-28 23:35:24 -07:00
Shivaram Venkataraman fbc1ab3468 Couple of Netty fixes
a. Fix the port number by reading it from the bound channel
b. Fix the shutdown sequence to make sure we actually block on the channel
c. Fix the unit test to use two JVMs.
2013-05-28 16:27:16 -07:00
Stephen Haberman 4fe1fbdd51 Remove unused addIfNoPort. 2013-05-28 16:26:32 -05:00
Matei Zaharia 3db1e17baa Merge pull request #620 from jerryshao/master
Fix CheckpointRDD java.io.FileNotFoundException when calling getPreferredLocations
2013-05-27 21:31:43 -07:00
Matei Zaharia e8d4b6c296 Merge pull request #529 from xiajunluan/master
[SPARK-663]Implement Fair Scheduler in Spark Cluster Scheduler
2013-05-25 21:09:03 -07:00
Reynold Xin 6bbbe01287 Fixed a stupid mistake that NonJavaSerializableClass was made Java
serializable.
2013-05-24 16:51:45 -07:00
Reynold Xin 26962c9340 Automatically configure Netty port. This makes unit tests using
local-cluster pass. Previously they were failing because Netty was
trying to bind to the same port for all processes.

Pair programmed with @shivaram.
2013-05-24 16:39:33 -07:00
Reynold Xin 6ea085169d Fixed the bug that shuffle serializer is ignored by the new shuffle
block iterators for local blocks. Also added a unit test for that.
2013-05-24 14:08:37 -07:00
jerryshao bd3ea8f2a6 fix CheckpointRDD getPreferredLocations java.io.FileNotFoundException 2013-05-24 14:26:19 +08:00
Matei Zaharia a2b0a7975c Merge pull request #619 from woggling/adjust-sampling
Use ARRAY_SAMPLE_SIZE constant instead of hard-coded 100.0 in SizeEstimator
2013-05-21 18:16:20 -07:00
Charles Reiss f350f14084 Use ARRAY_SAMPLE_SIZE constant instead of 100.0 2013-05-21 18:11:33 -07:00
Charles Reiss 786c97b87c DistributedSuite: remove dead test code 2013-05-21 11:35:49 -07:00
Andrew xia ecd6d75c6a fix bug of unit tests 2013-05-21 06:49:23 +08:00
Reynold Xin 5912cc4967 Merge pull request #610 from JoshRosen/spark-747
Throw exception if TaskResult exceeds Akka frame size
2013-05-17 19:58:40 -07:00
Reynold Xin 8d78c5f89f Changed the logging level from info to warning when addJar(null) is
called.
2013-05-17 18:51:35 -07:00
Reynold Xin 6729c2ead8 Merge branch 'master' of github.com:mesos/spark 2013-05-17 17:58:06 -07:00
Andrew xia 3d4672eaa9 Merge branch 'master' into xiajunluan
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/scheduler/cluster/ClusterScheduler.scala
	core/src/main/scala/spark/scheduler/cluster/TaskSetManager.scala
2013-05-18 07:28:03 +08:00
Andrew xia d19753b9c7 expose TaskSetManager type to resourceOffer function in ClusterScheduler 2013-05-18 06:45:19 +08:00
Reynold Xin 61cf176238 Added dependency on netty-all in Maven. 2013-05-16 14:31:26 -07:00
Andrew xia c6e2770bfe Fix ClusterScheduler bug to avoid allocating tasks to same slave 2013-05-17 05:10:38 +08:00
Mridul Muralidharan f0881f8d48 Hope this does not turn into a bike shed change 2013-05-17 01:58:50 +05:30
Mridul Muralidharan feddd2530d Filter out nulls - prevent NPE 2013-05-16 17:49:14 +05:30
Josh Rosen b8e46b6074 Abort job if result exceeds Akka frame size; add test. 2013-05-16 01:57:57 -07:00
Matei Zaharia 2f576aba8f Merge pull request #602 from rxin/shufflemerge
Manual merge & cleanup of Shane's Shuffle Performance Optimization
2013-05-15 18:06:24 -07:00
Reynold Xin 203d7b7c14 Merge pull request #593 from squito/driver_ui_link
Master UI has link to Application UI
2013-05-15 00:47:20 -07:00
Reynold Xin f3491cb89b Merge branch 'master' of github.com:mesos/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/test/scala/spark/DistributedSuite.scala
	project/SparkBuild.scala
2013-05-15 00:31:52 -07:00
Reynold Xin f9d40a5848 Added a comment in JdbcRDD for example usage. 2013-05-14 23:29:57 -07:00
Reynold Xin 404f9ff617 Added derby dependency to Maven pom files for the JDBC Java test. 2013-05-14 23:28:34 -07:00
Reynold Xin 81ad2fa331 Merge branch 'jdbc' of github.com:koeninger/spark
Conflicts:
	project/SparkBuild.scala
2013-05-14 23:12:00 -07:00
Imran Rashid 38d4b97c6d use threads classloader when deserializing task results; classnotfoundexception includes classloader 2013-05-14 22:32:14 -07:00
Imran Rashid d7d1da79d3 when akka starts, use akkas default classloader (current thread) 2013-05-14 22:32:09 -07:00
Cody Koeninger b16c4896f6 add test for JdbcRDD using embedded derby, per rxin suggestion 2013-05-14 23:44:04 -05:00
Matei Zaharia 016ac86830 Merge pull request #601 from rxin/emptyrdd-master
EmptyRDD (master branch 0.8)
2013-05-13 21:45:36 -07:00
Matei Zaharia 4b354e0a08 Merge pull request #589 from mridulm/master
Add support for instance local scheduling
2013-05-13 17:39:19 -07:00
Patrick Wendell 7f0833647b Capturing class name 2013-05-12 07:54:03 -07:00
Patrick Wendell 72b9c4cb6e Small fix 2013-05-11 23:53:50 -07:00
Patrick Wendell 1c15b85051 Removing import 2013-05-11 23:52:53 -07:00
Patrick Wendell 059ab88754 Changing technique to use same code path in all cases 2013-05-11 23:50:54 -07:00
Cody Koeninger 3da2305ed0 code cleanup per rxin comments 2013-05-11 23:59:07 -05:00
Josh Rosen 440719109e Throw exception if task result exceeds Akka frame size.
This partially addresses SPARK-747.
2013-05-11 19:17:13 -07:00
Patrick Wendell a5c28bb888 Removing unnecessary map 2013-05-11 14:20:39 -07:00
Patrick Wendell 0345954530 SPARK-738: Spark should detect and squash nonserializable exceptions 2013-05-11 14:17:09 -07:00
Mark Hamstra 6e6b3e0d7e Actually use the cleaned closure in foreachPartition 2013-05-10 13:02:34 -07:00
Mridul Muralidharan b05c9d22d7 Remove explicit hardcoding of yarn-standalone as args(0) if it is missing. 2013-05-09 18:49:12 +05:30
Imran Rashid 0ab818d508 fix linebreak 2013-05-09 00:38:59 -07:00
Reynold Xin 9cafacf32d Added test for Netty suite. 2013-05-07 22:42:37 -07:00
Reynold Xin 5d70ee4663 Cleaned up connection manager (moved many classes to their own files). 2013-05-07 22:42:15 -07:00
Reynold Xin 8388e8dd7a Minor style fix in DiskStore... 2013-05-07 18:40:35 -07:00
Reynold Xin 547dcbe494 Cleaned up Scala files in network/netty from Shane's PR. 2013-05-07 18:39:33 -07:00
Reynold Xin 9e64396ca4 Cleaned up the Java files from Shane's PR. 2013-05-07 18:30:54 -07:00
Reynold Xin 0e5cc30868 Cleaned up BlockManager and BlockFetcherIterator from Shane's PR. 2013-05-07 18:18:24 -07:00
Reynold Xin 8b79485171 Moved BlockFetcherIterator to its own file. 2013-05-07 17:02:32 -07:00
Reynold Xin 90577ada69 Merge branch 'shuffle-performance-fix-0.7' of github.com:shane-huang/spark into shufflemerge
Conflicts:
	core/src/main/scala/spark/storage/BlockManager.scala
	core/src/main/scala/spark/storage/DiskStore.scala
	project/SparkBuild.scala
2013-05-07 15:56:19 -07:00
Jey Kottalam aacca1b8a8 Update Maven build to Scala 2.9.3 2013-05-07 14:39:44 -07:00
Reynold Xin 64d4d2b036 Added tests for joins, cogroups, and unions for EmptyRDD. 2013-05-06 16:30:46 -07:00
Reynold Xin 0fd84965f6 Added EmptyRDD. 2013-05-06 15:40:34 -07:00
Imran Rashid 22a5063ae4 switch from separating appUI host & port to combining into just appUiUrl 2013-05-05 12:19:11 -07:00
Matei Zaharia 7af92f248b Merge pull request #597 from JoshRosen/webui-fixes
Two minor bug fixes for Spark Web UI
2013-05-04 22:29:17 -07:00
Reynold Xin 0a2bed356b Fixed flaky unpersist test in DistributedSuite. 2013-05-04 21:50:08 -07:00
Reynold Xin 62a077cd08 Merge branch 'unpersist-test' of github.com:shivaram/spark into blockmanager 2013-05-04 21:49:50 -07:00
Josh Rosen 42b1953c53 Fix SPARK-630: app details page shows finished executors as running. 2013-05-04 18:34:47 -07:00
Josh Rosen c0688451a6 Fix wrong closing tags in web UI HTML. 2013-05-04 18:34:46 -07:00
Josh Rosen d48e9fde01 Fix SPARK-629: weird number of cores in job details page. 2013-05-04 18:34:45 -07:00
Mridul Muralidharan 25198d7e9e Merge branch 'master' of github.com:mridulm/spark 2013-05-04 20:45:56 +05:30
Mridul Muralidharan 5b011d18d7 Merge from master 2013-05-04 20:41:27 +05:30
Mridul Muralidharan edb57c8331 Add support for instance local in getPreferredLocations of ZippedPartitionsBaseRDD. Add comments to both ZippedPartitionsBaseRDD and ZippedRDD to better describe the potential problem with the approach 2013-05-04 19:47:45 +05:30
Matei Zaharia 3bf2c868c3 Merge pull request #594 from shivaram/master
Add zip partitions to Java API
2013-05-03 18:27:30 -07:00
Shivaram Venkataraman 2274ad0786 Fix flaky test by changing catch and adding sleep 2013-05-03 16:35:35 -07:00
Shivaram Venkataraman bb8a434f9d Add zipPartitions to Java API. 2013-05-03 15:14:02 -07:00
Imran Rashid 6fae936088 applications (aka drivers) send their webUI address to master when registering so it can be displayed in the master web ui 2013-05-03 12:59:10 -07:00
Mridul Muralidharan ea2a6f91d3 pull from master 2013-05-04 00:35:59 +05:30
Reynold Xin 93091f6936 Merge branch 'master' of github.com:mesos/spark into blockmanager 2013-05-03 01:02:32 -07:00
Reynold Xin 2bc895a829 Updated according to Matei's code review comment. 2013-05-03 01:02:16 -07:00
Mridul Muralidharan 11589c39d9 Fix ZippedRDD as part Matei's suggestion 2013-05-03 12:23:30 +05:30
Matei Zaharia 6fe9d4e61e Merge pull request #592 from woggling/localdir-fix
Don't accept generated local directory names that can't be created
2013-05-02 21:33:56 -07:00
Matei Zaharia 538ee755b4 Merge pull request #581 from jerryshao/master
fix [SPARK-740] block manage UI throws exception when enabling Spark Streaming
2013-05-02 09:01:42 -07:00
Charles Reiss c847dd3da2 Don't accept generated temp directory names that can't be created successfully. 2013-05-01 23:19:10 -07:00
Reynold Xin 4a31877408 Added the unpersist api to JavaRDD. 2013-05-01 20:31:54 -07:00
Reynold Xin 98df9d2853 Added removeRdd function in BlockManager. 2013-05-01 20:17:09 -07:00
Mridul Muralidharan dfde9ce9dd comment out debug versions of checkHost, etc from Utils - which were used to test 2013-05-02 07:41:33 +05:30
Mridul Muralidharan 1b5aaeadc7 Integrate review comments 2 2013-05-02 07:30:06 +05:30
jerryshao c047f0e3ad filter out Spark streaming block RDD and sort RDDInfo with id 2013-05-02 09:48:32 +08:00
Mridul Muralidharan 609a817f52 Integrate review comments on pull request 2013-05-02 06:44:33 +05:30
Reynold Xin 204eb32e14 Changed the type of the persistentRdds hashmap back to
TimeStampedHashMap.
2013-05-01 16:14:58 -07:00
Reynold Xin 34637b97ec Added SparkContext.cleanup back. Not sure why it was removed before ... 2013-05-01 16:12:37 -07:00
Reynold Xin 3227ec8edd Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
harshars 8481562731 Merged Ram's commit on removing RDDs.
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2013-05-01 14:42:17 -07:00
Mridul Muralidharan 27764a00f4 Fix some npe introduced accidentally 2013-05-01 20:56:05 +05:30
Mridul Muralidharan d960e7e0f8 a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling.
b) Add some fixes to test code to ensure it passes (and fixes some other issues).

c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.
2013-05-01 20:24:00 +05:30
Matei Zaharia aa8fe1a209 Merge pull request #586 from mridulm/master
Pull request to address issues Reynold Xin reported
2013-04-30 22:30:18 -07:00
Reynold Xin dd7bef3147 Two minor fixes according to Ryan LeCompte's review. 2013-04-30 15:02:32 -07:00
Reynold Xin cea6174573 Merge branch 'master' of github.com:mesos/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/BlockStoreShuffleFetcher.scala
2013-04-30 13:28:35 -07:00
Mridul Muralidharan 60cabb35cb Add addition catch block for exception too 2013-05-01 01:17:14 +05:30
Mridul Muralidharan 3b748ced22 Be more aggressive and defensive in all uses of SelectionKey in select loop 2013-05-01 00:30:30 +05:30
Mridul Muralidharan 0f45477be1 Change indentation 2013-05-01 00:10:02 +05:30
Mridul Muralidharan 538614acfe Be more aggressive and defensive in select also 2013-05-01 00:05:32 +05:30
Mridul Muralidharan 48854e1dbf If key is not valid, close connection 2013-04-30 23:59:33 +05:30
Matei Zaharia f708dda81e Merge pull request #585 from pwendell/listener-perf
[Fix SPARK-742] Task Metrics should not employ per-record timing by default
2013-04-30 07:51:40 -07:00
Mridul Muralidharan e46d547ccd Fix issues reported by Reynold 2013-04-30 16:15:56 +05:30
Reynold Xin 1055785a83 Allow specifying the shuffle write file buffer size. The default buffer
size is 8KB in FastBufferedOutputStream, which is too small and would
cause a lot of disk seeks.
2013-04-29 23:33:56 -07:00
Reynold Xin 7007201201 Added a shuffle block manager so it is easier in the future to
consolidate shuffle output files.
2013-04-29 23:07:03 -07:00
Reynold Xin d3586ef438 Merge branch 'blockmanager' of github.com:rxin/spark into blockmanager
Conflicts:
	core/src/main/scala/spark/storage/DiskStore.scala
2013-04-29 15:44:18 -07:00
Patrick Wendell 016ce1fa9c Using full package name for util 2013-04-29 12:02:27 -07:00
Patrick Wendell 540be6b154 Modified version of the fix which just removes all per-record tracking. 2013-04-29 11:32:07 -07:00
Patrick Wendell 224fbac061 Spark-742: TaskMetrics should not employ per-record timing.
This patch does three things:

1. Makes TimedIterator a trait with two implementations (one a no-op)
2. Makes the default behavior to use the no-op implementation
3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like
   the trait doesn't really reduce complexity in any way.

In the future we can add other implementations, e.g. ones which perform sampling.
2013-04-29 11:13:43 -07:00
Matei Zaharia 0f45347c7b More unit test fixes 2013-04-28 22:29:27 -07:00
Matei Zaharia bce4089f22 Fix BlockManagerSuite to deal with clearing spark.hostPort 2013-04-28 22:23:48 -07:00
Matei Zaharia 68c07ea198 Merge pull request #582 from shivaram/master
Add zip partitions interface
2013-04-28 20:19:33 -07:00
Shivaram Venkataraman 604d3bf56c Rename partition class and add scala doc 2013-04-28 16:31:07 -07:00
Shivaram Venkataraman 15acd49f07 Actually rename classes to ZippedPartitions*
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman 6e84635ab9 Rename classes from MapZipped* to Zipped* 2013-04-28 15:58:40 -07:00
Mridul Muralidharan afee902443 Attempt to fix streaming test failures after yarn branch merge 2013-04-28 22:26:45 +05:30
Shivaram Venkataraman 0cc6642b7c Rename to zipPartitions and style changes 2013-04-28 05:11:03 -07:00
Shivaram Venkataraman c9c4954d99 Add an interface to zip iterators of multiple RDDs
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
Matei Zaharia 6e6b5204ea Create an empty directory when checkpointing a 0-partition RDD (fixes a
test failure on Hadoop 2.0)
2013-04-25 00:42:37 -07:00
Reynold Xin ba6ffa6a5f Allow the specification of a shuffle serializer in the read path (for
local block reads).
2013-04-24 17:38:07 -07:00
Reynold Xin aa618ed2a2 Allow changing the serializer on a per shuffle basis. 2013-04-24 14:52:49 -07:00
Mridul Muralidharan dd515ca3ee Attempt at fixing merge conflict 2013-04-24 09:24:17 +05:30
Reynold Xin 31ce6c66d6 Added a BlockObjectWriter interface in block manager so ShuffleMapTask
doesn't need to build up an array buffer for each shuffle bucket.
2013-04-23 17:48:59 -07:00
Mridul Muralidharan 8faf5c51c3 Patch from Thomas Graves to improve the YARN Client, and move to more production ready hadoop yarn branch 2013-04-24 02:31:57 +05:30
koeninger dfac0aa5c2 prevent mysql driver from pulling entire resultset into memory. explicitly close resultset and statement. 2013-04-22 21:12:52 -05:00
Mridul Muralidharan 7acab3ab45 Fix review comments, add a new api to SparkHadoopUtil to create appropriate Configuration. Modify an example to show how to use SplitInfo 2013-04-22 08:01:13 +05:30
koeninger b2a3f24dde first attempt at an RDD to pull data from JDBC sources 2013-04-21 00:29:37 -05:00
Mridul Muralidharan ac2e8e8720 Add some basic documentation 2013-04-19 00:13:19 +05:30
Andrew xia 8436bd5d4a remove TaskSetQueueManager and update code style 2013-04-19 02:17:22 +08:00
Andrew xia e0603d7e8b refactor the Schedulable interface and add unit test for SchedulingAlgorithm 2013-04-18 13:13:54 +08:00
Mridul Muralidharan 5ee2f5c483 Cache pattern, add (commented out) alternatives for check* apis 2013-04-17 23:13:34 +05:30
Mridul Muralidharan f07961060d Add a small note on spark.tasks.schedule.aggression 2013-04-17 23:13:02 +05:30
Mridul Muralidharan 02dffd2eb0 Ensure all ask/await block for spark.akka.askTimeout - so that it is controllable : instead of arbitrary timeouts spread across codebase. In our tests, we use 30 seconds, though default of 10 is maintained 2013-04-17 05:52:57 +05:30
Mridul Muralidharan a402b23bcd Fudge order of classpath - so that our jars take precedence over what is in CLASSPATH variable. Sounds logical, hope there is no issue cos of it 2013-04-17 05:52:00 +05:30
Mridul Muralidharan bcdde331c3 Move from master to driver 2013-04-17 04:12:18 +05:30
Mridul Muralidharan ad80f68eb5 remove spurious debug statements 2013-04-16 22:15:34 +05:30
Mridul Muralidharan f7969f72ee Fix exception when checkpoint path does not exist (no data in rdd which is being checkpointed for example) 2013-04-16 21:51:38 +05:30
Mridul Muralidharan 323ab8ff3b Scala does not prevent variable shadowing ! Sick error due to it ... 2013-04-16 17:05:10 +05:30
shane-huang b493f55a4f fix a bug in netty Block Fetcher
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-16 10:01:01 +08:00
Mridul Muralidharan 59c380d69a Fix npe 2013-04-16 03:29:38 +05:30
Mridul Muralidharan dd2b64ec97 Fix bug with atomic update 2013-04-16 03:19:24 +05:30
Mridul Muralidharan 5540ab8243 Use hostname instead of hostport for executor, fix creation of workdir 2013-04-16 02:57:43 +05:30
Mridul Muralidharan eb7e95e833 Commit job to persist files 2013-04-16 02:56:36 +05:30
Matei Zaharia a64c107449 Make ShuffledRDD.prev transient 2013-04-15 16:41:51 -04:00
Mridul Muralidharan 19652a44be Fix issue with FileSuite failing 2013-04-15 19:16:36 +05:30
Mridul Muralidharan 54b3d45b81 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:26:50 +05:30
Mridul Muralidharan d90d2af103 Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues 2013-04-15 18:12:11 +05:30
Matei Zaharia c35d530bcf Fix compile error 2013-04-13 12:43:12 -04:00
Andrew Ash 29d3440efb Add details when BlockManager heartbeats time out
Makes it more clear what the threshold was for tuning spark.storage.blockManagerSlaveTimeoutMs

Before:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats

After:
WARN  "Removing BlockManager BlockManagerId(201304022120-1976232532-5050-27464-0, myhostname, 51337) with no recent heart beats: 19216ms exceeds 15000ms
2013-04-11 01:54:02 -03:00
Andrew xia 2f883c515f Continue to update codes for scala code style
1.refactor braces for "class" "if" "while" "for" "match"
2.make code lines less than 100
3.refactor class parameter and extends definition
2013-04-09 13:02:50 +08:00
Matei Zaharia 65caa8f711 Merge remote-tracking branch 'jey/bump-development-version-to-0.8.0'
Conflicts:
	docs/_config.yml
	project/SparkBuild.scala
2013-04-08 12:43:17 -04:00
Matei Zaharia 054feb6448 Fixed a bug with zip 2013-04-07 21:15:21 -04:00
Matei Zaharia b5900d47b1 Fix compile warning 2013-04-07 20:55:42 -04:00
Matei Zaharia 6962d40b44 Fix deprecated warning 2013-04-07 20:27:33 -04:00
Mridul Muralidharan 6798a09df8 Add support for building against hadoop2-yarn : adding new maven profile for it 2013-04-07 17:47:38 +05:30
shane-huang df47b40b76 Shuffle Performance fix: Use netty embedded OIO file server instead of ConnectionManager
Shuffle Performance Optimization: do not send 0-byte block requests to reduce network messages
change reference from io.Source to scala.io.Source to avoid looking into io.netty package

Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-04-07 14:37:12 +08:00
Andrew xia 2b373dd07a add properties default value null to fix sbt/sbt test errors 2013-04-02 12:11:14 +08:00
Mark Hamstra e215f67923 Correct sense of 'filter out' in comment. 2013-03-31 08:00:13 -07:00
Mark Hamstra 8bcdc64005 Fixed broken filter in getWritableClass[T] 2013-03-30 22:09:52 -07:00
Matei Zaharia 9831bc1a09 Merge pull request #539 from cgrothaus/fix-webui-workdirpath
Bugfix: WorkerWebUI must respect workDirPath from Worker
2013-03-29 22:16:22 -07:00
Matei Zaharia 3cc8ab6e29 Merge pull request #541 from stephenh/shufflecoalesce
Add a shuffle parameter to coalesce.
2013-03-29 22:14:07 -07:00
Andrew xia 1a28f92711 change some typo and some spacing 2013-03-29 08:34:28 +08:00
Andrew xia def3d1c84a 1.remove redundant spacing in source code
2.replace get/set functions with val and var definition
2013-03-29 08:20:35 +08:00
Jey Kottalam bc8ba222ff Bump development version to 0.8.0 2013-03-28 15:42:01 -07:00
Holden Karau f5df729b12 Explicitly catch all throwables (warning in 2.10) 2013-03-24 16:15:32 -07:00
Stephen Haberman dd854d5b9f Use Boolean in the Java API, and != for assert. 2013-03-23 11:49:45 -05:00
Stephen Haberman 4ca273edc4 Merge branch 'master' into shufflecoalesce
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Matei Zaharia b8949cab88 Merge pull request #505 from stephenh/volatile
Make Executor fields volatile since they're read from the thread pool.
2013-03-23 07:19:34 -07:00
Matei Zaharia fd53f2fc7b Merge pull request #510 from markhamstra/WithThing
mapWith, flatMapWith and filterWith
2013-03-23 07:13:21 -07:00
Andrew xia d1d9bdaabe Just update typo and comments 2013-03-23 07:25:30 +08:00
Stephen Haberman 00170eb0b9 Fix are/our typo. 2013-03-22 12:59:08 -05:00
Stephen Haberman 1c67c7dfd1 Add a shuffle parameter to coalesce.
This is useful for when you want just 1 output file (part-00000) but
still up the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Christoph Grothaus 445f387ef4 Bugfix: WorkerWebUI must respect workDirPath from Worker 2013-03-22 11:08:40 +01:00
Matei Zaharia 35588490cb Merge pull request #538 from rxin/cogroup
Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.
2013-03-20 19:27:47 -07:00
Stephen Haberman 4f4215311a Merge branch 'master' into volatile 2013-03-20 15:37:10 -05:00
Matei Zaharia b812e6b7bb Merge pull request #526 from markhamstra/foldByKey
Add foldByKey
2013-03-20 11:21:02 -07:00
Reynold Xin d48ee7e55e Merge branch 'master' of github.com:mesos/spark into cogroup 2013-03-20 14:00:28 +08:00
Reynold Xin 00a11304fd Added mapSideCombine flag to CoGroupedRDD. Added unit test for
CoGroupedRDD.
2013-03-20 13:49:51 +08:00
Matei Zaharia 945d1e720e Merge pull request #536 from sasurfer/master
CoalescedRDD for many partitions
2013-03-19 21:59:06 -07:00
Matei Zaharia 1cbbe94ac1 Merge pull request #534 from stephenh/removetrycatch
Remove try/catch block that can't be hit.
2013-03-19 21:34:34 -07:00
Andrey Kouznetsov bd167f83b0 call setConf from input format if it is Configurable 2013-03-19 17:15:15 +04:00
Giovanni Delussu aceae029f7 CoalescedRDD changed to work with a big number of partitions both in the original and the new coalesced RDD.
The limitation was in the range that Scala.Int can represent.
2013-03-19 11:25:45 +01:00
Stephen Haberman fb34967815 Remove try/catch block that can't be hit. 2013-03-18 01:55:50 -05:00
Mark Hamstra ab33e27cc9 constructorOfA -> constructA in doc comments 2013-03-16 15:29:15 -07:00
Mark Hamstra 9784fc1fcd fix wayward comma in doc comment 2013-03-16 15:25:02 -07:00
Mark Hamstra 32979b5e7d whitespace 2013-03-16 13:36:46 -07:00
Mark Hamstra ca9f81e8fc refactor foldByKey to use combineByKey 2013-03-16 13:31:01 -07:00
Mark Hamstra 1fb192ef40 Merge branch 'master' of https://github.com/mesos/spark into foldByKey 2013-03-16 12:17:13 -07:00
Mark Hamstra 80fc8c82ed _With[Matei] 2013-03-16 12:16:29 -07:00
Mark Hamstra 38454c4aed Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-16 11:54:44 -07:00
Matei Zaharia c1e9cdc49f Merge pull request #525 from stephenh/subtractByKey
Add PairRDDFunctions.subtractByKey.
2013-03-16 11:47:45 -07:00
Mark Hamstra ef75be3bf7 Merge branch 'master' of https://github.com/mesos/spark into foldByKey 2013-03-15 21:41:24 -07:00
Andrew xia 5892393140 refactor fair scheduler implementation
1.Change "pool" properties to be the member of ActiveJob
2.Abstract the Schedulable of Pool and TaskSetManager
3.Abstract the FIFO and FS comparator algorithm
4.Miscellaneous changing of class define and construction
2013-03-16 11:13:38 +08:00
Matei Zaharia cdbfd1e196 Merge pull request #516 from squito/fix_local_metrics
Fix local metrics
2013-03-15 15:13:28 -07:00
Mikhail Bautin 7fd2708eda Add a log4j compile dependency to fix build in IntelliJ
Also rename parent project to spark-parent (otherwise it shows up as
"parent" in IntelliJ, which is very confusing).
2013-03-15 11:41:51 -07:00
Mark Hamstra 1a4070477d whitespace cleanup 2013-03-15 11:28:28 -07:00
Mark Hamstra 857010392b Fuller implementation of foldByKey 2013-03-15 10:56:05 -07:00
Mark Hamstra 16a4ca4537 restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test 2013-03-14 13:58:37 -07:00
Mark Hamstra b1422cbdd5 added foldByKey 2013-03-14 12:59:58 -07:00
Stephen Haberman 7786881f47 Fix tabs that snuck in. 2013-03-14 14:57:12 -05:00
Stephen Haberman 7d8bb4df3a Allow subtractByKey's other argument to have a different value type. 2013-03-14 14:44:15 -05:00
Stephen Haberman 4632c45af1 Finished subtractByKeys. 2013-03-14 10:35:34 -05:00
Matei Zaharia 4032beba49 Merge pull request #521 from stephenh/earlyclose
Close the reader in HadoopRDD as soon as iteration end.
2013-03-13 19:29:46 -07:00
Stephen Haberman 63fe225587 Simplify SubtractedRDD in preparation from subtractByKey. 2013-03-13 17:17:34 -05:00
Mark Hamstra cd5b947cf6 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-13 13:16:14 -07:00
Stephen Haberman e7f1a69c6b Add a test for NextIterator. 2013-03-13 10:46:33 -05:00
Stephen Haberman 1a175d13b9 Add NextIterator.closeIfNeeded. 2013-03-13 10:17:39 -05:00
Stephen Haberman 8f00d23598 Remove NextIterator.close default implementation. 2013-03-12 12:30:10 -05:00
Harold Lim 0b64e5f1ac Removed some commented code 2013-03-12 13:31:27 +08:00
Harold Lim f5b1fecb9f Cleaned up the code 2013-03-12 13:31:27 +08:00
Harold Lim b5325182a3 Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies 2013-03-12 13:31:27 +08:00
Harold Lim 54ed7c4af4 Changed the name of the system property to set the allocation xml 2013-03-12 13:31:27 +08:00
Harold Lim c07087364b Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler 2013-03-12 13:31:27 +08:00
Stephen Haberman 9e68f48625 More quickly call close in HadoopRDD.
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
2013-03-11 23:59:17 -05:00
Charles Reiss 769d399674 Send block sizes as longs. 2013-03-11 14:17:05 -07:00
Mark Hamstra 562893bea3 deleted excess curly braces 2013-03-10 22:43:08 -07:00
Imran Rashid 8a11ac3dc7 increase sleep time 2013-03-10 22:31:44 -07:00
Imran Rashid 9f97f2f9d8 add a small wait to one task to make sure some task runtime really is non-zero 2013-03-10 22:30:18 -07:00
Mark Hamstra 1289e7176b refactored _With API and added foreachPartition 2013-03-10 22:27:13 -07:00
Mark Hamstra b57df1f5e3 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-10 16:56:31 -07:00
Matei Zaharia 2e1bbc4e7e Merge remote-tracking branch 'woggling/dag-sched-driver-port'
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 16:52:54 -07:00
Matei Zaharia 91a9d093bd Merge pull request #512 from patelh/fix-kryo-serializer
Fix reference bug in Kryo serializer, add test, update version
2013-03-10 15:48:23 -07:00
Matei Zaharia 557cfd0f4d Merge pull request #515 from woggling/deploy-app-death
Notify standalone deploy client of application death.
2013-03-10 15:44:57 -07:00
Matei Zaharia a59cc6060f Merge remote-tracking branch 'stephenh/nomocks'
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 13:39:10 -07:00
Imran Rashid 20f01a0a1b enable task metrics in local mode, add tests 2013-03-09 21:17:31 -08:00
Imran Rashid ec30188a2a rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches 2013-03-09 21:16:53 -08:00
Charles Reiss b0983c5762 Notify standalone deploy client of application death.
Usually, this isn't necessary since the application will be removed
as a result of the deploy client disconnecting, but occasionally, the
standalone deploy master removes an application otherwise.

Also mark applications as FAILED instead of FINISHED when they are
killed as a result of their executors failing too many times.
2013-03-09 11:29:45 -08:00
Charles Reiss d0216cb38b Prevent DAGSchedulerSuite from corrupting driver.port.
Use the LocalSparkContext abstraction to properly manage clearing
spark.driver.port.
2013-03-09 10:49:02 -08:00
Hiral Patel 664e5fd24b Fix reference bug in Kryo serializer, add test, update version 2013-03-07 22:16:11 -08:00
Mark Hamstra 5ff0810b11 refactor mapWith, flatMapWith and filterWith to each use two parameter lists 2013-03-05 12:25:44 -08:00
Mark Hamstra d046d8ad32 whitespace formatting 2013-03-05 00:48:13 -08:00
Mark Hamstra 9148b968cf mapWith, flatMapWith and filterWith 2013-03-04 15:48:47 -08:00
Matei Zaharia 9f0dc829cb Fix TaskMetrics not being serializable 2013-03-04 12:08:31 -08:00
Matei Zaharia 04fb81ffe5 Merge pull request #506 from rxin/spark-706
Fixed SPARK-706: Failures in block manager put leads to read task hanging.
2013-03-03 17:20:07 -08:00
Imran Rashid 0bd1d00c2a minor cleanup based on feedback in review request 2013-03-03 16:46:45 -08:00
Imran Rashid f1006b99ff change CleanupIterator to CompletionIterator 2013-03-03 16:39:05 -08:00
Imran Rashid 8fef5b9c5f refactoring of TaskMetrics 2013-03-03 16:34:04 -08:00
Imran Rashid d36abdb053 Merge branch 'master' into stageInfo 2013-03-03 15:20:46 -08:00
Matei Zaharia 6bfc7cad6b Merge pull request #504 from mosharaf/master
Worker address was getting removed when removing an app.
2013-03-02 22:14:49 -08:00
Mark Hamstra 8b06b359da bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building. 2013-02-28 23:34:34 -08:00
Reynold Xin 44134e12bb Fixed SPARK-706: Failures in block manager put leads to read task
hanging.
2013-02-28 15:14:59 -08:00
Stephen Haberman 6415c2bb60 Don't create the Executor until we have everything it needs. 2013-02-28 12:38:09 -06:00
Stephen Haberman 80eecd2cb1 Make Executor fields volatile since they're read from the thread pool. 2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury 4ab387bcdb Fixed master datastructure updates after removing an application; and a typo. 2013-02-27 13:52:44 -08:00
Matei Zaharia ece3edfffa Fix a problem with no hosts being counted as alive in the first job 2013-02-26 12:11:03 -08:00
Matei Zaharia 73697e2891 Fix overly large thread names in PySpark 2013-02-26 12:07:59 -08:00
Stephen Haberman db957e5bd7 Fix MapOutputTrackerSuite. 2013-02-26 01:38:50 -06:00
Stephen Haberman a65aa549ff Override DAGScheduler.runLocally so we can remove the Thread.sleep. 2013-02-25 23:49:32 -06:00
Stephen Haberman a4adeb255c Merge branch 'master' into nomocks
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-02-25 23:48:52 -06:00
Tathagata Das c02e064938 Fixed replication bug in BlockManager 2013-02-25 17:27:46 -08:00
Matei Zaharia 490f056cdd Allow passing sparkHome and JARs to StreamingContext constructor
Also warns if spark.cleaner.ttl is not set in the version where you pass
your own SparkContext.
2013-02-25 15:13:30 -08:00
Matei Zaharia 568bdaf8ae Set spark.deploy.spreadOut to true by default in 0.7 (improves locality) 2013-02-25 14:34:55 -08:00
Matei Zaharia 1ef58dadcc Add a config property for Akka lifecycle event logging 2013-02-25 14:01:24 -08:00
Matei Zaharia ceaec4a675 Merge pull request #498 from pwendell/shutup-akka
Disable remote lifecycle logging from Akka.
2013-02-25 12:31:24 -08:00
Patrick Wendell 85a85646d9 Disable remote lifecycle logging from Akka.
This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.
2013-02-25 12:25:43 -08:00
Imran Rashid 8f17387d97 remove bogus comment 2013-02-25 10:31:06 -08:00
Matei Zaharia 6ae9a22c3e Get spark.default.parallelism on each call to defaultPartitioner,
instead of only once, in case the user changes it across Spark uses
2013-02-25 10:28:08 -08:00
Matei Zaharia d6e6abece3 Merge pull request #459 from stephenh/bettersplits
Change defaultPartitioner to use upstream split size.
2013-02-25 09:22:04 -08:00
Stephen Haberman c44ccf2862 Use default parallelism if it's set. 2013-02-24 23:54:03 -06:00
Stephen Haberman 44032bc476 Merge branch 'master' into bettersplits
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/test/scala/spark/ShuffleSuite.scala
2013-02-24 22:08:14 -06:00
Christoph Grothaus f39f2b7636 Incorporate feedback from mateiz:
- we do not need getEnvOrEmpty
- Instead of saving SPARK_NONDAEMON_JAVA_OPTS, it would be better to modify the scripts to use a different variable name for the JAVA_OPTS they do eventually use
2013-02-24 21:24:30 +01:00
Tathagata Das dff53d1b94 Merge branch 'mesos-master' into streaming 2013-02-24 12:17:22 -08:00
Matei Zaharia 3b9f929467 Merge pull request #468 from haitaoyao/master
support customized java options for Master, Worker, Executor, and Repl
2013-02-23 23:38:15 -08:00
Stephen Haberman 37c7a71f9c Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD. 2013-02-24 00:27:53 -06:00
Stephen Haberman f442e7d83c Update for split->partition rename. 2013-02-24 00:27:14 -06:00
Stephen Haberman cec87a0653 Merge branch 'master' into subtract 2013-02-23 23:27:55 -06:00
Tathagata Das d853aa9658 Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs. 2013-02-23 17:42:26 -08:00
Patrick Wendell 931f439be9 Responding to code review 2013-02-23 15:40:41 -08:00
Patrick Wendell f51b0f93f2 Adding Java-accessible methods to Vector.scala
This is needed for the Strata machine learning tutorial (and
also is generally helpful).
2013-02-23 13:26:59 -08:00
Matei Zaharia d942d39072 Handle exceptions in RecordReader.close() better (suggested by Jim
Donahue)
2013-02-23 11:19:07 -08:00
Matei Zaharia c89824046a Merge pull request #490 from woggling/conn-death
Detect when SendingConnections disconnect even if we aren't sending to them
2013-02-22 22:58:19 -08:00
Charles Reiss 50cf8c8b79 Add fault tolerance test that uses replicated RDDs. 2013-02-22 16:11:53 -08:00
Charles Reiss c8a7886921 Detect when SendingConnections drop by trying to read them.
Comment fix
2013-02-22 16:11:52 -08:00
Matei Zaharia d4d7993bf5 Several fixes to the work to log when no resources can be used by a job.
Fixed some of the messages as well as code style.
2013-02-22 15:51:37 -08:00
Matei Zaharia f33662c133 Merge remote-tracking branch 'pwendell/starvation-check'
Also fixed a bug where master was offering executors on dead workers

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-02-22 15:27:41 -08:00
Matei Zaharia 7341de0d48 Merge pull request #475 from JoshRosen/spark-668
Remove hack workaround for SPARK-668
2013-02-22 14:56:18 -08:00
Patrick Wendell f8c3a03d55 SPARK-702: Replace Function --> JFunction in JavaAPI Suite.
In a few places the Scala (rather than Java) function class is used.
2013-02-22 12:54:15 -08:00
Imran Rashid 0f37b43b40 make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD 2013-02-21 16:56:28 -08:00
Imran Rashid 9230617f23 add cleanup iterator 2013-02-21 16:55:14 -08:00
Imran Rashid 81bd07da26 sparkListeners should be a val 2013-02-21 15:21:45 -08:00
Imran Rashid 796e934d31 add some docs & some cleanup 2013-02-21 15:19:34 -08:00
Imran Rashid 394d3acc3e store taskInfo & metrics together in a tuple 2013-02-21 15:19:34 -08:00
Imran Rashid 7960927cf4 get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo 2013-02-21 15:19:34 -08:00
Imran Rashid d0bfac3eed taskInfo tracks if a task is run on a preferred host 2013-02-21 15:19:34 -08:00
Imran Rashid 6f62a57858 add runtime breakdowns 2013-02-21 15:19:34 -08:00
Imran Rashid 176cb20703 add task result size; better formatting for time interval distributions; cleanup distribution formatting 2013-02-21 15:19:33 -08:00
Imran Rashid f2fcabf2ea add timing around parts of executor & track result size 2013-02-21 15:19:33 -08:00
Imran Rashid ff127cfcd3 Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/storage/BlockManager.scala
2013-02-21 15:16:21 -08:00
Imran Rashid 69f9a7035f fully revert change to addOnCompleteCallback -- missed this in e9f53ec 2013-02-21 15:07:46 -08:00
Imran Rashid baab23abdf TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task 2013-02-21 14:13:01 -08:00
haitao.yao 8215b95547 Merge branch 'mesos' 2013-02-21 10:07:24 +08:00
Christoph Grothaus 85a35c6840 Fix SPARK-698. From ExecutorRunner, launch java directly instead of via the run scripts. 2013-02-20 21:42:11 +01:00
Tathagata Das 334ab92441 Fixed bug in CheckpointSuite 2013-02-20 10:26:36 -08:00
Tathagata Das 1cb725e417 Merge branch 'mesos-master' into streaming 2013-02-20 09:55:35 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Matei Zaharia 05bc02e80b Merge pull request #482 from woggling/shutdown-exceptions
Don't call System.exit over uncaught exceptions from shutdown hooks
2013-02-19 20:56:15 -08:00
haitao.yao 6a3d44c673 Merge branch 'mesos' 2013-02-20 10:23:58 +08:00
Charles Reiss 092c631fa8 Pull detection of being in a shutdown hook into utility function. 2013-02-19 17:49:55 -08:00
Reynold Xin 130f704baf Added a method to create PartitionPruningRDD. 2013-02-19 16:03:52 -08:00
Charles Reiss d0588bd6d7 Catch/log errors deleting temp dirs 2013-02-19 13:04:06 -08:00
Charles Reiss 687581c3ec Paranoid uncaught exception handling for exceptions during shutdown 2013-02-19 13:03:02 -08:00
haitao.yao 7c129388fb Merge branch 'mesos' 2013-02-19 11:22:24 +08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00
Matei Zaharia 06e5e6627f Renamed "splits" to "partitions" 2013-02-17 22:13:26 -08:00
Matei Zaharia 340cc54e47 Merge pull request #471 from stephenh/parallelrdd
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia 3260b6120e Merge pull request #470 from stephenh/morek
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman 924f47dd11 Add RDD.subtract.
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
Stephen Haberman e7713adb99 Move ParallelCollection into spark.rdd package. 2013-02-16 13:20:48 -06:00
Stephen Haberman ae2234687d Make CoGroupedRDDs explicitly have the same key type. 2013-02-16 13:10:31 -06:00
Stephen Haberman 4328873294 Add assertion about dependencies. 2013-02-16 01:16:40 -06:00
Stephen Haberman c34b8ad2c5 Avoid a shuffle if combineByKey is passed the same partitioner. 2013-02-16 00:54:03 -06:00
Stephen Haberman 4281e579c2 Update more javadocs. 2013-02-16 00:45:03 -06:00
Stephen Haberman 6a2d957843 Tweak test names. 2013-02-16 00:33:49 -06:00
Stephen Haberman 37397106ce Remove fileServerSuite.txt. 2013-02-16 00:31:07 -06:00
Stephen Haberman 6cd68c31cb Update default.parallelism docs, have StandaloneSchedulerBackend use it.
Only brand new RDDs (e.g. parallelize and makeRDD) now use default
parallelism, everything else uses their largest parent's partitioner
or partition size.
2013-02-16 00:29:11 -06:00
haitao.yao a9cfac347a Merge branch 'mesos' 2013-02-16 10:11:28 +08:00
Imran Rashid bffee929ab Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/storage/BlockManager.scala
2013-02-15 10:35:04 -08:00
Imran Rashid 893bad9089 use appid instead of frameworkid; simplify stupid condition 2013-02-13 20:30:21 -08:00
Imran Rashid 8f18e7e863 include jobid in Executor commandline args 2013-02-13 13:05:13 -08:00
Matei Zaharia fd7e414bd0 Merge pull request #464 from pwendell/java-type-fix
SPARK-694: All references to [K, V] in JavaDStreamLike should be changed to [K2, V2]
2013-02-11 19:19:05 -08:00
Matei Zaharia bfeed4725d Merge pull request #465 from pwendell/java-sort-fix
SPARK-696: sortByKey should use 'ascending' parameter
2013-02-11 18:23:12 -08:00
Patrick Wendell 21df6ffc13 SPARK-696: sortByKey should use 'ascending' parameter 2013-02-11 17:43:26 -08:00
Matei Zaharia ea08537143 Fixed an exponential recursion that could happen with doCheckpoint due
to lack of memoization
2013-02-11 13:23:50 -08:00
Josh Rosen e9fb25426e Remove hack workaround for SPARK-668.
Renaming the type parameters solves this problem (see SPARK-694).

I tried this fix earlier, but it didn't work because I didn't run
`sbt/sbt clean` first.
2013-02-11 11:19:20 -08:00
Patrick Wendell f0b68c623c Initial cut at replacing K, V in Java files 2013-02-11 10:03:37 -08:00
Imran Rashid e9f53ec0ea undo change to onCompleteCallbacks 2013-02-11 09:36:49 -08:00
Matei Zaharia da8afbc77e Some bug and formatting fixes to FT
Conflicts:
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:43:38 -08:00
root 1b47fa2752 Detect hard crashes of workers using a heartbeat mechanism.
Also fixes some issues in the rest of the code with detecting workers this way.

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
	core/src/main/scala/spark/deploy/worker/Worker.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
2013-02-10 22:28:28 -08:00
Matei Zaharia 8c66c49962 Tweak web UI so that people don't get confused about master URL format
Conflicts:
	core/src/main/twirl/spark/deploy/master/index.scala.html
	core/src/main/twirl/spark/deploy/worker/index.scala.html
2013-02-10 21:58:34 -08:00
Imran Rashid d9461b15d3 cleanup a bunch of imports 2013-02-10 21:41:40 -08:00
Tathagata Das 16baea62bc Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits. 2013-02-10 19:14:49 -08:00
Imran Rashid 383af599bb SparkContext.addSparkListener; "std" listener in StatsReportListener 2013-02-10 14:19:37 -08:00
Imran Rashid b7d9e24394 use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver 2013-02-10 14:18:52 -08:00
Stephen Haberman 680f42e6cd Change defaultPartitioner to use upstream split size.
Previously it used the SparkContext.defaultParallelism, which occasionally
ended up being a very bad guess. Looking at upstream RDDs seems to make
better use of the context.

Also sorted the upstream RDDs by partition size first, as if we have
a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely
we want the resulting RDD to be tiny-partitioned.
2013-02-10 02:27:03 -06:00
Patrick Wendell 2ed791fd7f Minor fixes 2013-02-09 22:00:38 -08:00
Patrick Wendell 1859c9f93c Changing to use Timer based on code review 2013-02-09 21:55:17 -08:00
Matei Zaharia ccb1ca4a23 Merge pull request #448 from squito/fetch_maxBytesInFlight
add as many fetch requests as we can, subject to maxBytesInFlight
2013-02-09 18:15:18 -08:00
Matei Zaharia f750daa510 Merge pull request #452 from stephenh/misc
Add RDD.coalesce, clean up some RDDs, other misc.
2013-02-09 18:12:56 -08:00
Stephen Haberman 4619ee0787 Move JavaRDDLike.coalesce into the right places. 2013-02-09 20:05:42 -06:00
Stephen Haberman 921be76533 Use stubs instead of mocks for DAGSchedulerSuite. 2013-02-09 16:42:18 -06:00
Stephen Haberman fb7599870f Fix JavaRDDLike.coalesce return type. 2013-02-09 16:10:52 -06:00
Stephen Haberman 2a18cd826c Add back return types. 2013-02-09 10:12:04 -06:00
Stephen Haberman da52b16b38 Remove RDD.coalesce default arguments. 2013-02-09 10:11:54 -06:00
Imran Rashid 04e828f7c1 general fixes to Distribution, plus some tests 2013-02-08 19:07:36 -08:00
Mark Hamstra b8863a79d3 Merge branch 'master' of https://github.com/mesos/spark into commutative
Conflicts:
	core/src/main/scala/spark/RDD.scala
2013-02-08 18:26:00 -08:00
Mark Hamstra 934a53c8b6 Change docs on 'reduce' since the merging of local reduces no longer preserves
ordering, so the reduce function must also be commutative.
2013-02-05 22:19:58 -08:00
Stephen Haberman a9c8d53cfa Clean up RDDs, mainly to use getSplits.
Also made sure clearDependencies() was calling super, to ensure
the getSplits/getDependencies vars in the RDD base class get
cleaned up.
2013-02-05 22:16:59 -06:00
Stephen Haberman f4d43cb43e Remove unneeded zipWithIndex.
Also rename r->rdd and remove unneeded extra type info.
2013-02-05 21:26:45 -06:00
Stephen Haberman f2bc748013 Add RDD.coalesce. 2013-02-05 21:23:36 -06:00
Stephen Haberman 67df7f2fa2 Add private, minor formatting. 2013-02-05 21:08:21 -06:00
Imran Rashid 379564c7e0 setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place 2013-02-05 18:30:21 -08:00
Matei Zaharia 9cfa068379 Merge pull request #450 from stephenh/inlinemergepair
Inline mergePair to look more like the narrow dep branch.
2013-02-05 18:28:44 -08:00
Stephen Haberman 870b2aaf5d Merge branch 'master' into fixdeathpactexception
Conflicts:
	core/src/main/scala/spark/deploy/worker/Worker.scala
2013-02-05 20:27:09 -06:00
Matei Zaharia a4611d66f0 Merge pull request #449 from stephenh/longerdriversuite
Increase DriverSuite timeout.
2013-02-05 17:58:22 -08:00
Stephen Haberman 0e19093fd8 Handle Terminated to avoid endless DeathPactExceptions.
Credit to Roland Kuhn, Akka's tech lead, for pointing out this
very obvious fix, but StandaloneExecutorBackend.preStart's
catch block would never (ever) get hit, because all of the
operations in preStart are async.

So, the System.exit in the catch block was skipped, and instead
Akka was sending Terminated messages which, since we didn't
handle them, turned into DeathPactExceptions, which started
a postRestart/preStart infinite loop.
2013-02-05 18:58:00 -06:00
Stephen Haberman 1ba3393ceb Increase DriverSuite timeout. 2013-02-05 17:56:50 -06:00
Stephen Haberman 8bd0e888f3 Inline mergePair to look more like the narrow dep branch.
No functionality changes, I think this is just more consistent
given mergePair isn't called multiple times/recursive.

Also added a comment to explain the usual case of having two parent RDDs.
2013-02-05 17:50:25 -06:00
Imran Rashid 1704b124d8 add as many fetch requests as we can, subject to maxBytesInFlight 2013-02-05 14:33:52 -08:00
Imran Rashid cfab1a3528 add as many fetch requests as we can, subject to maxBytesInFlight 2013-02-05 14:31:46 -08:00
Imran Rashid 696e4b2167 track remoteFetchTime 2013-02-05 14:29:16 -08:00
Imran Rashid b29f9cc978 BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance 2013-02-05 14:00:44 -08:00
Imran Rashid e319ac74c1 cogrouped RDD stores the amount of time taken to read shuffle data in each task 2013-02-05 10:18:16 -08:00
Imran Rashid 295b534398 task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount 2013-02-05 10:18:16 -08:00
Imran Rashid 9df7e2ae55 Shuffle Fetchers use a timed iterator 2013-02-05 10:18:16 -08:00
Imran Rashid 1ad77c4766 add TimedIterator 2013-02-05 10:18:15 -08:00
Imran Rashid 843084d69d track total bytes written by ShuffleMapTasks 2013-02-05 10:18:15 -08:00
haitao.yao f609182e5b Merge branch 'mesos' 2013-02-05 14:09:45 +08:00
Imran Rashid b430d2359d Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-02-04 21:40:44 -08:00
Matei Zaharia f6ec547ea7 Small fix to test for distinct 2013-02-04 13:14:54 -08:00
Matei Zaharia aa4ee1e9e5 Fix failing test 2013-02-04 11:06:31 -08:00
Matei Zaharia f7b4e428be Merge pull request #445 from JoshRosen/pyspark_fixes
Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()
2013-02-03 21:36:36 -08:00
haitao.yao faa4d9e31f Merge branch 'mesos' 2013-02-04 11:40:15 +08:00
Patrick Wendell b14322956c Starvation check in Standalone scheduler 2013-02-03 12:45:10 -08:00
Patrick Wendell 667860448a Starvation check in ClusterScheduler 2013-02-03 12:45:04 -08:00
Matei Zaharia 3bfaf3ab1d Merge pull request #379 from stephenh/sparkmem
Add spark.executor.memory to differentiate executor memory from spark-shell
2013-02-02 23:58:23 -08:00
Matei Zaharia 88ee6163a1 Merge pull request #422 from squito/blockmanager_info
RDDInfo available from SparkContext
2013-02-02 23:44:13 -08:00
Matei Zaharia cd4ca93679 Merge pull request #436 from stephenh/removeextraloop
Once we find a split with no block, we don't have to look for more.
2013-02-02 23:39:28 -08:00
Matei Zaharia d5daaab381 Merge pull request #442 from stephenh/fixsystemnames
Fix createActorSystem not actually using the systemName parameter.
2013-02-02 23:38:46 -08:00
Matei Zaharia 9163c3705d Formatting 2013-02-02 23:34:47 -08:00
Josh Rosen 8fbd5380b7 Fetch fewer objects in PySpark's take() method. 2013-02-03 06:44:49 +00:00
Matei Zaharia 34a7bcdb3a Formatting 2013-02-02 19:40:30 -08:00
Stephen Haberman 7aba123f0c Further simplify checking for Nil. 2013-02-02 13:53:28 -06:00
Charles Reiss 6107957962 Merge remote-tracking branch 'base/master' into dag-sched-tests
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Stephen Haberman cae8a6795c Fix dangling old variable names. 2013-02-02 02:15:39 -06:00
Stephen Haberman 696eec32c9 Move executorMemory up into SchedulerBackend. 2013-02-02 02:03:26 -06:00
Stephen Haberman 103c375ba0 Merge branch 'master' into sparkmem 2013-02-02 01:57:18 -06:00
Stephen Haberman 28e0cb9f31 Fix createActorSystem not actually using the systemName parameter.
This meant all system names were "spark", which worked, but didn't
lead to the most intuitive log output.

This fixes createActorSystem to use the passed system name, and
refactors Master/Worker to encapsulate their system/actor names
instead of having the clients guess at them.

Note that the driver system name, "spark", is left as is, and is
still repeated a few times, but that seems like a separate issue.
2013-02-02 01:11:37 -06:00
Charles Reiss 1fd5ee323d Code review changes: add sc.stop; style of multiline comments; parens on procedure calls. 2013-02-01 22:33:38 -08:00
Matei Zaharia ae26911ec0 Add back test for distinct without parens 2013-02-01 21:07:24 -08:00
Stephen Haberman 12c1eb4756 Reduce the amount of duplicate logging Akka does to stdout.
Given we have Akka logging go through SLF4j to log4j, we don't need
all the extra noise of Akka's stdout logger that is supposedly only
used during Akka init time but seems to continue logging lots of
noisy network events that we either don't care about or are in the
log4j logs anyway.

See:

http://doc.akka.io/docs/akka/2.0/general/configuration.html

    # Log level for the very basic logger activated during AkkaApplication startup
    # Options: ERROR, WARNING, INFO, DEBUG
    # stdout-loglevel = "WARNING"
2013-02-01 21:21:44 -06:00
Matei Zaharia 8b3041c723 Reduced the memory usage of reduce and similar operations
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
Matei Zaharia 4529876db0 Merge branch 'master' of github.com:mesos/spark 2013-02-01 14:07:38 -08:00
Matei Zaharia 9970926ede formatting 2013-02-01 14:07:34 -08:00
Matei Zaharia 79c24abe4c Merge pull request #432 from stephenh/moreprivacy
Add more private declarations.
2013-02-01 14:06:55 -08:00
Matei Zaharia de340ddf0b Merge pull request #437 from stephenh/cancelmetacleaner
Stop BlockManager's metadataCleaner.
2013-02-01 12:59:25 -08:00
Imran Rashid c6190067ae remove unneeded (and unused) filter on block info 2013-02-01 09:55:25 -08:00
Stephen Haberman 59c57e48df Stop BlockManager's metadataCleaner. 2013-02-01 10:34:02 -06:00
Matei Zaharia 571af31304 Merge pull request #433 from rxin/master
Changed PartitionPruningRDD's split to make sure it returns the correct split index.
2013-02-01 00:32:41 -08:00
Imran Rashid 8a0a5ed533 track total partitions, in addition to cached partitions; use scala string formatting 2013-02-01 00:23:38 -08:00
Imran Rashid f127f2ae76 fixup merge (master -> driver renaming) 2013-02-01 00:20:49 -08:00
Reynold Xin f9af9cee6f Moved PruneDependency into PartitionPruningRDD.scala. 2013-02-01 00:02:46 -08:00
haitao.yao b57570fd12 Merge branch 'mesos' 2013-02-01 14:06:45 +08:00
Matei Zaharia 7e2e046e37 Merge pull request #434 from pwendell/python-exceptions
SPARK-673: Capture and re-throw Python exceptions
2013-01-31 21:58:26 -08:00
Patrick Wendell 39ab83e957 Small fix from last commit 2013-01-31 21:52:52 -08:00
Patrick Wendell c33f0ef41a Some style cleanup 2013-01-31 21:50:02 -08:00
Patrick Wendell 3446d5c8d6 SPARK-673: Capture and re-throw Python exceptions
This patch alters the Python <-> executor protocol to pass on
exception data when exceptions occur in user Python code.
2013-01-31 18:06:11 -08:00
Reynold Xin 6289d9654e Removed the TODO comment from PartitionPruningRDD. 2013-01-31 17:49:36 -08:00
Reynold Xin 5b0fc265c2 Changed PartitionPruningRDD's split to make sure it returns the correct
split index.
2013-01-31 17:48:39 -08:00
Stephen Haberman 782187c210 Once we find a split with no block, we don't have to look for more. 2013-01-31 18:27:25 -06:00
Stephen Haberman 418e36caa8 Add more private declarations. 2013-01-31 17:18:33 -06:00
Mikhail Bautin fe3eceab57 Remove activation of profiles by default
See the discussion at https://github.com/mesos/spark/pull/355 for why
default profile activation is a problem.
2013-01-31 13:30:41 -08:00
haitao.yao 3190483b98 bug fix for javadoc 2013-01-31 14:23:51 +08:00
Imran Rashid 02a6761589 Merge branch 'master' into blockmanager_info
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
2013-01-30 18:52:35 -08:00
Imran Rashid c1df24d085 rename Slaves --> Executor 2013-01-30 18:51:14 -08:00
Matei Zaharia d12330bd2c Merge pull request #426 from woggling/conn-manager-ips
Remember ConnectionManagerId used to initiate SendingConnections
2013-01-30 15:02:53 -08:00
Matei Zaharia 612a9fee71 Merge pull request #428 from woggling/mesos-exec-id
Make ExecutorIDs include SlaveIDs when running Mesos
2013-01-30 15:01:46 -08:00
Stephen Haberman 871476d506 Include message and exitStatus if available. 2013-01-30 16:56:46 -06:00
Charles Reiss 252845d304 Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend 2013-01-30 10:38:06 -08:00
Charles Reiss f7de6978c1 Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use
the Mesos ExecutorID as a Spark ExecutorID.
2013-01-30 09:38:57 -08:00
Charles Reiss 7f51458774 Comment at top of DAGSchedulerSuite 2013-01-30 09:34:53 -08:00
Charles Reiss 9c0bae75ad Change DAGSchedulerSuite to run DAGScheduler in the same Thread. 2013-01-30 09:22:07 -08:00
Charles Reiss 178b89204c Refactor DAGScheduler more to allow testing without a separate thread. 2013-01-30 09:19:55 -08:00
Charles Reiss 4bf3d7ea12 Clear spark.master.port to cleanup for other tests 2013-01-29 19:05:58 -08:00
Charles Reiss 9eac7d01f0 Add DAGScheduler tests. 2013-01-29 18:55:43 -08:00
Charles Reiss a3d14c0404 Refactoring to DAGScheduler to aid testing 2013-01-29 18:55:42 -08:00
Charles Reiss 16a0789e10 Remember ConnectionManagerId used to initiate SendingConnections.
This prevents ConnectionManager from getting confused if a machine
has multiple host names and the one getHostName() finds happens
not to be the one that was passed from, e.g., the BlockManagerMaster.
2013-01-29 18:13:59 -08:00
Matei Zaharia d54b10b6ad Merge remote-tracking branch 'stephenh/removefailedjob'
Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-01-29 18:12:29 -08:00
Matei Zaharia ccb67ff2ca Merge pull request #425 from stephenh/toDebugString
Add RDD.toDebugString.
2013-01-29 10:44:18 -08:00
Matei Zaharia 9ae11603b4 Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Charles Reiss a34096a76d Add easymock to POMs 2013-01-29 10:04:33 -08:00
Imran Rashid b92259ba57 Merge branch 'master' into blockmanager_info 2013-01-29 09:45:10 -08:00
Matei Zaharia 64ba6a8c2c Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.

- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.

- A few of the RDD subclasses are simpler.

- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.

- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Stephen Haberman cbf72bffa5 Include name, if set, in RDD.toString(). 2013-01-29 00:20:36 -06:00
Stephen Haberman 3cda14af3f Add number of splits. 2013-01-29 00:12:31 -06:00
Matei Zaharia a1ecec8d79 Merge branch 'master' of github.com:mesos/spark 2013-01-28 22:08:44 -08:00
Stephen Haberman 951cfd9ba2 Add JavaRDDLike.toDebugString(). 2013-01-29 00:02:17 -06:00
Matei Zaharia f6eb1f0825 Merge pull request #413 from pwendell/stage-logging
SPARK-658: Adding logging of stage duration
2013-01-28 22:01:52 -08:00
Stephen Haberman b45857c965 Add RDD.toDebugString.
Original idea by Nathan Kronenfeld.
2013-01-28 23:56:56 -06:00
Patrick Wendell 7ee824e42e Units from ms -> s 2013-01-28 21:48:32 -08:00
Stephen Haberman 13368818af Merge branch 'master' into driver
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/SparkEnv.scala
	core/src/main/scala/spark/deploy/LocalSparkCluster.scala
	core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
	core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/ThreadingTest.scala
	core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Matei Zaharia dda2ce017c Merge pull request #424 from pwendell/logging-cleanup
Some DEBUG-level log cleanup.
2013-01-28 21:18:54 -08:00
Patrick Wendell 1f9b486a8b Some DEBUG-level log cleanup.
A few changes to make the DEBUG-level logs less
noisy and more readable.

- Moved a few very frequent messages to Trace
- Changed some BlockManager log messages to make them
  more understandable

SPARK-666 #resolve
2013-01-28 20:29:35 -08:00
Imran Rashid efff7bfb33 add long and float accumulatorparams 2013-01-28 20:23:11 -08:00
Imran Rashid cec9c768c2 convenient name available in StageInfo 2013-01-28 20:09:41 -08:00
Imran Rashid 01d77f329f expose stageInfo in SparkContext 2013-01-28 20:09:40 -08:00
Imran Rashid 38b83bc66b can get task runtime summary from task info 2013-01-28 20:09:40 -08:00
Imran Rashid b88daee916 simple util to summarize distributions 2013-01-28 20:09:40 -08:00
Imran Rashid b14841455c track task completion in DAGScheduler, and send a stageCompleted event with taskInfo to SparkListeners 2013-01-28 20:09:40 -08:00
Imran Rashid 0f22c4207f better formatting for RDDInfo 2013-01-28 20:07:53 -08:00
Imran Rashid a423ee546c expose RDD & storage info directly via SparkContext 2013-01-28 20:07:53 -08:00
Patrick Wendell 501433f1d5 Making submission time a field 2013-01-28 10:45:57 -08:00
Patrick Wendell c423be7d8e Renaming stage finished function 2013-01-28 10:45:57 -08:00
Patrick Wendell 07f568e1bf SPARK-658: Adding logging of stage duration 2013-01-28 10:45:57 -08:00
Matei Zaharia 286f8f876f Change time unit in MetadataCleaner to seconds 2013-01-28 01:29:27 -08:00
Matei Zaharia f03d9760fd Clean up BlockManagerUI a little (make it not be an object, merge with
Directives, and bind to a random port)
2013-01-27 23:56:14 -08:00
Matei Zaharia 909850729e Rename more things from slave to executor 2013-01-27 23:17:20 -08:00
Matei Zaharia 44b4a0f88f Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia 6ad8540b40 Merge pull request #401 from squito/blockmanager_ui
Blockmanager ui
2013-01-27 15:51:08 -08:00
Matei Zaharia 49f6472c0f Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss 58fc6b2bed Handle duplicate registrations better. 2013-01-26 18:30:44 -08:00
Charles Reiss ad4232b4da Fix deadlock in BlockManager reregistration triggered by failed updates. 2013-01-26 18:30:38 -08:00
Josh Rosen d49cf0e587 Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Imran Rashid 49c05608f5 add metadatacleaner for persistentRdd map 2013-01-25 17:04:16 -08:00
Stephen Haberman 8efbda0b17 Call executeOnCompleteCallbacks in more finally blocks. 2013-01-25 14:55:33 -06:00
Imran Rashid a1d9d1767d fixup 1cadaa1, changed api of map 2013-01-25 10:05:26 -08:00
Imran Rashid 1cadaa164e switch to TimeStampedHashMap for storing persistent Rdds 2013-01-25 09:30:21 -08:00
Imran Rashid 539491bbc3 code reformatting 2013-01-25 09:29:59 -08:00
Stephen Haberman 7dfb82a992 Replace old 'master' term with 'driver'. 2013-01-25 11:03:00 -06:00
Stephen Haberman ec43a51b38 Merge branch 'master' into localsparkcontext
Conflicts:
	core/src/test/scala/spark/FileServerSuite.scala
	core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Patrick Wendell b6fc6e6752 SPARK-541: Adding a warning for invalid Master URL
Right now Spark silently parses master URLs which do not match any
known regex as a Mesos URL. The Mesos error message when an invalid URL gets
passed is really confusing, so this warns the user when the implicit
conversion is happening.
2013-01-24 14:31:23 -08:00
Stephen Haberman 230bda2047 Add LocalSparkContext to manage common sc variable. 2013-01-24 11:01:01 -06:00
Matei Zaharia 0fe173a3a5 Merge pull request #410 from rxin/splitpruningrdd
Added a clearDependencies method in PartitionPruningRDD.
2013-01-23 23:10:15 -08:00
Reynold Xin 67a43bc7e6 Added a clearDependencies method in PartitionPruningRDD. 2013-01-23 23:06:52 -08:00
Matei Zaharia fe5e4812fc Merge pull request #409 from rxin/splitpruningrdd
Added pruneSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin c109f29c97 Updated PruneDependency to change "split" to "partition". 2013-01-23 22:22:03 -08:00
Reynold Xin eedc542a02 Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin 81004b967e Marked prev RDD as transient in SplitsPruningRDD. 2013-01-23 21:54:27 -08:00
Reynold Xin 636e912f32 Created a PruneDependency to properly assign dependency for
SplitsPruningRDD.
2013-01-23 21:21:55 -08:00
Reynold Xin 45cd50d5fe Updated assert == to ===. 2013-01-23 16:06:58 -08:00
Matei Zaharia 548856a224 Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
	core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin c24b3819dd Added an extra assert for split size check. 2013-01-23 15:34:59 -08:00
Reynold Xin eb222b7206 Added pruneSplits method to RDD. 2013-01-23 15:29:02 -08:00
Matei Zaharia 1dd82743e0 Fix compile error due to cherry-pick 2013-01-23 13:07:27 -08:00
Charles Reiss 5c7422292e Remove more dead code from test. 2013-01-23 12:59:51 -08:00
Imran Rashid e1985bfa04 be sure to set class loader of kryo instances 2013-01-23 12:51:09 -08:00
Charles Reiss be4a115a7e Clarify TODO. 2013-01-23 12:48:45 -08:00
Charles Reiss 88b9d240fd Remove dead code in test. 2013-01-23 12:40:38 -08:00
Matei Zaharia 1a3aeeca23 Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Charles Reiss e1027ca639 Actually add CacheManager. 2013-01-23 12:22:11 -08:00
Matei Zaharia 4147e1d47b Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia 4d77d554e1 Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Josh Rosen ae2ed2947d Allow PySpark's SparkFiles to be used from driver
Fix minor documentation formatting issues.
2013-01-23 10:58:50 -08:00
Tathagata Das 79d55700ce One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future. 2013-01-23 01:57:09 -08:00
Charles Reiss 0b506dd2ec Add tests of various node failure scenarios. 2013-01-23 01:38:15 -08:00
Charles Reiss d209b6b764 Extra debugging from hostLost() 2013-01-23 01:35:14 -08:00
Charles Reiss 9a27062260 Force generation increment after shuffle map stage 2013-01-23 01:34:44 -08:00
Tathagata Das 155f31398d Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides. 2013-01-23 01:10:26 -08:00
Tathagata Das 5e11f1e51f Modified StorageLevel API to ensure zero duplicate objects. 2013-01-22 23:42:53 -08:00
Tathagata Das bacade6caf Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite. 2013-01-22 22:55:26 -08:00
Josh Rosen 43e9ff9596 Add test for driver hanging on exit (SPARK-530). 2013-01-22 22:47:26 -08:00
Charles Reiss 2849931000 Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.

Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Matei Zaharia ebaa8f6519 Merge remote-tracking branch 'stephenh/cleanup'
Conflicts:
	core/src/main/scala/spark/scheduler/local/LocalScheduler.scala
2013-01-22 21:05:45 -08:00
Matei Zaharia d2d273868b Merge pull request #397 from JoshRosen/refactoring/daemon-threads
Refactor daemon thread creation
2013-01-22 21:02:53 -08:00
Stephen Haberman 98d0b7747d Fix Worker logInfo about unknown executor. 2013-01-22 18:11:51 -06:00
Stephen Haberman 8c51322cd0 Don't bother creating an exception. 2013-01-22 18:09:10 -06:00
Stephen Haberman fdec42385a Fix SPARK_MEM in ExecutorRunner. 2013-01-22 18:01:12 -06:00
Stephen Haberman 2437f6741b Restore SPARK_MEM in executorEnvs. 2013-01-22 18:01:03 -06:00
Matei Zaharia 151c47eef5 Merge pull request #399 from NFLabs/master
Fix for hanging spark.HttpFileServer on the kind of virtual network
2013-01-22 15:49:24 -08:00
Stephen Haberman 250fe89679 Handle Master telling the Worker to kill an already-dead executor. 2013-01-22 16:29:05 -06:00
Stephen Haberman 6f2194f757 Call removeJob instead of killing the cluster. 2013-01-22 15:38:58 -06:00
Stephen Haberman 27b3f3f0a9 Handle slaveLost before slaveIdToHost knows about it. 2013-01-22 15:30:42 -06:00
Imran Rashid 905c720e5e Merge branch 'master' into blockmanager_ui
Conflicts:
	core/src/main/scala/spark/RDD.scala
2013-01-22 12:02:27 -08:00
Imran Rashid 50e2b23927 Fix up some problems from the merge 2013-01-22 11:46:01 -08:00
Stephen Haberman 588b24197a Use default arguments instead of constructor overloads. 2013-01-22 10:19:30 -06:00
Leemoonsoo 7e9ee2e833 Fix for hanging spark.HttpFileServer with kind of virtual network 2013-01-22 23:08:34 +09:00
Charles Reiss e353886a8c Use generation numbers for fetch failure tracking 2013-01-22 00:23:31 -08:00
Josh Rosen 551a47a620 Refactor daemon thread pool creation. 2013-01-21 23:31:00 -08:00
Stephen Haberman a8baeb9327 Further simplify getOrElse call. 2013-01-21 21:30:24 -06:00
Stephen Haberman 2d8218b871 Remove unneeded/now-broken saveAsNewAPIHadoopFile overload. 2013-01-21 20:00:27 -06:00
Josh Rosen 7b9e96c992 Add synchronization to Executor.updateDependencies() (SPARK-662) 2013-01-21 17:34:23 -08:00
Josh Rosen ef711902c1 Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.

I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman ffd1623595 Minor cleanup. 2013-01-21 15:55:46 -06:00
Matei Zaharia a88b44ed3b Only bind to IPv4 addresses when trying to auto-detect external IP 2013-01-21 11:59:21 -08:00
Matei Zaharia 4d34c7fc3e Fix compile error caused by cherry-pick 2013-01-21 11:33:48 -08:00
Imran Rashid a3f571b539 more File -> String changes 2013-01-21 11:21:52 -08:00
Imran Rashid fe26acc482 remove unused imports 2013-01-21 11:21:46 -08:00
Imran Rashid c73107500e send sparkHome as String instead of File over network 2013-01-21 11:21:39 -08:00
Imran Rashid 5bf73df7f0 oops, fix stupid compile error 2013-01-21 11:21:33 -08:00
Imran Rashid aae5a920a4 get sparkHome the correct way 2013-01-21 11:21:28 -08:00
Imran Rashid f116d6b5c6 executor can use a different sparkHome from Worker 2013-01-21 11:21:22 -08:00
Stephen Haberman 6ded481999 Merge branch 'master' into hadoopconf
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
2013-01-21 12:56:48 -06:00
Stephen Haberman 69a417858b Also use hadoopConfiguration in newAPI methods. 2013-01-21 12:42:11 -06:00
Matei Zaharia c0b9ceb8c3 Log remote lifecycle events in Akka for easier debugging 2013-01-21 00:23:53 -08:00
Matei Zaharia c7b5e5f1ec Merge pull request #389 from JoshRosen/python_rdd_checkpointing
Add checkpointing to the Python API
2013-01-20 17:10:44 -08:00
Josh Rosen 9f211dd3f0 Fix PythonPartitioner equality; see SPARK-654.
PythonPartitioner did not take the Python-side partitioning function
into account when checking for equality, which might cause problems
in the future.
2013-01-20 15:41:42 -08:00
Josh Rosen 5b6ea9e9a0 Update checkpointing API docs in Python/Java. 2013-01-20 15:31:41 -08:00
Josh Rosen 7ed1bf4b48 Add RDD checkpointing to Python API. 2013-01-20 13:19:19 -08:00
Matei Zaharia 86057ec7c8 Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/api/python/PythonRDD.scala
2013-01-20 12:47:55 -08:00
Matei Zaharia 8e7f098a2c Added accumulators to PySpark 2013-01-20 01:57:44 -08:00
Tathagata Das 4f8fe58b25 Merge branch 'mesos-streaming' into streaming
Conflicts:
	core/src/main/scala/spark/api/java/JavaRDDLike.scala
	core/src/main/scala/spark/api/java/JavaSparkContext.scala
	core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Tathagata Das 214345ceac Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29, along with updates to doc comments in SparkContext.checkpoint(). 2013-01-19 23:50:17 -08:00
Imran Rashid d98caa0fa0 Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/main/scala/spark/storage/StorageLevel.scala
2013-01-18 18:11:26 -08:00
Patrick Wendell ee0314c3b3 Merge branch 'streaming' into streaming-java-api 2013-01-17 18:43:00 -08:00
Patrick Wendell d5570c7968 Adding checkpointing to Java API 2013-01-17 18:41:58 -08:00
Matei Zaharia 54c0f9f185 Fix code that assumed spark.local.dir is only a single directory 2013-01-17 17:40:55 -08:00
Fernand Pajot 742bc841ad changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir 2013-01-17 16:56:11 -08:00
Matei Zaharia aff1844155 Merge pull request #381 from squito/remove_threadpool
remove unused thread pool
2013-01-16 16:46:42 -08:00
Tathagata Das f466ee44bc Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/MapOutputTracker.scala
2013-01-16 12:57:11 -08:00
Imran Rashid eae698f755 remove unused thread pool 2013-01-16 12:21:37 -08:00
Tathagata Das a805ac4a7c Disabled checkpoint for PairwiseRDD (pySpark). 2013-01-16 10:55:26 -08:00
Matei Zaharia 4beb084f64 Merge pull request #374 from woggling/null-mapout
Generate FetchFailedException even for cached missing map outputs
2013-01-15 14:22:29 -08:00
Tathagata Das cd1521cfdb Merge branch 'master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CoGroupedRDD.scala
	core/src/main/scala/spark/rdd/FilteredRDD.scala
	docs/_layouts/global.html
	docs/index.md
	run
2013-01-15 12:08:51 -08:00
Charles Reiss 4078623b9f Remove broken attempt to test fetching case. 2013-01-15 12:05:54 -08:00
Stephen Haberman 74d3b23929 Add spark.executor.memory to differentiate executor memory from spark-shell memory. 2013-01-15 14:03:28 -06:00
Stephen Haberman d228bff440 Add a test. 2013-01-15 11:48:50 -06:00
Stephen Haberman dd583b7ebf Call executeOnCompleteCallbacks in a finally block. 2013-01-15 10:52:06 -06:00
Tathagata Das eded21925a Merge pull request #375 from tdas/streaming
Important bug fixes
2013-01-14 23:06:40 -08:00
Charles Reiss b038999797 Fix accidental spark.master.host reuse 2013-01-14 17:04:44 -08:00
Charles Reiss 7ba34bc007 Additional tests for MapOutputTracker. 2013-01-14 15:27:02 -08:00
Charles Reiss 273fb5cc10 Throw FetchFailedException for cached missing locs 2013-01-14 15:26:48 -08:00
Tathagata Das 131be5d62e Fixed bug in RDD checkpointing. 2013-01-14 03:28:25 -08:00
Tathagata Das 82b0cc90ca Merge pull request #370 from tdas/streaming
Added more documentation and minor change in API for NetworkReceiver
2013-01-13 21:28:12 -08:00
Tathagata Das 0dbd411a56 Added documentation for PairDStreamFunctions. 2013-01-13 21:08:35 -08:00
Matei Zaharia cb867e9ffb Merge branch 'master' of github.com:mesos/spark 2013-01-13 19:34:32 -08:00
Matei Zaharia 72408e8dfa Make filter preserve partitioner info, since it can 2013-01-13 19:34:07 -08:00
Matei Zaharia 9a34409810 Merge pull request #360 from rxin/cogroup-java
Changed CoGroupRDD's hash map from Scala to Java.
2013-01-13 15:31:08 -08:00
Reynold Xin be7166146b Removed the use of getOrElse to avoid Scala wrapper for every call. 2013-01-13 15:27:28 -08:00
Ryan LeCompte c31931af7e switch to uppercase constants 2013-01-13 10:39:47 -08:00
Ryan LeCompte 2305a2c1d9 more code cleanup 2013-01-13 10:01:56 -08:00
Mikhail Bautin 88d8f11365 Add missing dependency spray-json to Maven build 2013-01-13 00:46:25 -08:00
Matei Zaharia fbb3fc4143 Merge pull request #346 from JoshRosen/python-api
Python API (PySpark)
2013-01-12 23:49:36 -08:00
Matei Zaharia 01413ca0e7 Merge pull request #364 from tysonjh/master
Executor and JobDescription JSON support added
2013-01-12 16:17:07 -08:00
Matei Zaharia 995075bf79 Merge pull request #355 from shivaram/default-hadoop-pom
Activate hadoop1 profile by default for maven builds
2013-01-12 15:38:36 -08:00
Shivaram Venkataraman bbc56d85ed Rename environment variable for hadoop profiles to hadoopVersion 2013-01-12 15:24:13 -08:00
Ryan LeCompte addff2c466 add comment 2013-01-12 09:57:29 -08:00
Ryan LeCompte ea20ae6618 add one extra test 2013-01-12 09:18:00 -08:00
Ryan LeCompte 2c77eeebb6 correct test params 2013-01-12 00:13:45 -08:00
Ryan LeCompte 0cfea7a2ec add unit test 2013-01-11 23:48:07 -08:00
Ryan LeCompte ff10b3aa09 add missing return 2013-01-11 21:03:57 -08:00
Ryan LeCompte 22445fbea9 attempt to sleep for more accurate time period, minor cleanup 2013-01-11 13:30:49 -08:00
Tyson 1731f1fed4 Added an optional format parameter for individual job queries and optimized the jobId query 2013-01-11 15:01:43 -05:00
Tyson c063e8777e Added implicit json writers for JobDescription and ExecutorRunner 2013-01-11 14:57:38 -05:00
Stephen Haberman 5c7a127219 Pass a new Configuration that wraps the default hadoopConfiguration. 2013-01-11 11:25:11 -06:00
Stephen Haberman 3e6519a36e Use hadoopConfiguration for default JobConf in PairRDDFunctions. 2013-01-11 11:24:20 -06:00
Shivaram Venkataraman 9262522306 Activate hadoop2 profile in pom.xml with -Dhadoop=2 2013-01-10 22:07:34 -08:00
Matei Zaharia 2e914d9983 Formatting 2013-01-10 19:13:08 -08:00
Matei Zaharia 3548c9c0c8 Merge branch 'master' of github.com:mesos/spark 2013-01-10 19:06:40 -08:00
Matei Zaharia 6d1c230281 Merge pull request #357 from tysonjh/master
JSON support added to WebUI
2013-01-10 19:06:07 -08:00
Matei Zaharia 248995c535 Merge pull request #356 from shane-huang/master
Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections
2013-01-10 17:52:23 -08:00
Reynold Xin bd336f5f40 Changed CoGroupRDD's hash map from Scala to Java. 2013-01-10 17:13:04 -08:00
Stephen Haberman d1864052c5 Fix invalid asInstanceOf cast. 2013-01-10 12:16:26 -06:00
Stephen Haberman b15e851279 Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables.
For custom properties, use "spark.hadoop.*" as a prefix instead of just "hadoop.*".
2013-01-10 10:55:41 -06:00
shane-huang 9930a95d21 Modified Patch according to comments 2013-01-10 20:09:55 +08:00
Stephen Haberman e3861ae395 Provide and expose a default Hadoop Configuration.
Any "hadoop.*" system properties will be passed along into configuration.
2013-01-09 17:08:14 -06:00
Tyson 549ee388a1 Removed io.spray spray-json dependency as it is not needed. 2013-01-09 15:12:23 -05:00
Tyson bf9d9946f9 Query parameter reformatted to be more extensible and routing more robust 2013-01-09 11:29:58 -05:00
Tyson 0da2ff102e Added url query parameter json and handler 2013-01-09 10:40:48 -05:00
Tyson 269fe018c7 JSON object definitions 2013-01-09 10:40:43 -05:00
Matei Zaharia 9cc764f523 Code style 2013-01-08 22:29:57 -08:00
Matei Zaharia 14972141f9 Merge pull request #344 from mbautin/log_preferred_hosts
Log preferred hosts
2013-01-08 22:26:34 -08:00
Josh Rosen b57dd0f160 Add mapPartitionsWithSplit() to PySpark. 2013-01-08 16:05:02 -08:00
Stephen Haberman 8ac0f35be4 Add JavaRDDLike.keyBy. 2013-01-08 09:57:45 -06:00
Stephen Haberman 4ee6b22775 Merge branch 'master' into tupleBy
Conflicts:
	core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
shane-huang e4cb72da8a Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections. 2013-01-08 22:40:58 +08:00
Shivaram Venkataraman f7adb382ac Activate hadoop1 if property hadoop is missing. hadoop2 can be activated now
by using -Dhadoop -Phadoop2.
2013-01-08 03:19:43 -08:00
Mikhail Bautin 4725b0f643 Fixing if/else coding style for preferred hosts logging 2013-01-07 20:09:26 -08:00
Mikhail Bautin c41042c816 Log preferred hosts 2013-01-07 20:06:09 -08:00
Shivaram Venkataraman 4bbe07e5ec Activate hadoop1 profile by default for maven builds 2013-01-07 17:46:22 -08:00
Matei Zaharia f7cf035b9b Merge pull request #350 from tdas/streaming
Spark Streaming
2013-01-07 17:40:11 -08:00
Shivaram Venkataraman b1336e2fe4 Update expected size of strings to match our dummy string class 2013-01-07 17:00:32 -08:00
Tathagata Das 4719e6d8fe Changed locations for unit test logs. 2013-01-07 16:06:07 -08:00
Shivaram Venkataraman 55c66d365f Use a dummy string class in Size Estimator tests to make it resistant to jdk
versions
2013-01-07 15:58:00 -08:00
Shivaram Venkataraman 77d751731c Remove unused BoundedMemoryCache file and associated test case. 2013-01-07 15:57:46 -08:00
Shivaram Venkataraman aed368a970 Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also
fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop
versions
2013-01-07 15:57:33 -08:00
Shivaram Venkataraman f8d579a0c0 Remove dependencies on sun jvm classes. Instead use reflection to infer
HotSpot options and total physical memory size
2013-01-07 15:57:18 -08:00
Tathagata Das 3b0a3b89ac Added better docs for RDDCheckpointData 2013-01-07 14:55:49 -08:00
Tathagata Das 237bac36e9 Renamed examples and added documentation. 2013-01-07 14:37:21 -08:00
Matei Zaharia 1941d9602d Merge branch 'master' of github.com:mesos/spark 2013-01-07 16:50:39 -05:00
Matei Zaharia 9c32f300fb Add Accumulable.setValue for easier use in Java 2013-01-07 16:50:23 -05:00
Tathagata Das 1346126485 Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet. 2013-01-07 12:11:27 -08:00
Stephen Haberman 8dc06069fe Rename RDD.tupleBy to keyBy. 2013-01-06 15:21:45 -06:00
Matei Zaharia 8fd3a70c18 Add PairRDD.keys() and values() to Java API 2013-01-05 22:46:45 -05:00
Matei Zaharia b1663752c6 Merge pull request #351 from stephenh/values
Add PairRDDFunctions.keys and values.
2013-01-05 19:15:54 -08:00
Matei Zaharia 0982572519 Add methods called just 'accumulator' for int/double in Java API 2013-01-05 22:11:28 -05:00
Matei Zaharia 86af64b0a6 Fix Accumulators in Java, and add a test for them 2013-01-05 20:55:17 -05:00
Matei Zaharia ecf9c08901 Fix Accumulators in Java, and add a test for them 2013-01-05 20:54:08 -05:00
Stephen Haberman 1fdb6946b5 Add RDD.tupleBy. 2013-01-05 13:07:59 -06:00
Stephen Haberman 6a0db3b449 Fix typo. 2013-01-05 12:56:17 -06:00
Stephen Haberman f4e6b9361f Add RDD.collect(PartialFunction). 2013-01-05 12:14:08 -06:00
Stephen Haberman 8d57c78c83 Add PairRDDFunctions.keys and values. 2013-01-05 12:04:01 -06:00
Josh Rosen 33beba3965 Change PySpark RDD.take() to not call iterator(). 2013-01-03 14:52:21 -08:00
Tathagata Das 3dc87dd923 Fixed compilation bug in RDDSuite created during merge for mesos/master. 2013-01-01 16:38:04 -08:00
Tathagata Das d34dba25c2 Merge branch 'mesos' into dev-merge 2013-01-01 15:48:39 -08:00
Josh Rosen b58340dbd9 Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
Josh Rosen 170e451fbd Minor documentation and style fixes for PySpark. 2013-01-01 13:52:14 -08:00
Matei Zaharia 55809fbc6d Merge pull request #349 from woggling/cache-finally
Avoid stalls when computation of cached RDD throws exception
2013-01-01 08:21:33 -08:00
Charles Reiss 58072a7340 Remove some dead comments 2013-01-01 08:07:44 -08:00
Charles Reiss 21636ee4fa Test with exception while computing cached RDD. 2013-01-01 08:07:40 -08:00
Charles Reiss feadaf72f4 Mark key as not loading in CacheTracker even when compute() fails 2013-01-01 07:57:20 -08:00
Josh Rosen f803953998 Raise exception when hashing Java arrays (SPARK-597) 2012-12-31 20:20:11 -08:00
Tathagata Das 7e0271b438 Refactored a whole lot to push all DStreams into the spark.streaming.dstream package. 2012-12-30 15:19:55 -08:00