Commit graph

1272 commits

Author SHA1 Message Date
Mark Hamstra 857010392b Fuller implementation of foldByKey 2013-03-15 10:56:05 -07:00
Mark Hamstra 16a4ca4537 restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test 2013-03-14 13:58:37 -07:00
Mark Hamstra b1422cbdd5 added foldByKey 2013-03-14 12:59:58 -07:00
Stephen Haberman 7786881f47 Fix tabs that snuck in. 2013-03-14 14:57:12 -05:00
Stephen Haberman 7d8bb4df3a Allow subtractByKey's other argument to have a different value type. 2013-03-14 14:44:15 -05:00
Stephen Haberman 4632c45af1 Finished subtractByKeys. 2013-03-14 10:35:34 -05:00
Matei Zaharia 4032beba49 Merge pull request #521 from stephenh/earlyclose
Close the reader in HadoopRDD as soon as iteration end.
2013-03-13 19:29:46 -07:00
Stephen Haberman 63fe225587 Simplify SubtractedRDD in preparation from subtractByKey. 2013-03-13 17:17:34 -05:00
Mark Hamstra cd5b947cf6 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-13 13:16:14 -07:00
Stephen Haberman e7f1a69c6b Add a test for NextIterator. 2013-03-13 10:46:33 -05:00
Stephen Haberman 1a175d13b9 Add NextIterator.closeIfNeeded. 2013-03-13 10:17:39 -05:00
Stephen Haberman 8f00d23598 Remove NextIterator.close default implementation. 2013-03-12 12:30:10 -05:00
Stephen Haberman 9e68f48625 More quickly call close in HadoopRDD.
This also refactors out the common "gotNext" iterator pattern into
a shared utility class.
2013-03-11 23:59:17 -05:00
Charles Reiss 769d399674 Send block sizes as longs. 2013-03-11 14:17:05 -07:00
Mark Hamstra 562893bea3 deleted excess curly braces 2013-03-10 22:43:08 -07:00
Imran Rashid 8a11ac3dc7 increase sleep time 2013-03-10 22:31:44 -07:00
Imran Rashid 9f97f2f9d8 add a small wait to one task to make sure some task runtime really is non-zero 2013-03-10 22:30:18 -07:00
Mark Hamstra 1289e7176b refactored _With API and added foreachPartition 2013-03-10 22:27:13 -07:00
Mark Hamstra b57df1f5e3 Merge branch 'master' of https://github.com/mesos/spark into WithThing 2013-03-10 16:56:31 -07:00
Matei Zaharia 2e1bbc4e7e Merge remote-tracking branch 'woggling/dag-sched-driver-port'
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 16:52:54 -07:00
Matei Zaharia 91a9d093bd Merge pull request #512 from patelh/fix-kryo-serializer
Fix reference bug in Kryo serializer, add test, update version
2013-03-10 15:48:23 -07:00
Matei Zaharia 557cfd0f4d Merge pull request #515 from woggling/deploy-app-death
Notify standalone deploy client of application death.
2013-03-10 15:44:57 -07:00
Matei Zaharia a59cc6060f Merge remote-tracking branch 'stephenh/nomocks'
Conflicts:
	core/src/main/scala/spark/storage/BlockManagerMaster.scala
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-03-10 13:39:10 -07:00
Imran Rashid 20f01a0a1b enable task metrics in local mode, add tests 2013-03-09 21:17:31 -08:00
Imran Rashid ec30188a2a rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches 2013-03-09 21:16:53 -08:00
Charles Reiss b0983c5762 Notify standalone deploy client of application death.
Usually, this isn't necessary since the application will be removed
as a result of the deploy client disconnecting, but occassionally, the
standalone deploy master removes an application otherwise.

Also mark applications as FAILED instead of FINISHED when they are
killed as a result of their executors failing too many times.
2013-03-09 11:29:45 -08:00
Charles Reiss d0216cb38b Prevent DAGSchedulerSuite from corrupting driver.port.
Use the LocalSparkContext abstraction to properly manage clearing
spark.driver.port.
2013-03-09 10:49:02 -08:00
Hiral Patel 664e5fd24b Fix reference bug in Kryo serializer, add test, update version 2013-03-07 22:16:11 -08:00
Mark Hamstra 5ff0810b11 refactor mapWith, flatMapWith and filterWith to each use two parameter lists 2013-03-05 12:25:44 -08:00
Mark Hamstra d046d8ad32 whitespace formatting 2013-03-05 00:48:13 -08:00
Mark Hamstra 9148b968cf mapWith, flatMapWith and filterWith 2013-03-04 15:48:47 -08:00
Matei Zaharia 9f0dc829cb Fix TaskMetrics not being serializable 2013-03-04 12:08:31 -08:00
Matei Zaharia 04fb81ffe5 Merge pull request #506 from rxin/spark-706
Fixed SPARK-706: Failures in block manager put leads to read task hanging.
2013-03-03 17:20:07 -08:00
Imran Rashid 0bd1d00c2a minor cleanup based on feedback in review request 2013-03-03 16:46:45 -08:00
Imran Rashid f1006b99ff change CleanupIterator to CompletionIterator 2013-03-03 16:39:05 -08:00
Imran Rashid 8fef5b9c5f refactoring of TaskMetrics 2013-03-03 16:34:04 -08:00
Imran Rashid d36abdb053 Merge branch 'master' into stageInfo 2013-03-03 15:20:46 -08:00
Matei Zaharia 6bfc7cad6b Merge pull request #504 from mosharaf/master
Worker address was getting removed when removing an app.
2013-03-02 22:14:49 -08:00
Mark Hamstra 8b06b359da bump version to 0.7.1-SNAPSHOT in the subproject poms to keep the maven build building. 2013-02-28 23:34:34 -08:00
Reynold Xin 44134e12bb Fixed SPARK-706: Failures in block manager put leads to read task
hanging.
2013-02-28 15:14:59 -08:00
Stephen Haberman 6415c2bb60 Don't create the Executor until we have everything it needs. 2013-02-28 12:38:09 -06:00
Stephen Haberman 80eecd2cb1 Make Executor fields volatile since they're read from the thread pool. 2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury 4ab387bcdb Fixed master datastructure updates after removing an application; and a typo. 2013-02-27 13:52:44 -08:00
Matei Zaharia ece3edfffa Fix a problem with no hosts being counted as alive in the first job 2013-02-26 12:11:03 -08:00
Matei Zaharia 73697e2891 Fix overly large thread names in PySpark 2013-02-26 12:07:59 -08:00
Stephen Haberman db957e5bd7 Fix MapOutputTrackerSuite. 2013-02-26 01:38:50 -06:00
Stephen Haberman a65aa549ff Override DAGScheduler.runLocally so we can remove the Thread.sleep. 2013-02-25 23:49:32 -06:00
Stephen Haberman a4adeb255c Merge branch 'master' into nomocks
Conflicts:
	core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala
2013-02-25 23:48:52 -06:00
Tathagata Das c02e064938 Fixed replication bug in BlockManager 2013-02-25 17:27:46 -08:00
Matei Zaharia 490f056cdd Allow passing sparkHome and JARs to StreamingContext constructor
Also warns if spark.cleaner.ttl is not set in the version where you pass
your own SparkContext.
2013-02-25 15:13:30 -08:00
Matei Zaharia 568bdaf8ae Set spark.deploy.spreadOut to true by default in 0.7 (improves locality) 2013-02-25 14:34:55 -08:00
Matei Zaharia 1ef58dadcc Add a config property for Akka lifecycle event logging 2013-02-25 14:01:24 -08:00
Matei Zaharia ceaec4a675 Merge pull request #498 from pwendell/shutup-akka
Disable remote lifecycle logging from Akka.
2013-02-25 12:31:24 -08:00
Patrick Wendell 85a85646d9 Disable remote lifecycle logging from Akka.
This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.
2013-02-25 12:25:43 -08:00
Imran Rashid 8f17387d97 remove bogus comment 2013-02-25 10:31:06 -08:00
Matei Zaharia 6ae9a22c3e Get spark.default.paralellism on each call to defaultPartitioner,
instead of only once, in case the user changes it across Spark uses
2013-02-25 10:28:08 -08:00
Matei Zaharia d6e6abece3 Merge pull request #459 from stephenh/bettersplits
Change defaultPartitioner to use upstream split size.
2013-02-25 09:22:04 -08:00
Stephen Haberman c44ccf2862 Use default parallelism if its set. 2013-02-24 23:54:03 -06:00
Stephen Haberman 44032bc476 Merge branch 'master' into bettersplits
Conflicts:
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
	core/src/test/scala/spark/ShuffleSuite.scala
2013-02-24 22:08:14 -06:00
Tathagata Das dff53d1b94 Merge branch 'mesos-master' into streaming 2013-02-24 12:17:22 -08:00
Matei Zaharia 3b9f929467 Merge pull request #468 from haitaoyao/master
support customized java options for Master, Worker, Executor, and Repl
2013-02-23 23:38:15 -08:00
Stephen Haberman 37c7a71f9c Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD. 2013-02-24 00:27:53 -06:00
Stephen Haberman f442e7d83c Update for split->partition rename. 2013-02-24 00:27:14 -06:00
Stephen Haberman cec87a0653 Merge branch 'master' into subtract 2013-02-23 23:27:55 -06:00
Tathagata Das d853aa9658 Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs. 2013-02-23 17:42:26 -08:00
Patrick Wendell 931f439be9 Responding to code review 2013-02-23 15:40:41 -08:00
Patrick Wendell f51b0f93f2 Adding Java-accessible methods to Vector.scala
This is needed for the Strata machine learning tutorial (and
also is generally helpful).
2013-02-23 13:26:59 -08:00
Matei Zaharia d942d39072 Handle exceptions in RecordReader.close() better (suggested by Jim
Donahue)
2013-02-23 11:19:07 -08:00
Matei Zaharia c89824046a Merge pull request #490 from woggling/conn-death
Detect when SendingConnections disconnect even if we aren't sending to them
2013-02-22 22:58:19 -08:00
Charles Reiss 50cf8c8b79 Add fault tolerance test that uses replicated RDDs. 2013-02-22 16:11:53 -08:00
Charles Reiss c8a7886921 Detect when SendingConnections drop by trying to read them.
Comment fix
2013-02-22 16:11:52 -08:00
Matei Zaharia d4d7993bf5 Several fixes to the work to log when no resources can be used by a job.
Fixed some of the messages as well as code style.
2013-02-22 15:51:37 -08:00
Matei Zaharia f33662c133 Merge remote-tracking branch 'pwendell/starvation-check'
Also fixed a bug where master was offering executors on dead workers

Conflicts:
	core/src/main/scala/spark/deploy/master/Master.scala
2013-02-22 15:27:41 -08:00
Matei Zaharia 7341de0d48 Merge pull request #475 from JoshRosen/spark-668
Remove hack workaround for SPARK-668
2013-02-22 14:56:18 -08:00
Patrick Wendell f8c3a03d55 SPARK-702: Replace Function --> JFunction in JavaAPI Suite.
In a few places the Scala (rather than Java) function class is used.
2013-02-22 12:54:15 -08:00
Imran Rashid 0f37b43b40 make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD 2013-02-21 16:56:28 -08:00
Imran Rashid 9230617f23 add cleanup iterator 2013-02-21 16:55:14 -08:00
Imran Rashid 81bd07da26 sparkListeners should be a val 2013-02-21 15:21:45 -08:00
Imran Rashid 796e934d31 add some docs & some cleanup 2013-02-21 15:19:34 -08:00
Imran Rashid 394d3acc3e store taskInfo & metrics together in a tuple 2013-02-21 15:19:34 -08:00
Imran Rashid 7960927cf4 get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo 2013-02-21 15:19:34 -08:00
Imran Rashid d0bfac3eed taskInfo tracks if a task is run on a preferred host 2013-02-21 15:19:34 -08:00
Imran Rashid 6f62a57858 add runtime breakdowns 2013-02-21 15:19:34 -08:00
Imran Rashid 176cb20703 add task result size; better formatting for time interval distributions; cleanup distribution formatting 2013-02-21 15:19:33 -08:00
Imran Rashid f2fcabf2ea add timing around parts of executor & track result size 2013-02-21 15:19:33 -08:00
Imran Rashid ff127cfcd3 Merge branch 'master' into stageInfo
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/storage/BlockManager.scala
2013-02-21 15:16:21 -08:00
Imran Rashid 69f9a7035f fully revert change to addOnCompleteCallback -- missed this in e9f53ec 2013-02-21 15:07:46 -08:00
Imran Rashid baab23abdf TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task 2013-02-21 14:13:01 -08:00
haitao.yao 8215b95547 Merge branch 'mesos' 2013-02-21 10:07:24 +08:00
Tathagata Das 334ab92441 Fixed bug in CheckpointSuite 2013-02-20 10:26:36 -08:00
Tathagata Das 1cb725e417 Merge branch 'mesos-master' into streaming 2013-02-20 09:55:35 -08:00
Tathagata Das fb9956256d Merge branch 'mesos-master' into streaming
Conflicts:
	core/src/main/scala/spark/rdd/CheckpointRDD.scala
	streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala
2013-02-20 09:01:29 -08:00
Matei Zaharia 05bc02e80b Merge pull request #482 from woggling/shutdown-exceptions
Don't call System.exit over uncaught exceptions from shutdown hooks
2013-02-19 20:56:15 -08:00
haitao.yao 6a3d44c673 Merge branch 'mesos' 2013-02-20 10:23:58 +08:00
Charles Reiss 092c631fa8 Pull detection of being in a shutdown hook into utility function. 2013-02-19 17:49:55 -08:00
Reynold Xin 130f704baf Added a method to create PartitionPruningRDD. 2013-02-19 16:03:52 -08:00
Charles Reiss d0588bd6d7 Catch/log errors deleting temp dirs 2013-02-19 13:04:06 -08:00
Charles Reiss 687581c3ec Paranoid uncaught exception handling for exceptions during shutdown 2013-02-19 13:03:02 -08:00
haitao.yao 7c129388fb Merge branch 'mesos' 2013-02-19 11:22:24 +08:00
Matei Zaharia 7151e1e4c8 Rename "jobs" to "applications" in the standalone cluster 2013-02-17 23:23:08 -08:00