ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Judy	c5ba975ee8	[Spark-5708] Add Slf4jSink to Spark Metrics Add Slf4jSink to Spark Metrics using Coda Hale's SlfjReporter. This sends metrics to log4j, allowing spark users to reuse log4j pipeline for metrics collection. Reviewed existing unit tests and didn't see any sink-related tests. Please advise on if tests should be added. Author: Judy <judynash@microsoft.com> Author: judynash <judynash@microsoft.com> Closes #4644 from judynash/master and squashes the following commits: 57ef214 [judynash] doc clarification and indent fixes a751a66 [Judy] Spark-5708: Add Slf4jSink to Spark Metrics	2015-02-24 20:50:16 +00:00
Andrew Or	6d2caa576f	[SPARK-5965] Standalone Worker UI displays {{USER_JAR}} For screenshot see: https://issues.apache.org/jira/browse/SPARK-5965 This was caused by `20a6013106`. Author: Andrew Or <andrew@databricks.com> Closes #4739 from andrewor14/user-jar-blocker and squashes the following commits: 23c4a9e [Andrew Or] Use right argument	2015-02-24 11:08:07 -08:00
Tathagata Das	64d2c01ff1	[Spark-5967] [UI] Correctly clean JobProgressListener.stageIdToActiveJobIds Patch should be self-explanatory pwendell JoshRosen Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #4741 from tdas/SPARK-5967 and squashes the following commits: 653b5bb [Tathagata Das] Fixed the fix and added test e2de972 [Tathagata Das] Clear stages which have no corresponding active jobs.	2015-02-24 11:02:47 -08:00
CodingCat	242d49584c	[SPARK-5724] fix the misconfiguration in AkkaUtils https://issues.apache.org/jira/browse/SPARK-5724 In AkkaUtil, we set several failure detector related the parameters as following ``` al akkaConf = ConfigFactory.parseMap(conf.getAkkaConf.toMap[String, String]) .withFallback(akkaSslConfig).withFallback(ConfigFactory.parseString( s""" \|akka.daemonic = on \|akka.loggers = [""akka.event.slf4j.Slf4jLogger""] \|akka.stdout-loglevel = "ERROR" \|akka.jvm-exit-on-fatal-error = off \|akka.remote.require-cookie = "$requireCookie" \|akka.remote.secure-cookie = "$secureCookie" \|akka.remote.transport-failure-detector.heartbeat-interval = $akkaHeartBeatInterval s \|akka.remote.transport-failure-detector.acceptable-heartbeat-pause = $akkaHeartBeatPauses s \|akka.remote.transport-failure-detector.threshold = $akkaFailureDetector \|akka.actor.provider = "akka.remote.RemoteActorRefProvider" \|akka.remote.netty.tcp.transport-class = "akka.remote.transport.netty.NettyTransport" \|akka.remote.netty.tcp.hostname = "$host" \|akka.remote.netty.tcp.port = $port \|akka.remote.netty.tcp.tcp-nodelay = on \|akka.remote.netty.tcp.connection-timeout = $akkaTimeout s \|akka.remote.netty.tcp.maximum-frame-size = ${akkaFrameSize}B \|akka.remote.netty.tcp.execution-pool-size = $akkaThreads \|akka.actor.default-dispatcher.throughput = $akkaBatchSize \|akka.log-config-on-start = $logAkkaConfig \|akka.remote.log-remote-lifecycle-events = $lifecycleEvents \|akka.log-dead-letters = $lifecycleEvents \|akka.log-dead-letters-during-shutdown = $lifecycleEvents """.stripMargin)) ``` Actually, we do not have any parameter naming "akka.remote.transport-failure-detector.threshold" see: http://doc.akka.io/docs/akka/2.3.4/general/configuration.html what we have is "akka.remote.watch-failure-detector.threshold" Author: CodingCat <zhunansjtu@gmail.com> Closes #4512 from CodingCat/SPARK-5724 and squashes the following commits: bafe56e [CodingCat] fix the grammar in configuration doc 338296e [CodingCat] remove failure-detector related info 8bfcfd4 [CodingCat] fix the misconfiguration in AkkaUtils	2015-02-23 11:29:25 +00:00
Ilya Ganelin	95cd643aa9	[SPARK-3885] Provide mechanism to remove accumulators once they are no longer used Instead of storing a strong reference to accumulators, I've replaced this with a weak reference and updated any code that uses these accumulators to check whether the reference resolves before using the accumulator. A weak reference will be cleared when there is no longer an existing copy of the variable versus using a soft reference in which case accumulators would only be cleared when the GC explicitly ran out of memory. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #4021 from ilganeli/SPARK-3885 and squashes the following commits: 4ba9575 [Ilya Ganelin] Fixed error in test suite 8510943 [Ilya Ganelin] Extra code bb76ef0 [Ilya Ganelin] File deleted somehow 283a333 [Ilya Ganelin] Added cleanup method for accumulators to remove stale references within Accumulators.original to accumulators that are now out of scope 345fd4f [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885 7485a82 [Ilya Ganelin] Fixed build error c8e0f2b [Ilya Ganelin] Added working test for accumulator garbage collection 94ce754 [Ilya Ganelin] Still not being properly garbage collected 8722b63 [Ilya Ganelin] Fixing gc test 7414a9c [Ilya Ganelin] Added test for accumulator garbage collection 18d62ec [Ilya Ganelin] Updated to throw Exception when accessing a GCd accumulator 9a81928 [Ilya Ganelin] Reverting permissions changes 28f705c [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885 b820ab4b [Ilya Ganelin] reset d78f4bf [Ilya Ganelin] Removed obsolete comment 0746e61 [Ilya Ganelin] Updated DAGSchedulerSUite to fix bug 3350852 [Ilya Ganelin] Updated DAGScheduler and Suite to correctly use new implementation of WeakRef Accumulator storage c49066a [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885 cbb9023 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3885 a77d11b [Ilya Ganelin] Updated Accumulators class to store weak references instead of strong references to allow garbage collection of old accumulators	2015-02-22 22:57:26 -08:00
Aaron Josephs	e4f9d03d72	[SPARK-911] allow efficient queries for a range if RDD is partitioned wi... ...th RangePartitioner Author: Aaron Josephs <ajoseph4@binghamton.edu> Closes #1381 from aaronjosephs/PLAT-911 and squashes the following commits: e30ade5 [Aaron Josephs] [SPARK-911] allow efficient queries for a range if RDD is partitioned with RangePartitioner	2015-02-22 22:09:06 -08:00
Evan Yu	7683982faf	[SPARK-5860][CORE] JdbcRDD: overflow on large range with high number of partitions Fix a overflow bug in JdbcRDD when calculating partitions for large BIGINT ids Author: Evan Yu <ehotou@gmail.com> Closes #4701 from hotou/SPARK-5860 and squashes the following commits: 9e038d1 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level 7883ad9 [Evan Yu] [SPARK-5860][CORE] Prevent overflowing at the length level c88755a [Evan Yu] [SPARK-5860][CORE] switch to BigInt instead of BigDecimal 4e9ff4f [Evan Yu] [SPARK-5860][CORE] JdbcRDD overflow on large range with high number of partitions	2015-02-21 20:40:21 +00:00
Nishkam Ravi	d3cbd38c33	SPARK-5841 [CORE] [HOTFIX 2] Memory leak in DiskBlockManager Continue to see IllegalStateException in YARN cluster mode. Adding a simple workaround for now. Author: Nishkam Ravi <nravi@cloudera.com> Author: nishkamravi2 <nishkamravi@gmail.com> Author: nravi <nravi@c1704.halxg.cloudera.com> Closes #4690 from nishkamravi2/master_nravi and squashes the following commits: d453197 [nishkamravi2] Update NewHadoopRDD.scala 6f41a1d [nishkamravi2] Update NewHadoopRDD.scala 0ce2c32 [nishkamravi2] Update HadoopRDD.scala f7e33c2 [Nishkam Ravi] Merge branch 'master_nravi' of https://github.com/nishkamravi2/spark into master_nravi ba1eb8b [Nishkam Ravi] Try-catch block around the two occurrences of removeShutDownHook. Deletion of semi-redundant occurrences of expensive operation inShutDown. 71d0e17 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi 494d8c0 [nishkamravi2] Update DiskBlockManager.scala 3c5ddba [nishkamravi2] Update DiskBlockManager.scala f0d12de [Nishkam Ravi] Workaround for IllegalStateException caused by recent changes to BlockManager.stop 79ea8b4 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi b446edc [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi 5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala 535295a [nishkamravi2] Update TaskSetManager.scala 3e1b616 [Nishkam Ravi] Modify test for maxResultSize 9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0) 5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi 636a9ff [nishkamravi2] Update YarnAllocator.scala 8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead 35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead 5ac2ec1 [Nishkam Ravi] Remove out dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue 42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue 362da5e [Nishkam Ravi] Additional changes for yarn memory overhead c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead 1cf2d1e [nishkamravi2] Update YarnAllocator.scala ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts) 2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark 2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark 3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark 5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456) 6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed) 5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456) 681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles	2015-02-21 09:59:28 -08:00
Sean Owen	d3dfebebce	SPARK-5744 [CORE] Take 2. RDD.isEmpty / take fails for (empty) RDD of Nothing Follow-on to https://github.com/apache/spark/pull/4591 Document isEmpty / take / parallelize and their interaction with (an empty) RDD[Nothing] and RDD[Null]. Also, fix a marginally related minor issue with histogram() and EmptyRDD. CC rxin since you reviewed the last one although I imagine this is an uncontroversial resolution. Author: Sean Owen <sowen@cloudera.com> Closes #4698 from srowen/SPARK-5744.2 and squashes the following commits: 9b2a811 [Sean Owen] 2 extra javadoc fixes d1b9fba [Sean Owen] Document isEmpty / take / parallelize and their interaction with (an empty) RDD[Nothing] and RDD[Null]. Also, fix a marginally related minor issue with histogram() and EmptyRDD.	2015-02-20 10:21:39 +00:00
mcheah	3be92cdac3	[SPARK-4808] Removing minimum number of elements read before spill check In the general case, Spillable's heuristic of checking for memory stress on every 32nd item after 1000 items are read is good enough. In general, we do not want to be enacting the spilling checks until later on in the job; checking for disk-spilling too early can produce unacceptable performance impact in trivial cases. However, there are non-trivial cases, particularly if each serialized object is large, where checking for the necessity to spill too late would allow the memory to overflow. Consider if every item is 1.5 MB in size, and the heap size is 1000 MB. Then clearly if we only try to spill the in-memory contents to disk after 1000 items are read, we would have already accumulated 1500 MB of RAM and overflowed the heap. Patch #3656 attempted to circumvent this by checking the need to spill on every single item read, but that would cause unacceptable performance in the general case. However, the convoluted cases above should not be forced to be refactored to shrink the data items. Therefore it makes sense that the memory spilling thresholds be configurable. Author: mcheah <mcheah@palantir.com> Closes #4420 from mingyukim/memory-spill-configurable and squashes the following commits: 6e2509f [mcheah] [SPARK-4808] Removing minimum number of elements read before spill check	2015-02-19 18:09:22 -08:00
Sean Owen	34b7c35380	SPARK-4682 [CORE] Consolidate various 'Clock' classes Another one from JoshRosen 's wish list. The first commit is much smaller and removes 2 of the 4 Clock classes. The second is much larger, necessary for consolidating the streaming one. I put together implementations in the way that seemed simplest. Almost all the change is standardizing class and method names. Author: Sean Owen <sowen@cloudera.com> Closes #4514 from srowen/SPARK-4682 and squashes the following commits: 5ed3a03 [Sean Owen] Javadoc Clock classes; make ManualClock private[spark] 169dd13 [Sean Owen] Add support for legacy org.apache.spark.streaming clock class names 277785a [Sean Owen] Reduce the net change in this patch by reversing some unnecessary syntax changes along the way b5e53df [Sean Owen] FakeClock -> ManualClock; getTime() -> getTimeMillis() 160863a [Sean Owen] Consolidate Streaming Clock class into common util Clock 7c956b2 [Sean Owen] Consolidate Clocks except for Streaming Clock	2015-02-19 15:35:23 -08:00
zsxwing	90095bf3ce	[SPARK-5423][Core] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file This PR adds a `finalize` method in DiskMapIterator to clean up the resources even if some exception happens during processing data. Author: zsxwing <zsxwing@gmail.com> Closes #4219 from zsxwing/SPARK-5423 and squashes the following commits: d4b2ca6 [zsxwing] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file	2015-02-19 18:37:31 +00:00
Andrew Or	38e624a732	[SPARK-5816] Add huge compatibility warning in DriverWrapper The stability of the new submission gateway assumes that the arguments in `DriverWrapper` are consistent across multiple Spark versions. However, this is not at all clear from the code itself. In fact, this was broken in `20a6013106`, which is fortunately OK because both that commit and the original commit that added this gateway are part of the same release. To prevent this from happening again we should at the very least add a huge warning where appropriate. Author: Andrew Or <andrew@databricks.com> Closes #4687 from andrewor14/driver-wrapper-warning and squashes the following commits: 7989b56 [Andrew Or] Add huge compatibility warning	2015-02-19 09:56:25 -08:00
Jacek Lewandowski	fb87f44921	SPARK-5548: Fix for AkkaUtilsSuite failure - attempt 2 Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4653 from jacek-lewandowski/SPARK-5548-2-master and squashes the following commits: 0e199b6 [Jacek Lewandowski] SPARK-5548: applied reviewer's comments 843eafb [Jacek Lewandowski] SPARK-5548: Fix for AkkaUtilsSuite failure - attempt 2	2015-02-19 09:53:36 -08:00
Kousuke Saruta	82197ed3bd	[SPARK-4949]shutdownCallback in SparkDeploySchedulerBackend should be enclosed by synchronized block. A variable `shutdownCallback` in SparkDeploySchedulerBackend can be accessed from multiple threads so it should be enclosed by synchronized block. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3781 from sarutak/SPARK-4949 and squashes the following commits: c146c93 [Kousuke Saruta] Removed "setShutdownCallback" method c7265dc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949 42ca528 [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference 552df7c [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference f556819 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949 1b60fd1 [Kousuke Saruta] Improved the locking logics 5942765 [Kousuke Saruta] Enclosed shutdownCallback in SparkDeploySchedulerBackend by synchronized block	2015-02-18 12:20:11 +00:00
Josh Rosen	a51fc7ef9a	[SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs() This method is performance-sensitive and this change wasn't necessary.	2015-02-17 17:45:16 -08:00
Josh Rosen	d46d6246d2	[SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map. This map is accessed by the `getCacheLocs` and `clearCacheLocs()` methods, which can be called by separate threads, since DAGScheduler's `getPreferredLocs()` method is called by SparkContext and indirectly calls `getCacheLocs()`. If this map is cleared by the DAGScheduler event processing thread while a user thread is submitting a job and computing preferred locations, then this can cause the user thread to throw "NoSuchElementException: key not found" errors. Most accesses to DAGScheduler's internal state do not need synchronization because that state is only accessed from the event processing loop's thread. An alternative approach to fixing this bug would be to refactor this code so that SparkContext sends the DAGScheduler a message in order to get the list of preferred locations. However, this would involve more extensive changes to this code and would be significantly harder to backport to maintenance branches since some of the related code has undergone significant refactoring (e.g. the introduction of EventLoop). Since `cacheLocs` is the only state that's accessed in this way, adding simple synchronization seems like a better short-term fix. See #3345 for additional context. Author: Josh Rosen <joshrosen@databricks.com> Closes #4660 from JoshRosen/SPARK-4454 and squashes the following commits: 12d64ba [Josh Rosen] Properly synchronize accesses to DAGScheduler cacheLocs map.	2015-02-17 17:39:58 -08:00
Burak Yavuz	ae6cfb3acd	[SPARK-5811] Added documentation for maven coordinates and added Spark Packages support Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages` Author: Burak Yavuz <brkyvz@gmail.com> Author: Davies Liu <davies@databricks.com> Closes #4662 from brkyvz/SPARK-5811 and squashes the following commits: 56ccccd [Burak Yavuz] fixed broken test 64cb8ee [Burak Yavuz] passed pep8 on local c07b81e [Burak Yavuz] fixed pep8 a8bd6b7 [Burak Yavuz] submit PR 4ef4046 [Burak Yavuz] ready for PR 8fb02e5 [Burak Yavuz] merged master 25c9b9f [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into python-jar 560d13b [Burak Yavuz] before PR 17d3f76 [Davies Liu] support .jar as python package a3eb717 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5811 c60156d [Burak Yavuz] [SPARK-5811] Added documentation for maven coordinates	2015-02-17 17:23:22 -08:00
Davies Liu	c3d2b90bde	[SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark Currently, PySpark does not support narrow dependency during cogroup/join when the two RDDs have the partitioner, another unnecessary shuffle stage will come in. The Python implementation of cogroup/join is different than Scala one, it depends on union() and partitionBy(). This patch will try to use PartitionerAwareUnionRDD() in union(), when all the RDDs have the same partitioner. It also fix `reservePartitioner` in all the map() or mapPartitions(), then partitionBy() can skip the unnecessary shuffle stage. Author: Davies Liu <davies@databricks.com> Closes #4629 from davies/narrow and squashes the following commits: dffe34e [Davies Liu] improve test, check number of stages for join/cogroup 1ed3ba2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into narrow 4d29932 [Davies Liu] address comment cc28d97 [Davies Liu] add unit tests 940245e [Davies Liu] address comments ff5a0a6 [Davies Liu] skip the partitionBy() on Python side eb26c62 [Davies Liu] narrow dependency in PySpark	2015-02-17 16:54:57 -08:00
Davies Liu	445a755b88	[SPARK-4172] [PySpark] Progress API in Python This patch bring the pull based progress API into Python, also a example in Python. Author: Davies Liu <davies@databricks.com> Closes #3027 from davies/progress_api and squashes the following commits: b1ba984 [Davies Liu] fix style d3b9253 [Davies Liu] add tests, mute the exception after stop 4297327 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api 969fa9d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api 25590c9 [Davies Liu] update with Java API 360de2d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api c0f1021 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api 023afb3 [Davies Liu] add Python API and example for progress API	2015-02-17 13:36:43 -08:00
Sean Owen	49c19fdbad	SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager Avoid call to remove shutdown hook being called from shutdown hook CC pwendell JoshRosen MattWhelan Author: Sean Owen <sowen@cloudera.com> Closes #4648 from srowen/SPARK-5841.2 and squashes the following commits: 51548db [Sean Owen] Avoid call to remove shutdown hook being called from shutdown hook	2015-02-17 19:40:06 +00:00
xukun 00228947	b271c265b7	[SPARK-5661]function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether contains file hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths(not shutdownDeletePaths) to determine Whether contain file. To solve it ,delete two unused function. Author: xukun 00228947 <xukun.xu@huawei.com> Author: viper-kun <xukun.xu@huawei.com> Closes #4418 from viper-kun/deleteunusedfun and squashes the following commits: 87340eb [viper-kun] fix style 3d6c69e [xukun 00228947] fix bug 2bc397e [xukun 00228947] deleteunusedfun	2015-02-17 18:59:41 +00:00
Ryan Williams	d8f69cf788	[SPARK-5778] throw if nonexistent metrics config file provided previous behavior was to log an error; this is fine in the general case where no `spark.metrics.conf` parameter was specified, in which case a default `metrics.properties` is looked for, and the execption logged and suppressed if it doesn't exist. if the user has purposefully specified a metrics.conf file, however, it makes more sense to show them an error when said file doesn't exist. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4571 from ryan-williams/metrics and squashes the following commits: 5bccb14 [Ryan Williams] private-ize some MetricsConfig members 08ff998 [Ryan Williams] rename METRICS_CONF: DEFAULT_METRICS_CONF_FILENAME f4d7fab [Ryan Williams] fix tests ad24b0e [Ryan Williams] add "metrics.properties" to .rat-excludes 94e810b [Ryan Williams] throw if nonexistent Sink class is specified 31d2c30 [Ryan Williams] metrics code review feedback 56287db [Ryan Williams] throw if nonexistent metrics config file provided	2015-02-17 10:57:16 -08:00
Josh Rosen	ee6e3eff02	Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" This reverts commits `ac6fe67e1d` and `c06e42f2c1`.	2015-02-17 07:49:02 -08:00
Patrick Wendell	c06e42f2c1	HOTFIX: Style issue causing build break Caused by #4601	2015-02-16 22:11:28 -08:00
Davies Liu	ac6fe67e1d	[SPARK-5363] [PySpark] check ending mark in non-block way There is chance of dead lock that the Python process is waiting for ending mark from JVM, but which is eaten by corrupted stream. This PR checks the ending mark from Python in non-block way, so it will not blocked by Python process. There is a small chance that the ending mark is sent by Python process but not available right now, then Python process will not be used. cc JoshRosen pwendell Author: Davies Liu <davies@databricks.com> Closes #4601 from davies/freeze and squashes the following commits: e15a8c3 [Davies Liu] update logging 890329c [Davies Liu] Merge branch 'freeze' of github.com:davies/spark into freeze 2bd2228 [Davies Liu] add more logging 656d544 [Davies Liu] Update PythonRDD.scala 05e1085 [Davies Liu] check ending mark in non-block way	2015-02-16 20:32:03 -08:00
Josh Rosen	58a82a7882	[SPARK-5849] Handle more types of invalid JSON requests in SubmitRestProtocolMessage.parseAction This patch improves SubmitRestProtocol's handling of invalid JSON requests in cases where those requests were parsable as JSON but not as JSON objects (e.g. they could be parsed as ararys or strings). I replaced an unchecked cast with pattern-matching and added a new test case. Author: Josh Rosen <joshrosen@databricks.com> Closes #4637 from JoshRosen/rest-protocol-cast and squashes the following commits: b3f282b [Josh Rosen] [SPARK-5849] Handle more types of invalid JSON in SubmitRestProtocolMessage.parseAction	2015-02-16 18:08:02 -08:00
Davies Liu	b1bd1dd322	[SPARK-5788] [PySpark] capture the exception in python write thread The exception in Python writer thread will shutdown executor. Author: Davies Liu <davies@databricks.com> Closes #4577 from davies/exception and squashes the following commits: eb0ceff [Davies Liu] Update PythonRDD.scala 139b0db [Davies Liu] capture the exception in python write thread	2015-02-16 17:57:14 -08:00
Matt Whelan	1294a6e01a	SPARK-5848: tear down the ConsoleProgressBar timer The timer is a GC root, and failing to terminate it leaks SparkContext instances. Author: Matt Whelan <mwhelan@perka.com> Closes #4635 from MattWhelan/SPARK-5848 and squashes the following commits: 2a1e8a5 [Matt Whelan] SPARK-5848: teardown the ConsoleProgressBar timer	2015-02-17 00:59:49 +00:00
Josh Rosen	0cfda8461f	[SPARK-2313] Use socket to communicate GatewayServer port back to Python driver This patch changes PySpark so that the GatewayServer's port is communicated back to the Python process that launches it over a local socket instead of a pipe. The old pipe-based approach was brittle and could fail if `spark-submit` printed unexpected to stdout. To accomplish this, I wrote a custom `PythonGatewayServer.main()` function to use in place of Py4J's `GatewayServer.main()`. Closes #3424. Author: Josh Rosen <joshrosen@databricks.com> Closes #4603 from JoshRosen/SPARK-2313 and squashes the following commits: 6a7740b [Josh Rosen] Remove EchoOutputThread since it's no longer needed 0db501f [Josh Rosen] Use select() so that we don't block if GatewayServer dies. 9bdb4b6 [Josh Rosen] Handle case where getListeningPort returns -1 3fb7ed1 [Josh Rosen] Remove stdout=PIPE 2458934 [Josh Rosen] Use underscore to mark env var. as private d12c95d [Josh Rosen] Use Logging and Utils.tryOrExit() e5f9730 [Josh Rosen] Wrap everything in a giant try-block 2f70689 [Josh Rosen] Use stdin PIPE to share fate with driver 8bf956e [Josh Rosen] Initial cut at passing Py4J gateway port back to driver via socket	2015-02-16 15:25:11 -08:00
Matt Whelan	bb05982dd2	SPARK-5841: remove DiskBlockManager shutdown hook on stop After a call to stop, the shutdown hook is redundant, and causes a memory leak. Author: Matt Whelan <mwhelan@perka.com> Closes #4627 from MattWhelan/SPARK-5841 and squashes the following commits: d5f5c7f [Matt Whelan] SPARK-5841: remove DiskBlockManager shutdown hook on stop	2015-02-16 22:54:32 +00:00
Josh Rosen	d06d5ee9b3	[SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite This patch fixes two difficult-to-reproduce Jenkins test failures in InputOutputMetricsSuite (SPARK-5227 and SPARK-5679). The problem was that WholeTextFileRecordReaderSuite modifies the `fs.local.block.size` Hadoop configuration and this change was affecting subsequent test suites due to Hadoop's caching of FileSystem instances (see HADOOP-8490 for more details). The fix implemented here is to disable FileSystem caching in WholeTextFileRecordReaderSuite. Author: Josh Rosen <joshrosen@databricks.com> Closes #4599 from JoshRosen/inputoutputsuite-fix and squashes the following commits: 47dc447 [Josh Rosen] [SPARK-5227] [SPARK-5679] Disable FileSystem cache in WholeTextFileRecordReaderSuite	2015-02-13 17:45:31 -08:00
Andrew Or	5d3cc6b3d7	[HOTFIX] Fix build break in MesosSchedulerBackendSuite	2015-02-13 13:10:29 -08:00
Josh Rosen	077eec2d9d	[SPARK-5735] Replace uses of EasyMock with Mockito This patch replaces all uses of EasyMock with Mockito. There are two motivations for this: 1. We should use a single mocking framework in our tests in order to keep things consistent. 2. EasyMock may be responsible for non-deterministic unit test failures due to its Objensis dependency (see SPARK-5626 for more details). Most of these changes are fairly mechanical translations of Mockito code to EasyMock, although I made a small change that strengthens the assertions in one test in KinesisReceiverSuite. Author: Josh Rosen <joshrosen@databricks.com> Closes #4578 from JoshRosen/SPARK-5735-remove-easymock and squashes the following commits: 0ab192b [Josh Rosen] Import sorting plus two minor changes to more closely match old semantics. 977565b [Josh Rosen] Remove EasyMock from build. fae1d8f [Josh Rosen] Remove EasyMock usage in KinesisReceiverSuite. 7cca486 [Josh Rosen] Remove EasyMock usage in MesosSchedulerBackendSuite fc5e94d [Josh Rosen] Remove EasyMock in CacheManagerSuite	2015-02-13 09:55:36 -08:00
Ryan Williams	fc6d3e796a	[SPARK-5783] Better eventlog-parsing error messages Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4573 from ryan-williams/history and squashes the following commits: a8647ec [Ryan Williams] fix test calls to .replay() 98aa3fe [Ryan Williams] include filename in history-parsing error message 8deecf0 [Ryan Williams] add line number to history-parsing error message b668b52 [Ryan Williams] add log info line to history-eventlog parsing	2015-02-13 09:47:26 -08:00
uncleGen	c0ccd25641	[SPARK-5732][CORE]:Add an option to print the spark version in spark script. Naturally, we may need to add an option to print the spark version in spark script. It is pretty common in script tool. ![9](https://cloud.githubusercontent.com/assets/7402327/6183331/cab1b74e-b38e-11e4-9daa-e26e6015cff3.JPG) Author: uncleGen <hustyugm@gmail.com> Author: genmao.ygm <genmao.ygm@alibaba-inc.com> Closes #4522 from uncleGen/master-clean-150211 and squashes the following commits: 9f2127c [genmao.ygm] revert the behavior of "-v" 015ddee [uncleGen] minor changes 463f02c [uncleGen] minor changes	2015-02-13 09:44:13 -08:00
Andrew Or	1d5663e92c	[SPARK-5760][SPARK-5761] Fix standalone rest protocol corner cases + revamp tests The changes are summarized in the commit message. Test or test-related code accounts for 90% of the lines changed. Author: Andrew Or <andrew@databricks.com> Closes #4557 from andrewor14/rest-tests and squashes the following commits: b4dc980 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests b55e40f [Andrew Or] Add test for unknown fields cc96993 [Andrew Or] private[spark] -> private[rest] 578cf45 [Andrew Or] Clean up test code a little d82d971 [Andrew Or] v1 -> serverVersion ea48f65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest-tests 00999a8 [Andrew Or] Revamp tests + fix a few corner cases	2015-02-12 14:47:52 -08:00
Kay Ousterhout	47c73d410a	[SPARK-5762] Fix shuffle write time for sort-based shuffle mateiz was excluding the time to write this final file from the shuffle write time intentional? Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4559 from kayousterhout/SPARK-5762 and squashes the following commits: 5c6f3d9 [Kay Ousterhout] Use foreach 94e4237 [Kay Ousterhout] Removed open time metrics added inadvertently ace156c [Kay Ousterhout] Moved metrics to finally block d773276 [Kay Ousterhout] Use nano time 5a59906 [Kay Ousterhout] [SPARK-5762] Fix shuffle write time for sort-based shuffle	2015-02-12 14:46:37 -08:00
Kay Ousterhout	893d6fd704	[SPARK-5645] Added local read bytes/time to task metrics ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you but I'd already done this for my own uses a while ago so thought I could help save you the work of doing it! Hopefully this doesn't duplicate any work you've already done. Here's a screenshot of what the UI looks like: ![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png) Based on a discussion with pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas. The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI. I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use. With this change, the total read time and total write time shown in the UI will be equal, fixing a long-term source of user confusion: ![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4510 from kayousterhout/SPARK-5645 and squashes the following commits: 4a0182c [Kay Ousterhout] oops 5f5da1b [Kay Ousterhout] Small style fix 5da04cf [Kay Ousterhout] Addressed more comments from Kostas ba05149 [Kay Ousterhout] Remove parens a9dc685 [Kay Ousterhout] Kostas comment, test fix 33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-5645 347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task metrics	2015-02-12 14:36:27 -08:00
Andrew Rowson	466b1f671b	[SPARK-5655] Don't chmod700 application files if running in YARN [Was previously PR4507] As per SPARK-5655, recently committed code chmod 700s all application files created on the local fs by a spark executor. This is both unnecessary and broken on YARN, where files created in the nodemanager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed for the auxiliary shuffle service to be able to read the files, as this is running as the 'yarn' user. Author: Andrew Rowson <github@growse.com> Closes #4509 from growse/master and squashes the following commits: 7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container	2015-02-12 18:41:39 +00:00
guliangliang	1ac099e3e0	[SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications The links in pagination of HistroyPage is wrong when showing Incomplete Applications. If "2" is click on the following page "http://history-server:18080/?page=1&showIncomplete=true", it will go to "http://history-server:18080/?page=2" instead of "http://history-server:18080/?page=2&showIncomplete=true". Author: guliangliang <guliangliang@qiyi.com> Closes #4523 from marsishandsome/Spark5733 and squashes the following commits: 9d7b593 [guliangliang] [SPARK-5733] Error Link in Pagination of HistroyPage when showing Incomplete Applications	2015-02-11 15:55:49 +00:00
Andrew Or	b969182659	[SPARK-5729] Potential NPE in standalone REST API If the user specifies a bad REST URL, the server will throw an NPE instead of propagating the error back. This is because the default `ErrorServlet` has the wrong prefix. This is a one line fix. I am will add more comprehensive tests in a separate patch. Author: Andrew Or <andrew@databricks.com> Closes #4518 from andrewor14/rest-npe and squashes the following commits: 16b15bc [Andrew Or] Correct ErrorServlet context prefix	2015-02-10 20:19:14 -08:00
mcheah	1cb3770075	[SPARK-4879] Use driver to coordinate Hadoop output committing for speculative tasks Previously, SparkHadoopWriter always committed its tasks without question. The problem is that when speculation is enabled sometimes this can result in multiple tasks committing their output to the same file. Even though an HDFS-writing task may be re-launched due to speculation, the original task is not killed and may eventually commit as well. This can cause strange race conditions where multiple tasks that commit interfere with each other, with the result being that some partition files are actually lost entirely. For more context on these kinds of scenarios, see SPARK-4879. In Hadoop MapReduce jobs, the application master is a central coordinator that authorizes whether or not any given task can commit. Before a task commits its output, it queries the application master as to whether or not such a commit is safe, and the application master does bookkeeping as tasks are requesting commits. Duplicate tasks that would write to files that were already written to from other tasks are prohibited from committing. This patch emulates that functionality - the crucial missing component was a central arbitrator, which is now a module called the OutputCommitCoordinator. The coordinator lives on the driver and the executors can obtain a reference to this actor and request its permission to commit. As tasks commit and are reported as completed successfully or unsuccessfully by the DAGScheduler, the commit coordinator is informed of the task completion events as well to update its internal state. Future work includes more rigorous unit testing and extra optimizations should this patch cause a performance regression. It is unclear what the overall cost of communicating back to the driver on every hadoop-committing task will be. It's also important for those hitting this issue to backport this onto previous version of Spark because the bug has serious consequences, that is, data is lost. Currently, the OutputCommitCoordinator is only used when `spark.speculation` is true. It can be disabled by setting `spark.hadoop.outputCommitCoordination.enabled=false` in SparkConf. This patch is an updated version of #4155 (by mccheah), which in turn was an updated version of this PR. Closes #4155. Author: mcheah <mcheah@palantir.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #4066 from JoshRosen/SPARK-4879-sparkhadoopwriter-fix and squashes the following commits: 658116b [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix ed783b2 [Josh Rosen] Address Andrew’s feedback. e7be65a [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix 14861ea [Josh Rosen] splitID -> partitionID in a few places ed8b554 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix 48d5c1c [Josh Rosen] Roll back copiesRunning change in TaskSetManager 3969f5f [Josh Rosen] Re-enable guarding of commit coordination with spark.speculation setting. ede7590 [Josh Rosen] Add test to ensure that a job that denies all commits cannot complete successfully. 97da5fe [Josh Rosen] Use actor only for RPC; call methods directly in DAGScheduler. f582574 [Josh Rosen] Some cleanup in OutputCommitCoordinatorSuite a7c0e29 [Josh Rosen] Create fake TaskInfo using dummy fields instead of Mockito. 997b41b [Josh Rosen] Roll back unnecessary DAGSchedulerSingleThreadedProcessLoop refactoring: 459310a [Josh Rosen] Roll back TaskSetManager changes that broke other tests. dd00b7c [Josh Rosen] Move CommitDeniedException to executors package; remove `@DeveloperAPI` annotation. c79df98 [Josh Rosen] Some misc. code style + doc changes: f7d69c5 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix 92e6dc9 [Josh Rosen] Bug fix: use task ID instead of StageID to index into authorizedCommitters. b344bad [Josh Rosen] (Temporarily) re-enable “always coordinate” for testing purposes. 0aec91e [Josh Rosen] Only coordinate when speculation is enabled; add configuration option to bypass new coordination. 594e41a [mcheah] Fixing a scalastyle error 60a47f4 [mcheah] Writing proper unit test for OutputCommitCoordinator and fixing bugs. d63f63f [mcheah] Fixing compiler error 9fe6495 [mcheah] Fixing scalastyle 1df2a91 [mcheah] Throwing exception if SparkHadoopWriter commit denied d431144 [mcheah] Using more concurrency to process OutputCommitCoordinator requests. c334255 [mcheah] Properly handling messages that could be sent after actor shutdown. 8d5a091 [mcheah] Was mistakenly serializing the accumulator in test suite. 9c6a4fa [mcheah] More OutputCommitCoordinator cleanup on stop() 78eb1b5 [mcheah] Better OutputCommitCoordinatorActor stopping; simpler canCommit 83de900 [mcheah] Making the OutputCommitCoordinatorMessage serializable abc7db4 [mcheah] TaskInfo can't be null in DAGSchedulerSuite f135a8e [mcheah] Moving the output commit coordinator from class into method. 1c2b219 [mcheah] Renaming oudated names for test function classes 66a71cd [mcheah] Removing whitespace modifications 6b543ba [mcheah] Removing redundant accumulator in unit test c9decc6 [mcheah] Scalastyle fixes bc80770 [mcheah] Unit tests for OutputCommitCoordinator 6e6f748 [mcheah] [SPARK-4879] Use the Spark driver to authorize Hadoop commits.	2015-02-10 20:12:18 -08:00
Marcelo Vanzin	ed167e70c6	[SPARK-5493] [core] Add option to impersonate user. Hadoop has a feature that allows users to impersonate other users when submitting applications or talking to HDFS, for example. These impersonated users are referred generally as "proxy users". Services such as Oozie or Hive use this feature to run applications as the requesting user. This change makes SparkSubmit accept a new command line option to run the application as a proxy user. It also fixes the plumbing of the user name through the UI (and a couple of other places) to refer to the correct user running the application, which can be different than `sys.props("user.name")` even without proxies (e.g. when using kerberos). Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #4405 from vanzin/SPARK-5493 and squashes the following commits: df82427 [Marcelo Vanzin] Clarify the reason for the special exception handling. 05bfc08 [Marcelo Vanzin] Remove unneeded annotation. 4840de9 [Marcelo Vanzin] Review feedback. 8af06ff [Marcelo Vanzin] Fix usage string. 2e4fa8f [Marcelo Vanzin] Merge branch 'master' into SPARK-5493 b6c947d [Marcelo Vanzin] Merge branch 'master' into SPARK-5493 0540d38 [Marcelo Vanzin] [SPARK-5493] [core] Add option to impersonate user.	2015-02-10 17:19:10 -08:00
Sephiroth-Lin	52983d7f4f	[SPARK-5644] [Core]Delete tmp dir when sc is stop When we run driver as a service, and for each time we run job we only call sc.stop, then will not delete tmp dir create by HttpFileServer and SparkEnv, it will be deleted until the service process exit, so we need to delete these tmp dirs when sc is stop directly. Author: Sephiroth-Lin <linwzhong@gmail.com> Closes #4412 from Sephiroth-Lin/bug-fix-master-01 and squashes the following commits: fbbc785 [Sephiroth-Lin] using an interpolated string b968e14 [Sephiroth-Lin] using an interpolated string 4edf394 [Sephiroth-Lin] rename the variable and update comment 1339c96 [Sephiroth-Lin] add a member to store the reference of tmp dir b2018a5 [Sephiroth-Lin] check sparkFilesDir before delete f48a3c6 [Sephiroth-Lin] don't check sparkFilesDir, check executorId dd9686e [Sephiroth-Lin] format code b38e0f0 [Sephiroth-Lin] add dir check before delete d7ccc64 [Sephiroth-Lin] Change log level 1d70926 [Sephiroth-Lin] update comment e2a2b1b [Sephiroth-Lin] update comment aeac518 [Sephiroth-Lin] Delete tmp dir when sc is stop c0d5b28 [Sephiroth-Lin] Delete tmp dir when sc is stop	2015-02-10 23:23:35 +00:00
Andrew Or	b640c841fc	[HOTFIX][SPARK-4136] Fix compilation and tests	2015-02-10 11:18:01 -08:00
Sandy Ryza	69bc3bb6cf	SPARK-4136. Under dynamic allocation, cancel outstanding executor requests when no longer needed This takes advantage of the changes made in SPARK-4337 to cancel pending requests to YARN when they are no longer needed. Each time the timer in `ExecutorAllocationManager` strikes, we compute `maxNumNeededExecutors`, the maximum number of executors we could fill with the current load. This is calculated as the total number of running and pending tasks divided by the number of cores per executor. If `maxNumNeededExecutors` is below the total number of running and pending executors, we call `requestTotalExecutors(maxNumNeededExecutors)` to let the cluster manager know that it should cancel any pending requests above this amount. If not, `maxNumNeededExecutors` is just used as a bound in alongside the configured `maxExecutors` to limit the number of new requests. The patch modifies the API exposed by `ExecutorAllocationClient` for requesting additional executors by moving from `requestExecutors` to `requestTotalExecutors`. This makes the communication between the `ExecutorAllocationManager` and the `YarnAllocator` easier to reason about and removes some state that needed to be kept in the `CoarseGrainedSchedulerBackend`. I think an argument can be made that this makes for a less attractive user-facing API in `SparkContext`, but I'm having trouble envisioning situations where a user would want to use either of these APIs. This will likely break some tests, but I wanted to get feedback on the approach before adding tests and polishing. Author: Sandy Ryza <sandy@cloudera.com> Closes #4168 from sryza/sandy-spark-4136 and squashes the following commits: 37ce77d [Sandy Ryza] Warn on negative number cd3b2ff [Sandy Ryza] SPARK-4136	2015-02-10 11:12:06 -08:00
Sean Owen	2d1e916730	SPARK-5239 [CORE] JdbcRDD throws "java.lang.AbstractMethodError: oracle.jdbc.driver.xxxxxx.isClosed()Z" This is a completion of https://github.com/apache/spark/pull/4033 which was withdrawn for some reason. Author: Sean Owen <sowen@cloudera.com> Closes #4470 from srowen/SPARK-5239.2 and squashes the following commits: 2398bde [Sean Owen] Avoid use of JDBC4-only isClosed()	2015-02-10 09:19:01 +00:00
Kay Ousterhout	a2d33d0b01	[SPARK-5701] Only set ShuffleReadMetrics when task has shuffle deps The updateShuffleReadMetrics method in TaskMetrics (called by the executor heartbeater) will currently always add a ShuffleReadMetrics to TaskMetrics (with values set to 0), even when the task didn't read any shuffle data. ShuffleReadMetrics should only be added if the task reads shuffle data. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4488 from kayousterhout/SPARK-5701 and squashes the following commits: 673ed58 [Kay Ousterhout] SPARK-5701: Only set ShuffleReadMetrics when task has shuffle deps	2015-02-09 21:22:09 -08:00
Andrew Or	a95ed52157	[SPARK-5703] AllJobsPage throws empty.max exception If you have a `SparkListenerJobEnd` event without the corresponding `SparkListenerJobStart` event, then `JobProgressListener` will create an empty `JobUIData` with an empty `stageIds` list. However, later in `AllJobsPage` we call `stageIds.max`. If this is empty, it will throw an exception. This crashed my history server. Author: Andrew Or <andrew@databricks.com> Closes #4490 from andrewor14/jobs-page-max and squashes the following commits: 21797d3 [Andrew Or] Check nonEmpty before calling max	2015-02-09 21:18:48 -08:00
Marcelo Vanzin	20a6013106	[SPARK-2996] Implement userClassPathFirst for driver, yarn. Yarn's config option `spark.yarn.user.classpath.first` does not work the same way as `spark.files.userClassPathFirst`; Yarn's version is a lot more dangerous, in that it modifies the system classpath, instead of restricting the changes to the user's class loader. So this change implements the behavior of the latter for Yarn, and deprecates the more dangerous choice. To be able to achieve feature-parity, I also implemented the option for drivers (the existing option only applies to executors). So now there are two options, each controlling whether to apply userClassPathFirst to the driver or executors. The old option was deprecated, and aliased to the new one (`spark.executor.userClassPathFirst`). The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it was also doing some things that ended up causing JVM errors depending on how things were being called. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3233 from vanzin/SPARK-2996 and squashes the following commits: 9cf9cf1 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 a1499e2 [Marcelo Vanzin] Remove SPARK_HOME propagation. fa7df88 [Marcelo Vanzin] Remove 'test.resource' file, create it dynamically. a8c69f1 [Marcelo Vanzin] Review feedback. cabf962 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 a1b8d7e [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 3f768e3 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 2ce3c7a [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 0e6d6be [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 70d4044 [Marcelo Vanzin] Fix pyspark/yarn-cluster test. 0fe7777 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 0e6ef19 [Marcelo Vanzin] Move class loaders around and make names more meaninful. fe970a7 [Marcelo Vanzin] Review feedback. 25d4fed [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 3cb6498 [Marcelo Vanzin] Call the right loadClass() method on the parent. fbb8ab5 [Marcelo Vanzin] Add locking in loadClass() to avoid deadlocks. 2e6c4b7 [Marcelo Vanzin] Mention new setting in documentation. b6497f9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 a10f379 [Marcelo Vanzin] Some feedback. 3730151 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 f513871 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 44010b6 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 7b57cba [Marcelo Vanzin] Remove now outdated message. 5304d64 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 35949c8 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 54e1a98 [Marcelo Vanzin] Merge branch 'master' into SPARK-2996 d1273b2 [Marcelo Vanzin] Add test file to rat exclude. fa1aafa [Marcelo Vanzin] Remove write check on user jars. 89d8072 [Marcelo Vanzin] Cleanups. a963ea3 [Marcelo Vanzin] Implement spark.driver.userClassPathFirst for standalone cluster mode. 50afa5f [Marcelo Vanzin] Fix Yarn executor command line. 7d14397 [Marcelo Vanzin] Register user jars in executor up front. 7f8603c [Marcelo Vanzin] Fix yarn-cluster mode without userClassPathFirst. 20373f5 [Marcelo Vanzin] Fix ClientBaseSuite. 55c88fa [Marcelo Vanzin] Run all Yarn integration tests via spark-submit. 0b64d92 [Marcelo Vanzin] Add deprecation warning to yarn option. 4a84d87 [Marcelo Vanzin] Fix the child-first class loader. d0394b8 [Marcelo Vanzin] Add "deprecated configs" to SparkConf. 46d8cf2 [Marcelo Vanzin] Update doc with new option, change name to "userClassPathFirst". a314f2d [Marcelo Vanzin] Enable driver class path isolation in SparkSubmit. 91f7e54 [Marcelo Vanzin] [yarn] Enable executor class path isolation. a853e74 [Marcelo Vanzin] Re-work CoarseGrainedExecutorBackend command line arguments. 89522ef [Marcelo Vanzin] Add class path isolation support for Yarn cluster mode.	2015-02-09 21:17:28 -08:00
KaiXinXiaoLei	31d435ecfd	Add a config option to print DAG. Add a config option "spark.rddDebug.enable" to check whether to print DAG info. When "spark.rddDebug.enable" is true, it will print information about DAG in the log. Author: KaiXinXiaoLei <huleilei1@huawei.com> Closes #4257 from KaiXinXiaoLei/DAGprint and squashes the following commits: d9fe42e [KaiXinXiaoLei] change log info c27ee76 [KaiXinXiaoLei] change log info 83c2b32 [KaiXinXiaoLei] change config option adcb14f [KaiXinXiaoLei] change the file. f4e7b9e [KaiXinXiaoLei] add a option to print DAG	2015-02-09 20:58:58 -08:00
Andrew Or	d302c4800b	[SPARK-5698] Do not let user request negative # of executors Otherwise we might crash the ApplicationMaster. Why? Please see https://issues.apache.org/jira/browse/SPARK-5698. sryza I believe this is also relevant in your patch #4168. Author: Andrew Or <andrew@databricks.com> Closes #4483 from andrewor14/da-negative and squashes the following commits: 53ed955 [Andrew Or] Throw IllegalArgumentException instead 0e89fd5 [Andrew Or] Check against negative requests	2015-02-09 17:33:29 -08:00
mcheah	6fe70d8432	[SPARK-5691] Fixing wrong data structure lookup for dupe app registratio... In Master's registerApplication method, it checks if the application had already registered by examining the addressToWorker hash map. In reality, it should refer to the addressToApp data structure, as this is what really tracks which apps have been registered. Author: mcheah <mcheah@palantir.com> Closes #4477 from mccheah/spark-5691 and squashes the following commits: efdc573 [mcheah] [SPARK-5691] Fixing wrong data structure lookup for dupe app registration	2015-02-09 13:20:14 -08:00
Kirill A. Korinskiy	23a99dabf1	[SPARK-5672][Web UI] Don't return `ERROR 500` when have missing args Spark web UI return `HTTP ERROR 500` when GET arguments is missing. Author: Kirill A. Korinskiy <catap@catap.ru> Closes #4239 from catap/ui_500 and squashes the following commits: 520e180 [Kirill A. Korinskiy] [SPARK-5672][Web UI] Return `HTTP ERROR 400` when have missing args	2015-02-08 10:31:46 +00:00
Jacek Lewandowski	dd4cb33a27	SPARK-5408: Use -XX:MaxPermSize specified by user instead of default in ... ...ExecutorRunner and DriverRunner Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4203 from jacek-lewandowski/SPARK-5408-1.3 and squashes the following commits: d913686 [Jacek Lewandowski] SPARK-5408: Use -XX:MaxPermSize specified by used instead of default in ExecutorRunner and DriverRunner	2015-02-07 15:58:04 +00:00
Andrew Or	1390e56fa8	[SPARK-5388] Provide a stable application submission gateway for standalone cluster mode The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that can be potentially extended to other modes in the future. Client mode is currently not included in the changes here because there are many more Akka messages exchanged there. As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066). By the way this includes ~700 lines of tests and ~200 lines of license. Author: Andrew Or <andrew@databricks.com> Closes #4216 from andrewor14/rest and squashes the following commits: 8d7ce07 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 6f0c597 [Andrew Or] Use nullable fields for integer and boolean values dfe4bd7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest b9e2a08 [Andrew Or] Minor comments 02b5cea [Andrew Or] Fix tests d2b1ef8 [Andrew Or] Comment changes + minor code refactoring across the board 9c82a36 [Andrew Or] Minor comment and wording updates b4695e7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest c9a8ad7 [Andrew Or] Do not include appResource and mainClass as properties 6fc7670 [Andrew Or] Report REST server response back to the user 40e6095 [Andrew Or] Pass submit parameters through system properties cbd670b [Andrew Or] Include unknown fields, if any, in server response 9fee16f [Andrew Or] Include server protocol version on mismatch 09f873a [Andrew Or] Fix style 8188e61 [Andrew Or] Upgrade Jackson from 2.3.0 to 2.4.4 37538e0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 9165ae8 [Andrew Or] Fall back to Akka if endpoint was not REST 252d53c [Andrew Or] Clean up server error handling behavior further c643f64 [Andrew Or] Fix style bbbd329 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 792e112 [Andrew Or] Use specific HTTP response codes on error f98660b [Andrew Or] Version the protocol and include it in REST URL 721819f [Andrew Or] Provide more REST-like interface for submit/kill/status 581f7bf [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 9e0d1af [Andrew Or] Move some classes around to reduce number of files (minor) 42e5de4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 1f1c03f [Andrew Or] Use Jackson's DefaultScalaModule to simplify messages 9229433 [Andrew Or] Reduce duplicate naming in REST field ade28fd [Andrew Or] Clean up REST response output in Spark submit b2fef8b [Andrew Or] Abstract the success field to the general response 6c57b4b [Andrew Or] Increase timeout in end-to-end tests bf696ff [Andrew Or] Add checks for enabling REST when using kill/status 7ee6737 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e2f7f5f [Andrew Or] Provide more safeguard against missing fields 9581df7 [Andrew Or] Clean up uses of exceptions 914fdff [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e2104e6 [Andrew Or] stable -> rest 3db7379 [Andrew Or] Fix comments and name fields for better error messages 8d43486 [Andrew Or] Replace SubmitRestProtocolAction with class name df90e8b [Andrew Or] Use Jackson for JSON de/serialization d7a1f9f [Andrew Or] Fix local cluster tests efa5e18 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e42c131 [Andrew Or] Add end-to-end tests for standalone REST protocol 837475b [Andrew Or] Show the REST port on the Master UI d8d3717 [Andrew Or] Use a daemon thread pool for REST server 6568ca5 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 77774ba [Andrew Or] Minor fixes 206cae4 [Andrew Or] Refactor and add tests for the REST protocol 63c05b3 [Andrew Or] Remove MASTER as a field (minor) 9e21b72 [Andrew Or] Action -> SparkSubmitAction (minor) 51c5ca6 [Andrew Or] Distinguish client and server side Spark versions b44e103 [Andrew Or] Implement status requests + fix validation behavior 120ab9d [Andrew Or] Support kill and request driver status through SparkSubmit 544de1d [Andrew Or] Major clean ups in code and comments e958cae [Andrew Or] Supported nested values in messages 484bd21 [Andrew Or] Specify an ordering for fields in SubmitDriverRequestMessage 6ff088d [Andrew Or] Rename classes to generalize REST protocol af9d9cb [Andrew Or] Integrate REST protocol in standalone mode 53e7c0e [Andrew Or] Initial client, server, and all the messages	2015-02-06 15:57:06 -08:00
Kostas Sakellis	dcd1e42d6b	[SPARK-4874] [CORE] Collect record count metrics Collects record counts for both Input/Output and Shuffle Metrics. For the input/output metrics, it just appends the counter every time the iterators get accessed. For shuffle on the write side, we count the metrics post aggregation (after a map side combine) and on the read side we count the metrics pre aggregation. This allows both the bytes read/written metrics and the records read/written to line up. For backwards compatibility, if we deserialize an older event that doesn't have record metrics, we set the metric to -1. Author: Kostas Sakellis <kostas@cloudera.com> Closes #4067 from ksakellis/kostas-spark-4874 and squashes the following commits: bd919be [Kostas Sakellis] Changed 'Records Read' in shuffleReadMetrics json output to 'Total Records Read' dad4d57 [Kostas Sakellis] Add a comment and check to BlockObjectWriter so that it cannot be reopend. 6f236a1 [Kostas Sakellis] Renamed _recordsWritten in ShuffleWriteMetrics to be more consistent 70620a0 [Kostas Sakellis] CR Feedback 17faa3a [Kostas Sakellis] Removed AtomicLong in favour of using Long b6f9923 [Kostas Sakellis] Merge AfterNextInterceptingIterator with InterruptableIterator to save a function call 46c8186 [Kostas Sakellis] Combined Bytes and # records into one column 57551c1 [Kostas Sakellis] Conforms to SPARK-3288 6cdb44e [Kostas Sakellis] Removed the generic InterceptingIterator and repalced it with specific implementation 1aa273c [Kostas Sakellis] CR Feedback 1bb78b1 [Kostas Sakellis] [SPARK-4874] [CORE] Collect record count metrics	2015-02-06 14:31:20 -08:00
Marcelo Vanzin	5687bab8fd	[SPARK-5600] [core] Clean up FsHistoryProvider test, fix app sort order. Clean up some test setup code to remove duplicate instantiation of the provider. Also make sure unfinished apps are sorted correctly. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #4370 from vanzin/SPARK-5600 and squashes the following commits: 0d048d5 [Marcelo Vanzin] Cleanup test code a bit. 2585119 [Marcelo Vanzin] Review feedback. 8b97544 [Marcelo Vanzin] Merge branch 'master' into SPARK-5600 be979e9 [Marcelo Vanzin] Merge branch 'master' into SPARK-5600 298371c [Marcelo Vanzin] [SPARK-5600] [core] Clean up FsHistoryProvider test, fix app sort order.	2015-02-06 14:23:09 -08:00
zsxwing	af2a2a263a	[SPARK-4361][Doc] Add more docs for Hadoop Configuration I'm trying to point out reusing a Configuration in these APIs is dangerous. Any better idea? Author: zsxwing <zsxwing@gmail.com> Closes #3225 from zsxwing/SPARK-4361 and squashes the following commits: fe4e3d5 [zsxwing] Add more docs for Hadoop Configuration	2015-02-06 11:51:09 -08:00
Josh Rosen	fb6c0cbac4	[HOTFIX] Fix test build break in ExecutorAllocationManagerSuite. This was caused because #3486 added a new field to ExecutorInfo and #4369 added new tests that created ExecutorInfos. These patches were merged in quick succession and were never tested together, hence the compilation error.	2015-02-06 11:48:52 -08:00
Josh Rosen	0d74bd7fd7	[SPARK-5555] Enable UISeleniumSuite tests This patch enables UISeleniumSuite, a set of tests for the Spark application web UI. These tests were previously disabled because they were slow, but I think we now have sufficient test time budget that the benefit of enabling them outweighs the time costs. Author: Josh Rosen <joshrosen@databricks.com> Closes #4334 from JoshRosen/enable-uiseleniumsuite and squashes the following commits: 4ab9477 [Josh Rosen] Use BeforeAndAfterAll to cleanup WebDriver 71efc72 [Josh Rosen] Update broken UISeleniumSuite tests; use random port #. a5ab595 [Josh Rosen] Enable UISeleniumSuite tests.	2015-02-06 11:14:58 -08:00
Kostas Sakellis	32e964c410	SPARK-2450 Adds executor log links to Web UI Adds links to stderr/stdout in the executor tab of the webUI for: 1) Standalone 2) Yarn client 3) Yarn cluster This tries to add the log url support in a general way so as to make it easy to add support for all the cluster managers. This is done by using environment variables to pass to the executor the log urls. The SPARK_LOG_URL_ prefix is used and so additional logs besides stderr/stdout can also be added. To propagate this information to the UI we use the onExecutorAdded spark listener event. Although this commit doesn't add log urls when running on a mesos cluster, it should be possible to add using the same mechanism. Author: Kostas Sakellis <kostas@cloudera.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #3486 from ksakellis/kostas-spark-2450 and squashes the following commits: d190936 [Josh Rosen] Fix a few minor style / formatting nits. Reset listener after each test Don't null listener out at end of main(). 8673fe1 [Kostas Sakellis] CR feedback. Hide the log column if there are no logs available 5bf6952 [Kostas Sakellis] [SPARK-2450] [CORE] Adds exeuctor log links to Web UI	2015-02-06 11:13:00 -08:00
Makoto Fukuhara	4cdb26c174	[SPARK-5618][Spark Core][Minor] Optimise utility code. Author: Makoto Fukuhara <fukuo33@gmail.com> Closes #4396 from fukuo33/fix-unnecessary-regex and squashes the following commits: cd07fd6 [Makoto Fukuhara] fix unnecessary regex.	2015-02-06 11:12:34 -08:00
lianhuiwang	6072fcc14e	[SPARK-5593][Core]Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener More strictly, in ExecutorAllocationListener, we need to replace onBlockManagerAdded, onBlockManagerRemoved with onExecutorAdded,onExecutorRemoved. because at some time, onExecutorAdded and onExecutorRemoved are more accurate to express these meanings. example at SPARK-5529, BlockManager has been removed,but executor is existed. andrewor14 sryza Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #4369 from lianhuiwang/SPARK-5593 and squashes the following commits: 333367c [lianhuiwang] Replace BlockManagerListener with ExecutorListener in ExecutorAllocationListener	2015-02-06 11:09:37 -08:00
Stephen Haberman	9792bec596	[SPARK-4877] Allow user first classes to extend classes in the parent. Previously, the classloader isolation was almost too good, such that if a child class needed to load/reference a class that was only available in the parent, it could not do so. This adds tests for that case, the user-first Fake2 class extends the only-in-parent Fake3 class. It also sneaks in a fix where only the first stage seemed to work, and on subsequent stages, a LinkageError happened because classes from the user-first classpath were getting defined twice. Author: Stephen Haberman <stephen@exigencecorp.com> Closes #3725 from stephenh/4877_user_first_parent_inheritance and squashes the following commits: dabcd35 [Stephen Haberman] [SPARK-4877] Respect userClassPathFirst for the driver code too. 3d0fa7c [Stephen Haberman] [SPARK-4877] Allow user first classes to extend classes in the parent.	2015-02-06 11:03:56 -08:00
Andrew Or	fe3740c4c8	[SPARK-5636] Ramp up faster in dynamic allocation A recent patch #4051 made the initial number default to 0. With this change, any Spark application using dynamic allocation's default settings will ramp up very slowly. Since we never request more executors than needed to saturate the pending tasks, it is safe to ramp up quickly. The current default of 60 may be too slow. Author: Andrew Or <andrew@databricks.com> Closes #4409 from andrewor14/dynamic-allocation-interval and squashes the following commits: d3cc485 [Andrew Or] Lower request interval	2015-02-06 10:55:13 -08:00
Ryan Williams	37d35ab53b	[SPARK-5416] init Executor.threadPool before ExecutorSource Some ExecutorSource metrics can NPE by attempting to reference the threadpool otherwise. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4212 from ryan-williams/threadpool and squashes the following commits: 236f2ad [Ryan Williams] init Executor.threadPool before ExecutorSource	2015-02-06 12:22:25 +00:00
lianhuiwang	ed3aac7912	[SPARK-5470][Core]use defaultClassLoader to load classes in KryoSerializer Now KryoSerializer load classes of classesToRegister at the time of its initialization. when we set spark.kryo.classesToRegister=class1, it will throw SparkException("Failed to load class to register with Kryo". because in KryoSerializer's initialization, classLoader cannot include class of user's jars. we need to use defaultClassLoader of Serializer in newKryo(), because executor will reset defaultClassLoader of Serializer after Serializer's initialization. thank zzcclp for reporting it to me. Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #4258 from lianhuiwang/SPARK-5470 and squashes the following commits: 73b719f [lianhuiwang] do the splitting and filtering during initialization 64cf306 [lianhuiwang] use defaultClassLoader to load classes of classesToRegister in KryoSerializer	2015-02-06 11:00:35 +00:00
Marcelo Vanzin	856928979f	[SPARK-5582] [history] Ignore empty log directories. Empty log directories are not useful at the moment, but if one ends up showing in the log root, it breaks the code that checks for log directories. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #4352 from vanzin/SPARK-5582 and squashes the following commits: 1a6a3d4 [Marcelo Vanzin] [SPARK-5582] Fix exception when looking at empty directories.	2015-02-06 10:07:20 +00:00
Patrick Wendell	793dbaef40	SPARK-5557: Explicitly include servlet API in dependencies. Because of the way we shade jetty, we lose its dependency orbit in the assembly jar, which includes the javax servlet API's. This adds back orbit explicitly, using the version that matches our jetty version. Author: Patrick Wendell <patrick@databricks.com> Closes #4411 from pwendell/servlet-api and squashes the following commits: 445f868 [Patrick Wendell] SPARK-5557: Explicitly include servlet API in dependencies.	2015-02-05 18:14:54 -08:00
Cheng Lian	a9ed51178c	[SPARK-5182] [SPARK-5528] [SPARK-5509] [SPARK-3575] [SQL] Parquet data source improvements This PR adds three major improvements to Parquet data source: 1. Partition discovery While reading Parquet files resides in Hive style partition directories, `ParquetRelation2` automatically discovers partitioning information and infers partition column types. This is also a partial work for [SPARK-5182] [1], which aims to provide first class partitioning support for the data source API. Related code in this PR can be easily extracted to the data source API level in future versions. 1. Schema merging When enabled, Parquet data source collects schema information from all Parquet part-files and tries to merge them. Exceptions are thrown when incompatible schemas are detected. This feature is controlled by data source option `parquet.mergeSchema`, and is enabled by default. 1. Metastore Parquet table conversion moved to analysis phase This greatly simplifies the conversion logic. `ParquetConversion` strategy can be removed once the old Parquet implementation is removed in the future. This version of Parquet data source aims to entirely replace the old Parquet implementation. However, the old version hasn't been removed yet. Users can fall back to the old version by turning off SQL configuration `spark.sql.parquet.useDataSourceApi`. Other JIRA tickets fixed as side effects in this PR: - [SPARK-5509] [3]: `EqualTo` now uses a proper `Ordering` to compare binary types. - [SPARK-3575] [4]: Metastore schema is now preserved and passed to `ParquetRelation2` via data source option `parquet.metastoreSchema`. TODO: - [ ] More test cases for partition discovery - [x] Fix write path after data source write support (#4294) is merged It turned out to be non-trivial to fall back to old Parquet implementation on the write path when Parquet data source is enabled. Since we're planning to include data source write support in 1.3.0, I simply ignored two test cases involving Parquet insertion for now. - [ ] Fix outdated comments and documentations PS: This PR looks big, but more than a half of the changed lines in this PR are trivial changes to test cases. To test Parquet with and without the new data source, almost all Parquet test cases are moved into wrapper driver functions. This introduces hundreds of lines of changes. [1]: https://issues.apache.org/jira/browse/SPARK-5182 [2]: https://issues.apache.org/jira/browse/SPARK-5528 [3]: https://issues.apache.org/jira/browse/SPARK-5509 [4]: https://issues.apache.org/jira/browse/SPARK-3575 <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4308) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #4308 from liancheng/parquet-partition-discovery and squashes the following commits: b6946e6 [Cheng Lian] Fixes MiMA issues, addresses comments 8232e17 [Cheng Lian] Write support for Parquet data source a49bd28 [Cheng Lian] Fixes spelling typo in trait name "CreateableRelationProvider" 808380f [Cheng Lian] Fixes issues introduced while rebasing 50dd8d1 [Cheng Lian] Addresses @rxin's comment, fixes UDT schema merging adf2aae [Cheng Lian] Fixes compilation error introduced while rebasing 4e0175f [Cheng Lian] Fixes Python Parquet API, we need Py4J array to call varargs method 0d8ec1d [Cheng Lian] Adds more test cases b35c8c6 [Cheng Lian] Fixes some typos and outdated comments dd704fd [Cheng Lian] Fixes Python Parquet API 596c312 [Cheng Lian] Uses switch to control whether use Parquet data source or not 7d0f7a2 [Cheng Lian] Fixes Metastore Parquet table conversion a1896c7 [Cheng Lian] Fixes all existing Parquet test suites except for ParquetMetastoreSuite 5654c9d [Cheng Lian] Draft version of Parquet partition discovery and schema merging	2015-02-05 15:29:56 -08:00
Jacek Lewandowski	081ac69f3d	SPARK-5548: Fixed a race condition in AkkaUtilsSuite `Await.result` and `selection.resolveOne` runs the same timeout simultaneously. When `Await.result` timeout is reached first, then `TimeoutException` is thrown. On the other hand, when `selection.resolveOne` timeout is reached first, `ActorNotFoundException` is thrown. This is an obvious race condition and the easiest way to fix it is to increase the timeout of one method to make sure the code fails on the other method first. Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4343 from jacek-lewandowski/SPARK-5548-1.3 and squashes the following commits: b9ba47e [Jacek Lewandowski] SPARK-5548: Fixed a race condition in AkkaUtilsSuite	2015-02-05 12:21:40 -08:00
Sandy Ryza	c4b1108c3f	SPARK-4687. Add a recursive option to the addFile API This adds a recursive option to the addFile API to satisfy Hive's needs. It only allows specifying HDFS dirs that will be copied down on every executor. There are a couple outstanding questions. * Should we allow specifying local dirs as well? The best way to do this would probably be to archive them. The drawback is that it would require a fair bit of code that I don't know of any current use cases for. * The addFiles implementation has a caching component that I don't entirely understand. What events are we caching between? AFAICT it's users calling addFile on the same file in the same app at different times? Do we want/need to add something similar for addDirectory. * The addFiles implementation will check to see if an added file already exists and has the same contents. I imagine we want the same behavior, so planning to add this unless people think otherwise. I plan to add some tests if people are OK with the approach. Author: Sandy Ryza <sandy@cloudera.com> Closes #3670 from sryza/sandy-spark-4687 and squashes the following commits: f9fc77f [Sandy Ryza] Josh's comments 70cd24d [Sandy Ryza] Add another test 13da824 [Sandy Ryza] Revert executor changes 38bf94d [Sandy Ryza] Marcelo's comments ca83849 [Sandy Ryza] Add addFile test 1941be3 [Sandy Ryza] Fix test and avoid HTTP server in local mode 31f15a9 [Sandy Ryza] Use cache recursively and fix some compile errors 0239c3d [Sandy Ryza] Change addDirectory to addFile with recursive 46fe70a [Sandy Ryza] SPARK-4687. Add a addDirectory API	2015-02-05 10:15:55 -08:00
Josh Rosen	9a7ce70eab	[SPARK-5411] Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext This patch introduces a new configuration option, `spark.extraListeners`, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is initialized. From the configuration documentation: > A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception. This motivation for this patch is to allow monitoring code to be easily injected into existing Spark programs without having to modify those programs' code. Author: Josh Rosen <joshrosen@databricks.com> Closes #4111 from JoshRosen/SPARK-5190-register-sparklistener-in-sc-constructor and squashes the following commits: 8370839 [Josh Rosen] Two minor fixes after merging with master 6e0122c [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5190-register-sparklistener-in-sc-constructor 1a5b9a0 [Josh Rosen] Remove SPARK_EXTRA_LISTENERS environment variable. 2daff9b [Josh Rosen] Add a couple of explanatory comments for SPARK_EXTRA_LISTENERS. b9973da [Josh Rosen] Add test to ensure that conf and env var settings are merged, not overriden. d6f3113 [Josh Rosen] Use getConstructors() instead of try-catch to find right constructor. d0d276d [Josh Rosen] Move code into setupAndStartListenerBus() method b22b379 [Josh Rosen] Instantiate SparkListeners from classes listed in configurations. 9c0d8f1 [Josh Rosen] Revert "[SPARK-5190] Allow SparkListeners to be registered before SparkContext starts." 217ecc0 [Josh Rosen] Revert "Add addSparkListener to JavaSparkContext" 25988f3 [Josh Rosen] Add addSparkListener to JavaSparkContext 163ba19 [Josh Rosen] [SPARK-5190] Allow SparkListeners to be registered before SparkContext starts.	2015-02-04 17:18:03 -08:00
Davies Liu	0a89b15685	[SPARK-4939] move to next locality when no pending tasks Currently, if there are different locality in a task set, the tasks with NODE_LOCAL only get scheduled after all the PROCESS_LOCAL tasks are scheduled and timeout with spark.locality.wait.process (3 seconds by default). In local mode, the LocalScheduler will never call resourceOffer() again once it failed to get a task with same locality, then all the NODE_LOCAL tasks will be never scheduled. This bug could be reproduced by run example python/streaming/stateful_network_wordcount.py, it will hang after finished a batch with some data. This patch will check whether there is task for current locality level, if not, it will change to next locality level without waiting for `spark.locality.wait.process` seconds. It works for all locality levels. Because the list of pending tasks are updated lazily, the check can be false-positive, it means it will not move to next locality level even there is no valid pending tasks, it will wait for timeout. Author: Davies Liu <davies@databricks.com> Closes #3779 from davies/local_streaming and squashes the following commits: 2d25fb3 [Davies Liu] Update TaskSetManager.scala 1550668 [Davies Liu] add comment 1c37aac [Davies Liu] address comments 6b13824 [Davies Liu] address comments 906f456 [Davies Liu] Merge branch 'master' of github.com:apache/spark into local_streaming 414e79e [Davies Liu] fix bug, add logging ff8eabb [Davies Liu] Merge branch 'master' into local_streaming 28d1b3c [Davies Liu] check tasks 9d0ceab [Davies Liu] Merge branch 'master' of github.com:apache/spark into local_streaming 37a2804 [Davies Liu] fix tests 49bda82 [Davies Liu] address comment d8fb95a [Davies Liu] move to next locality level if no more tasks 2d6ae73 [Davies Liu] add comments 32d363f [Davies Liu] add regression test 7d8c5a5 [Davies Liu] jump to next locality if no pending tasks for executors	2015-02-04 14:22:07 -08:00
Davies Liu	ac0b2b788f	[SPARK-5588] [SQL] support select/filter by SQL expression ``` df.selectExpr('a + 1', 'abs(age)') df.filter('age > 3') df[ df.age > 3 ] df[ ['age', 'name'] ] ``` Author: Davies Liu <davies@databricks.com> Closes #4359 from davies/select_expr and squashes the following commits: d99856b [Davies Liu] support select/filter by SQL expression	2015-02-04 11:34:46 -08:00
Imran Rashid	5aa0f219d1	[SPARK-5574] use given name prefix in dir https://issues.apache.org/jira/browse/SPARK-5574 very minor, doesn't effect external behavior at all. Note that after this change, some of these dirs no longer will have "spark" in the name at all. I could change those locations that do pass in a name prefix to also include "spark", eg. "blockmgr" -> "spark-blockmgr" Author: Imran Rashid <irashid@cloudera.com> Closes #4344 from squito/SPARK-5574 and squashes the following commits: 33a84fe [Imran Rashid] use given name prefix in dir	2015-02-04 01:02:20 -08:00
Burak Yavuz	6aed719e50	[SPARK-5341] Use maven coordinates as dependencies in spark-shell and spark-submit This PR adds support for using maven coordinates as dependencies to spark-shell. Coordinates can be provided as a comma-delimited string after the flag `--packages`. Additional remote repositories (like SonaType) can be supplied as a comma-delimited string after the flag `--repositories`. Uses the Ivy library to resolve dependencies. Unfortunately the library has no decent documentation, therefore solving more complex dependency issues can be a problem. pwendell, mateiz, mengxr Note: This is still a WIP. The following need to be handled: - [x] add docs for the methods - [x] take local ivy cache path as an argument - [x] add tests - [x] add Windows compatibility - [x] exclude unused Ivy dependencies Author: Burak Yavuz <brkyvz@gmail.com> Closes #4215 from brkyvz/SPARK-5341ivy and squashes the following commits: 9215851 [Burak Yavuz] ready to merge db2a5cc [Burak Yavuz] changed logging to printStream 9dae87f [Burak Yavuz] file separators changed 71c374d [Burak Yavuz] merge conflicts fixed c08dc9f [Burak Yavuz] fixed merge conflicts 3ada19a [Burak Yavuz] fixed Jenkins error (hopefully) and added comment on oro 43c2290 [Burak Yavuz] fixed that ONE line 231f72f [Burak Yavuz] addressed code review 2cd6562 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5341ivy 85ec5a3 [Burak Yavuz] added oro as a dependency explicitly ea44ca4 [Burak Yavuz] add oro back to dependencies cef0e24 [Burak Yavuz] IntelliJ is just messing things up 97c4a92 [Burak Yavuz] fix more weird IntelliJ formatting 9cf077d [Burak Yavuz] fix weird IntelliJ formatting dcf5e13 [Burak Yavuz] fix windows command line flags 3a23f21 [Burak Yavuz] excluded ivy dependencies `53423e0` [Burak Yavuz] tests added 3705907 [Burak Yavuz] remove ivy-repo as a command line argument. Use global ivy cache as default c04d885 [Burak Yavuz] take path to ivy cache as a conf 2edc9b5 [Burak Yavuz] managed to exclude Spark and it's dependencies a0870af [Burak Yavuz] add docs. remove unnecesary new lines 6645af4 [Burak Yavuz] [SPARK-5341] added base implementation 882c4c8 [Burak Yavuz] added maven dependency download	2015-02-03 22:39:17 -08:00
Davies Liu	83de71c45b	[SPARK-4939] revive offers periodically in LocalBackend The locality timeout assume that the SchedulerBackend can revive offers periodically, but currently LocalBackend did do that, then some job with mixed locality levels in local mode will hang forever. This PR let LocalBackend revive offers periodically, just like in cluster mode. Author: Davies Liu <davies@databricks.com> Closes #4147 from davies/revive and squashes the following commits: 2acdf9d [Davies Liu] Update LocalBackend.scala 3c8ca7c [Davies Liu] Update LocalBackend.scala d1b60d2 [Davies Liu] address comments from Kay 33ac9bb [Davies Liu] fix build d0da0d5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into revive 6cf5972 [Davies Liu] fix thread-safety ed62a31 [Davies Liu] fix scala style df9008b [Davies Liu] fix typo bfc1396 [Davies Liu] revive offers periodically in LocalBackend	2015-02-03 22:30:23 -08:00
freeman	242b4f02df	[SPARK-4969][STREAMING][PYTHON] Add binaryRecords to streaming In Spark 1.2 we added a `binaryRecords` input method for loading flat binary data. This format is useful for numerical array data, e.g. in scientific computing applications. This PR adds support for the same format in Streaming applications, where it is similarly useful, especially for streaming time series or sensor data. Summary of additions - adding `binaryRecordsStream` to Spark Streaming - exposing `binaryRecordsStream` in the new PySpark Streaming - new unit tests in Scala and Python This required adding an optional Hadoop configuration param to `fileStream` and `FileInputStream`, but was otherwise straightforward. tdas davies Author: freeman <the.freeman.lab@gmail.com> Closes #3803 from freeman-lab/streaming-binary-records and squashes the following commits: b676534 [freeman] Clarify note 5ff1b75 [freeman] Add note to java streaming context eba925c [freeman] Simplify notes c4237b8 [freeman] Add experimental tag 30eba67 [freeman] Add filter and newFilesOnly alongside conf c2cfa6d [freeman] Expose new version of fileStream with conf in java 34d20ef [freeman] Add experimental tag 14bca9a [freeman] Add experimental tag b85bffc [freeman] Formatting 47560f4 [freeman] Space formatting 9a3715a [freeman] Refactor to reflect changes to FileInputSuite 7373f73 [freeman] Add note and defensive assertion for byte length 3ceb684 [freeman] Merge remote-tracking branch 'upstream/master' into streaming-binary-records 317b6d1 [freeman] Make test inline fcb915c [freeman] Formatting becb344 [freeman] Formatting d3e75b2 [freeman] Add tests in python a4324a3 [freeman] Line length 029d49c [freeman] Formatting 1c739aa [freeman] Simpler default arg handling 94d90d0 [freeman] Spelling 2843e9d [freeman] Add params to docstring 8b70fbc [freeman] Reorganization 28bff9b [freeman] Fix missing arg 9398bcb [freeman] Expose optional hadoop configuration 23dd69f [freeman] Tests for binaryRecordsStream 36cb0fd [freeman] Add binaryRecordsStream to scala fe4e803 [freeman] Add binaryRecordStream to Java API ecef0eb [freeman] Add binaryRecordsStream to python 8550c26 [freeman] Expose additional argument combination	2015-02-03 22:24:30 -08:00
zsxwing	d37978d8aa	[SPARK-4795][Core] Redesign the "primitive type => Writable" implicit APIs to make them be activated automatically Try to redesign the "primitive type => Writable" implicit APIs to make them be activated automatically and without breaking binary compatibility. However, this PR will breaking the source compatibility if people use `xxxToXxxWritable` occasionally. See the unit test in `graphx`. Author: zsxwing <zsxwing@gmail.com> Closes #3642 from zsxwing/SPARK-4795 and squashes the following commits: 914b2d6 [zsxwing] Add implicit back to the Writables methods 0b9017f [zsxwing] Add some docs a0e8509 [zsxwing] Merge branch 'master' into SPARK-4795 39343de [zsxwing] Fix the unit test 64853af [zsxwing] Reorganize the rest 'implicit' methods in SparkContext	2015-02-03 20:17:12 -08:00
Reynold Xin	f7948f3f57	Minor: Fix TaskContext deprecated annotations. Made a mistake in https://github.com/apache/spark/pull/4324 Author: Reynold Xin <rxin@databricks.com> Closes #4333 from rxin/taskcontext-deprecate and squashes the following commits: 61c44ee [Reynold Xin] Minor: Fix TaskContext deprecated annotations.	2015-02-03 10:34:16 -08:00
Reynold Xin	bebf4c42be	[SPARK-5549] Define TaskContext interface in Scala. So the interface documentation shows up in ScalaDoc. Author: Reynold Xin <rxin@databricks.com> Closes #4324 from rxin/TaskContext-scala and squashes the following commits: 2480a17 [Reynold Xin] comment 573756f [Reynold Xin] style fixes and javadoc fixes. 87dd537 [Reynold Xin] [SPARK-5549] Define TaskContext interface in Scala.	2015-02-03 00:46:04 -08:00
Josh Rosen	b8ebebeaaa	[SPARK-5414] Add SparkFirehoseListener class for consuming all SparkListener events There isn't a good way to write a SparkListener that receives all SparkListener events and which will be future-compatible (e.g. it will receive events introduced in newer versions of Spark without having to override new methods to process those events). To address this, this patch adds `SparkFirehoseListener`, a SparkListener implementation that receives all events and dispatches them to a single `onEvent` method (which can be overridden by users). Author: Josh Rosen <joshrosen@databricks.com> Closes #4210 from JoshRosen/firehose-listener and squashes the following commits: 223f579 [Josh Rosen] Expand comment to explain rationale for this being a Java class. ecdfaed [Josh Rosen] Add SparkFirehoseListener class for consuming all SparkListener events.	2015-02-02 23:35:07 -08:00
Thomas Graves	c31c36c4a7	[SPARK-3778] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs .this was https://github.com/apache/spark/pull/2676 https://issues.apache.org/jira/browse/SPARK-3778 This affects if someone is trying to access secure hdfs something like: val lines = { val hconf = new Configuration() hconf.set("mapred.input.dir", "mydir") hconf.set("textinputformat.record.delimiter","\003432\n") sc.newAPIHadoopRDD(hconf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text]) } Author: Thomas Graves <tgraves@apache.org> Closes #4292 from tgravescs/SPARK-3788 and squashes the following commits: cf3b453 [Thomas Graves] newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn	2015-02-02 22:45:55 -08:00
zsxwing	c306555f49	[SPARK-5219][Core] Add locks to avoid scheduling race conditions Author: zsxwing <zsxwing@gmail.com> Closes #4019 from zsxwing/SPARK-5219 and squashes the following commits: 36a8b4e [zsxwing] Add locks to avoid race conditions	2015-02-02 21:42:18 -08:00
Patrick Wendell	7930d2bef0	SPARK-3996: Add jetty servlet and continuations. These are needed transitively from the other Jetty libraries we include. It was not picked up by unit tests because we disable the UI. Author: Patrick Wendell <patrick@databricks.com> Closes #4323 from pwendell/jetty and squashes the following commits: d8669da [Patrick Wendell] SPARK-3996: Add jetty servlet and continuations.	2015-02-02 21:01:36 -08:00
nemccarthy	cb39f12043	[SPARK-5543][WebUI] Remove unused import JsonUtil from from JsonProtocol Simple PR to Remove unused import JsonUtil from from org.apache.spark.util.JsonProtocol.scala which fails builds with older versions of hadoop-core This import is unused. It was introduced in PR #4029 https://github.com/apache/spark/pull/4029 as a part of JIRA SPARK-5231 Author: nemccarthy <nathan@nemccarthy.me> Closes #4320 from nemccarthy/master and squashes the following commits: 8e34a11 [nemccarthy] [SPARK-5543][WebUI] Remove unused import JsonUtil from from org.apache.spark.util.JsonProtocol.scala which fails builds with older versions of hadoop-core	2015-02-02 20:03:13 -08:00
Davies Liu	0561c45449	[SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python This PR brings the Python API for Spark Streaming Kafka data source. ``` class KafkaUtils(__builtin__.object) \| Static methods defined here: \| \| createStream(ssc, zkQuorum, groupId, topics, storageLevel=StorageLevel(True, True, False, False, 2), keyDecoder=<function utf8_decoder>, valueDecoder=<function utf8_decoder>) \| Create an input stream that pulls messages from a Kafka Broker. \| \| :param ssc: StreamingContext object \| :param zkQuorum: Zookeeper quorum (hostname:port,hostname:port,..). \| :param groupId: The group id for this consumer. \| :param topics: Dict of (topic_name -> numPartitions) to consume. \| Each partition is consumed in its own thread. \| :param storageLevel: RDD storage level. \| :param keyDecoder: A function used to decode key \| :param valueDecoder: A function used to decode value \| :return: A DStream object ``` run the example: ``` bin/spark-submit --driver-class-path external/kafka-assembly/target/scala-/spark-streaming-kafka-assembly-.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test ``` Author: Davies Liu <davies@databricks.com> Author: Tathagata Das <tdas@databricks.com> Closes #3715 from davies/kafka and squashes the following commits: d93bfe0 [Davies Liu] Update make-distribution.sh 4280d04 [Davies Liu] address comments e6d0427 [Davies Liu] Merge branch 'master' of github.com:apache/spark into kafka f257071 [Davies Liu] add tests for null in RDD 23b039a [Davies Liu] address comments 9af51c4 [Davies Liu] Merge branch 'kafka' of github.com:davies/spark into kafka a74da87 [Davies Liu] address comments dc1eed0 [Davies Liu] Update kafka_wordcount.py 31e2317 [Davies Liu] Update kafka_wordcount.py 370ba61 [Davies Liu] Update kafka.py 97386b3 [Davies Liu] address comment 2c567a5 [Davies Liu] update logging and comment 33730d1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into kafka adeeb38 [Davies Liu] Merge pull request #3 from tdas/kafka-python-api aea8953 [Tathagata Das] Kafka-assembly for Python API eea16a7 [Davies Liu] refactor f6ce899 [Davies Liu] add example and fix bugs 98c8d17 [Davies Liu] fix python style 5697a01 [Davies Liu] bypass decoder in scala 048dbe6 [Davies Liu] fix python style 75d485e [Davies Liu] add mqtt 07923c4 [Davies Liu] support kafka in Python	2015-02-02 19:16:27 -08:00
Jacek Lewandowski	cfea30037f	Spark 3883: SSL support for HttpServer and Akka SPARK-3883: SSL support for Akka connections and Jetty based file servers. This story introduced the following changes: - Introduced SSLOptions object which holds the SSL configuration and can build the appropriate configuration for Akka or Jetty. SSLOptions can be created by parsing SparkConf entries at a specified namespace. - SSLOptions is created and kept by SecurityManager - All Akka actor address creation snippets based on interpolated strings were replaced by a dedicated methods from AkkaUtils. Those methods select the proper Akka protocol - whether akka.tcp or akka.ssl.tcp - Added tests cases for AkkaUtils, FileServer, SSLOptions and SecurityManager - Added a way to use node local SSL configuration by executors and driver in standalone mode. It can be done by specifying spark.ssl.useNodeLocalConf in SparkConf. - Made CoarseGrainedExecutorBackend not overwrite the settings which are executor startup configuration - they are passed anyway from Worker Refer to https://github.com/apache/spark/pull/3571 for discussion and details Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Author: Jacek Lewandowski <jacek.lewandowski@datastax.com> Closes #3571 from jacek-lewandowski/SPARK-3883-master and squashes the following commits: 9ef4ed1 [Jacek Lewandowski] Merge pull request #2 from jacek-lewandowski/SPARK-3883-docs2 fb31b49 [Jacek Lewandowski] SPARK-3883: Added SSL setup documentation 2532668 [Jacek Lewandowski] SPARK-3883: Refactored AkkaUtils.protocol method to not use Try 90a8762 [Jacek Lewandowski] SPARK-3883: Refactored methods to resolve Akka address and made it possible to easily configure multiple communication layers for SSL 72b2541 [Jacek Lewandowski] SPARK-3883: A reference to the fallback SSLOptions can be provided when constructing SSLOptions 93050f4 [Jacek Lewandowski] SPARK-3883: SSL support for HttpServer and Akka	2015-02-02 17:27:26 -08:00
Sandy Ryza	830934976e	SPARK-5500. Document that feeding hadoopFile into a shuffle operation wi... ...ll cause problems Author: Sandy Ryza <sandy@cloudera.com> Closes #4293 from sryza/sandy-spark-5500 and squashes the following commits: e9ce742 [Sandy Ryza] Change to warning cc46e52 [Sandy Ryza] Add instructions and extend to NewHadoopRDD 6e1932a [Sandy Ryza] Throw exception on cache 0f6c4eb [Sandy Ryza] SPARK-5500. Document that feeding hadoopFile into a shuffle operation will cause problems	2015-02-02 14:52:46 -08:00
Jacek Lewandowski	5a5526164b	SPARK-5425: Use synchronised methods in system properties to create SparkConf SPARK-5425: Fixed usages of system properties This patch fixes few problems caused by the fact that the Scala wrapper over system properties is not thread-safe and is basically invalid because it doesn't take into account the default values which could have been set in the properties object. The problem is fixed by modifying `Utils.getSystemProperties` method so that it uses `stringPropertyNames` method of the `Properties` class, which is thread-safe (internally it creates a defensive copy in a synchronized method) and returns keys of the properties which were set explicitly and which are defined as defaults. The other related problem, which is fixed here. was in `ResetSystemProperties` mix-in. It created a copy of the system properties in the wrong way. This patch also introduces a test case for thread-safeness of SparkConf creation. Refer to the discussion in https://github.com/apache/spark/pull/4220 for more details. Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4222 from jacek-lewandowski/SPARK-5425-1.3 and squashes the following commits: 03da61b [Jacek Lewandowski] SPARK-5425: Modified Utils.getSystemProperties to return a map of all system properties - explicit + defaults 8faf2ea [Jacek Lewandowski] SPARK-5425: Use SerializationUtils to save properties in ResetSystemProperties trait 71aa572 [Jacek Lewandowski] SPARK-5425: Use synchronised methods in system properties to create SparkConf	2015-02-02 14:07:19 -08:00
Martin Weindel	bff65b5cca	Disabling Utils.chmod700 for Windows This patch makes Spark 1.2.1rc2 work again on Windows. Without it you get following log output on creating a Spark context: INFO org.apache.spark.SparkEnv:59 - Registering BlockManagerMaster ERROR org.apache.spark.util.Utils:75 - Failed to create local root dir in .... Ignoring this directory. ERROR org.apache.spark.storage.DiskBlockManager:75 - Failed to create any local dir. Author: Martin Weindel <martin.weindel@gmail.com> Author: mweindel <m.weindel@usu-software.de> Closes #4299 from MartinWeindel/branch-1.2 and squashes the following commits: 535cb7f [Martin Weindel] fixed last commit f17072e [Martin Weindel] moved condition to caller to avoid confusion on chmod700() return value 4de5e91 [Martin Weindel] reverted to unix line ends fe2740b [mweindel] moved comment ac4749c [mweindel] fixed chmod700 for Windows	2015-02-02 14:01:32 -08:00
Marcelo Vanzin	52f5754f45	Make sure only owner can read / write to directories created for the job. Whenever a directory is created by the utility method, immediately restrict its permissions so that only the owner has access to its contents. Signed-off-by: Josh Rosen <joshrosen@databricks.com>	2015-02-02 14:01:32 -08:00
lianhuiwang	f5e63751f0	[SPARK-5173]support python application running on yarn cluster mode now when we run python application on yarn cluster mode through spark-submit, spark-submit does not support python application on yarn cluster mode. so i modify code of submit and yarn's AM in order to support it. through specifying .py file or primaryResource file via spark-submit, we can make pyspark run in yarn-cluster mode. example:spark-submit --master yarn-master --num-executors 1 --driver-memory 1g --executor-memory 1g xx.py --primaryResource yy.conf this config is same as pyspark on yarn-client mode. firstly,we put local path of .py or primaryResource to yarn's dist.files.that can be distributed on slave nodes.and then in spark-submit we transfer --py-files and --primaryResource to yarn.Client and use "org.apache.spark.deploy.PythonRunner" to user class that can run .py files on ApplicationMaster. in yarn.Client we transfer --py-files and --primaryResource to ApplicationMaster. in ApplicationMaster, user's class is org.apache.spark.deploy.PythonRunner, and user's args is primaryResource and -py-files. so that can make pyspark run on ApplicationMaster. JoshRosen tgravescs sryza Author: lianhuiwang <lianhuiwang09@gmail.com> Author: Wang Lianhui <lianhuiwang09@gmail.com> Closes #3976 from lianhuiwang/SPARK-5173 and squashes the following commits: 28a8a58 [lianhuiwang] fix variable name 67f8cee [lianhuiwang] update with andrewor's comments 0319ae3 [lianhuiwang] address with sryza's comments 2385ef6 [lianhuiwang] address with sryza's comments 03640ab [lianhuiwang] add sparkHome to env 47d2fc3 [lianhuiwang] fix test 2adc8f5 [lianhuiwang] add spark.test.home d60bc60 [lianhuiwang] fix test 5b30064 [lianhuiwang] add test 097a5ec [lianhuiwang] fix line length exceeds 100 905a106 [lianhuiwang] update with sryza and andrewor 's comments f1f55b6 [lianhuiwang] when yarn-cluster, all python files can be non-local 172eec1 [Wang Lianhui] fix a min submit's bug 9c941bc [lianhuiwang] support python application running on yarn cluster mode	2015-02-02 12:32:28 -08:00
Sandy Ryza	b2047b55c5	SPARK-4585. Spark dynamic executor allocation should use minExecutors as... ... initial number Author: Sandy Ryza <sandy@cloudera.com> Closes #4051 from sryza/sandy-spark-4585 and squashes the following commits: d1dd039 [Sandy Ryza] Add spark.dynamicAllocation.initialNumExecutors and make min and max not required b7c59dc [Sandy Ryza] SPARK-4585. Spark dynamic executor allocation should use minExecutors as initial number	2015-02-02 12:27:08 -08:00
Sandy Ryza	6f341310bf	SPARK-5492. Thread statistics can break with older Hadoop versions Author: Sandy Ryza <sandy@cloudera.com> Closes #4305 from sryza/sandy-spark-5492 and squashes the following commits: b7d4497 [Sandy Ryza] SPARK-5492. Thread statistics can break with older Hadoop versions	2015-02-02 00:54:06 -08:00
jerryshao	63dfe21dc7	[SPARK-5478][UI][Minor] Add missing right parentheses ![UI](https://dl.dropboxusercontent.com/u/19230832/Capture.PNG) Author: jerryshao <saisai.shao@intel.com> Closes #4267 from jerryshao/SPARK-5478 and squashes the following commits: 9fe51cc [jerryshao] Add missing right parentheses	2015-02-01 23:56:13 -08:00
Patrick Wendell	a15f6e31fc	[SPARK-3996]: Shade Jetty in Spark deliverables (v2 of this patch with a fix that was only relevant for the maven build). This patch piggy-back's on vanzin's work to simplify the Guava shading, and adds Jetty as a shaded library in Spark. Other than adding Jetty, it consilidates the <artifactSet>'s into the root pom. I found it was a bit easier to follow that way, since you don't need to look into child pom's to find out specific artifact sets included in shading. Author: Patrick Wendell <patrick@databricks.com> Closes #4285 from pwendell/jetty and squashes the following commits: d3e7f4e [Patrick Wendell] Fix for shaded deps causing compile errors 19f0710 [Patrick Wendell] More code review feedback 961452d [Patrick Wendell] Responding to feedback from Marcello 6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables	2015-02-01 21:13:57 -08:00
Tom Panning	1ca0a1014e	[SPARK-5176] The thrift server does not support cluster mode Output an error message if the thrift server is started in cluster mode. Author: Tom Panning <tom.panning@nextcentury.com> Closes #4137 from tpanningnextcen/spark-5176-thrift-cluster-mode-error and squashes the following commits: f5c0509 [Tom Panning] [SPARK-5176] The thrift server does not support cluster mode	2015-02-01 17:57:31 -08:00
zsxwing	883bc88d52	[SPARK-4859][Core][Streaming] Refactor LiveListenerBus and StreamingListenerBus This PR refactors LiveListenerBus and StreamingListenerBus and extracts the common codes to a parent class `ListenerBus`. It also includes bug fixes in #3710: 1. Fix the race condition of queueFullErrorMessageLogged in LiveListenerBus and StreamingListenerBus to avoid outputing `queue-full-error` logs multiple times. 2. Make sure the SHUTDOWN message will be delivered to listenerThread, so that we can make sure listenerThread will always be able to exit. 3. Log the error from listener rather than crashing listenerThread in StreamingListenerBus. During fixing the above bugs, we find it's better to make LiveListenerBus and StreamingListenerBus have the same bahaviors. Then there will be many duplicated codes in LiveListenerBus and StreamingListenerBus. Therefore, I extracted their common codes to `ListenerBus` as a parent class: LiveListenerBus and StreamingListenerBus only need to extend `ListenerBus` and implement `onPostEvent` (how to process an event) and `onDropEvent` (do something when droppping an event). Author: zsxwing <zsxwing@gmail.com> Closes #4006 from zsxwing/SPARK-4859-refactor and squashes the following commits: c8dade2 [zsxwing] Fix the code style after renaming 5715061 [zsxwing] Rename ListenerHelper to ListenerBus and the original ListenerBus to AsynchronousListenerBus f0ef647 [zsxwing] Fix the code style 4e85ffc [zsxwing] Merge branch 'master' into SPARK-4859-refactor d2ef990 [zsxwing] Add private[spark] 4539f91 [zsxwing] Remove final to pass MiMa tests a9dccd3 [zsxwing] Remove SparkListenerShutdown 7cc04c3 [zsxwing] Refactor LiveListenerBus and StreamingListenerBus and make them share same code base	2015-02-01 17:48:41 -08:00
Ryan Williams	80bd715a3e	[SPARK-5422] Add support for sending Graphite metrics via UDP Depends on [SPARK-5413](https://issues.apache.org/jira/browse/SPARK-5413) / #4209, included here, will rebase once the latter's merged. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4218 from ryan-williams/udp and squashes the following commits: ebae393 [Ryan Williams] Add support for sending Graphite metrics via UDP cb58262 [Ryan Williams] bump metrics dependency to v3.1.0	2015-01-31 23:41:05 -08:00
Sean Owen	c84d5a10e8	SPARK-3359 [CORE] [DOCS] `sbt/sbt unidoc` doesn't work with Java 8 These are more `javadoc` 8-related changes I spotted while investigating. These should be helpful in any event, but this does not nearly resolve SPARK-3359, which may never be feasible while using `unidoc` and `javadoc` 8. Author: Sean Owen <sowen@cloudera.com> Closes #4193 from srowen/SPARK-3359 and squashes the following commits: 5b33f66 [Sean Owen] Additional scaladoc fixes for javadoc 8; still not going to be javadoc 8 compatible	2015-01-31 10:40:42 -08:00
Reynold Xin	636408311d	[SPARK-5307] Add a config option for SerializationDebugger. Just in case there is a bug in the SerializationDebugger that makes error reporting worse than it was. Author: Reynold Xin <rxin@databricks.com> Closes #4297 from rxin/ser-config and squashes the following commits: f1d4629 [Reynold Xin] [SPARK-5307] Add a config option for SerializationDebugger.	2015-01-31 00:06:36 -08:00
Reynold Xin	740a56862b	[SPARK-5307] SerializationDebugger This patch adds a SerializationDebugger that is used to add serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help user to find the object. The patch uses the internals of JVM serialization (in particular, heavy usage of ObjectStreamClass). Compared with an earlier attempt, this one provides extra information including field names, array offsets, writeExternal calls, etc. An example serialization stack: ``` Serialization stack: - object not serializable (class: org.apache.spark.serializer.NotSerializable, value: org.apache.spark.serializer.NotSerializable2c43caa4) - element of array (index: 0) - array (class [Ljava.lang.Object;, size 1) - field (class: org.apache.spark.serializer.SerializableArray, name: arrayField, type: class [Ljava.lang.Object;) - object (class org.apache.spark.serializer.SerializableArray, org.apache.spark.serializer.SerializableArray193c5908) - writeExternal data - externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass320bdadc) ``` Author: Reynold Xin <rxin@databricks.com> Closes #4098 from rxin/SerializationDebugger and squashes the following commits: 553b3ff [Reynold Xin] Update SerializationDebuggerSuite.scala 572d0cb [Reynold Xin] Disable automatically when reflection fails. b349b77 [Reynold Xin] [SPARK-5307] SerializationDebugger to help debug NotSerializableException - take 2	2015-01-30 22:34:10 -08:00
Sandy Ryza	254eaa4d35	SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714 Previously I had tried to solve this with by adding a line in Spark's log4j-defaults.properties. The issue with the message in log4j-defaults.properties was that the log4j.properties packaged inside Hadoop was getting picked up instead. While it would be ideal to fix that as well, we still want to quiet this in situations where a user supplies their own custom log4j properties. Author: Sandy Ryza <sandy@cloudera.com> Closes #4192 from sryza/sandy-spark-5393 and squashes the following commits: 4d5dedc [Sandy Ryza] Only set log level if unset 46e07c5 [Sandy Ryza] SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714	2015-01-30 11:31:54 -06:00
Davies Liu	5c746eedda	[SPARK-5395] [PySpark] fix python process leak while coalesce() Currently, the Python process is released into pool only after the task had finished, it cause many process forked if coalesce() is called. This PR will change it to release the process as soon as read all the data from it (finish the partition), then a process could be reused to process multiple partitions in a single task. Author: Davies Liu <davies@databricks.com> Closes #4238 from davies/py_leak and squashes the following commits: ec80a43 [Davies Liu] add @volatile 6da437a [Davies Liu] address comments 24ed322 [Davies Liu] fix python process leak while coalesce()	2015-01-29 17:28:37 -08:00
Patrick Wendell	d2071e8f45	Revert "[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables" This reverts commit `f240fe390b`.	2015-01-29 17:14:27 -08:00
Patrick Wendell	f240fe390b	[WIP] [SPARK-3996]: Shade Jetty in Spark deliverables This patch piggy-back's on vanzin's work to simplify the Guava shading, and adds Jetty as a shaded library in Spark. Other than adding Jetty, it consilidates the \<artifactSet\>'s into the root pom. I found it was a bit easier to follow that way, since you don't need to look into child pom's to find out specific artifact sets included in shading. Author: Patrick Wendell <patrick@databricks.com> Closes #4252 from pwendell/jetty and squashes the following commits: 19f0710 [Patrick Wendell] More code review feedback 961452d [Patrick Wendell] Responding to feedback from Marcello 6df25ca [Patrick Wendell] [WIP] [SPARK-3996]: Shade Jetty in Spark deliverables	2015-01-29 16:31:19 -08:00
Marcelo Vanzin	f9e569452e	[SPARK-5466] Add explicit guava dependencies where needed. One side-effect of shading guava is that it disappears as a transitive dependency. For Hadoop 2.x, this was masked by the fact that Hadoop itself depends on guava. But certain versions of Hadoop 1.x also shade guava, leaving either no guava or some random version pulled by another dependency on the classpath. So be explicit about the dependency in modules that use guava directly, which is the right thing to do anyway. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #4272 from vanzin/SPARK-5466 and squashes the following commits: e3f30e5 [Marcelo Vanzin] Dependency for catalyst is not needed. d3b2c84 [Marcelo Vanzin] [SPARK-5466] Add explicit guava dependencies where needed.	2015-01-29 13:00:45 -08:00
Xiangrui Meng	4ee79c71af	[SPARK-5430] move treeReduce and treeAggregate from mllib to core We have seen many use cases of `treeAggregate`/`treeReduce` outside the ML domain. Maybe it is time to move them to Core. pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #4228 from mengxr/SPARK-5430 and squashes the following commits: 20ad40d [Xiangrui Meng] exclude tree* from mima e89a43e [Xiangrui Meng] fix compile and update java doc 3ae1a4b [Xiangrui Meng] add treeReduce/treeAggregate to Python 6f948c5 [Xiangrui Meng] add treeReduce/treeAggregate to JavaRDDLike d600b6c [Xiangrui Meng] move treeReduce and treeAggregate to core	2015-01-28 17:26:03 -08:00
Michael Nazario	e023112d33	[SPARK-5441][pyspark] Make SerDeUtil PairRDD to Python conversions more robust SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD now both support empty RDDs by checking the result of take(1) instead of calling first which throws an exception. Author: Michael Nazario <mnazario@palantir.com> Closes #4236 from mnazario/feature/empty-first and squashes the following commits: a531c0c [Michael Nazario] Added regression tests for SPARK-5441 e3b2fb6 [Michael Nazario] Added acceptance of the empty case	2015-01-28 13:58:46 -08:00
Ryan Williams	a731314c31	[SPARK-5417] Remove redundant executor-id set() call This happens inside SparkEnv initialization as of #4194 Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4213 from ryan-williams/exec-id-set and squashes the following commits: b3e4f7b [Ryan Williams] Remove redundant executor-id set() call	2015-01-28 13:04:52 -08:00
Andrew Or	84b6ecdef6	[SPARK-5437] Fix DriverSuite and SparkSubmitSuite timeout issues In DriverSuite, we currently set a timeout of 60 seconds. If after this time the process has not terminated, we leak the process because we never destroy it. In SparkSubmitSuite, we currently do not have a timeout so the test can hang indefinitely. Author: Andrew Or <andrew@databricks.com> Closes #4230 from andrewor14/fix-driver-suite and squashes the following commits: f5c80fd [Andrew Or] Fix timeout behaviors in both suites 8092c36 [Andrew Or] Stop SparkContext after every individual test	2015-01-28 12:53:22 -08:00
Sean Owen	9b18009b83	SPARK-1934 [CORE] "this" reference escape to "selectorThread" during construction in ConnectionManager This change reshuffles the order of initialization in `ConnectionManager` so that the last thing that happens is running `selectorThread`, which invokes a method that relies on object state in `ConnectionManager` zsxwing also reported a similar problem in `BlockManager` in the JIRA, but I can't find a similar pattern there. Maybe it was subsequently fixed? Author: Sean Owen <sowen@cloudera.com> Closes #4225 from srowen/SPARK-1934 and squashes the following commits: c4dec3b [Sean Owen] Init all object state in ConnectionManager constructor before starting thread in constructor that accesses object's state	2015-01-28 12:44:35 -08:00
Winston Chen	453d7999b8	[SPARK-5361]Multiple Java RDD <-> Python RDD conversions not working correctly This is found through reading RDD from `sc.newAPIHadoopRDD` and writing it back using `rdd.saveAsNewAPIHadoopFile` in pyspark. It turns out that whenever there are multiple RDD conversions from JavaRDD to PythonRDD then back to JavaRDD, the exception below happens: ``` 15/01/16 10:28:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 7) java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.util.ArrayList at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:157) at org.apache.spark.api.python.SerDeUtil$$anonfun$pythonToJava$1$$anonfun$apply$1.apply(SerDeUtil.scala:153) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) ``` The test case code below reproduces it: ``` from pyspark.rdd import RDD dl = [ (u'2', {u'director': u'David Lean'}), (u'7', {u'director': u'Andrew Dominik'}) ] dl_rdd = sc.parallelize(dl) tmp = dl_rdd._to_java_object_rdd() tmp2 = sc._jvm.SerDe.javaToPython(tmp) t = RDD(tmp2, sc) t.count() tmp = t._to_java_object_rdd() tmp2 = sc._jvm.SerDe.javaToPython(tmp) t = RDD(tmp2, sc) t.count() # it blows up here during the 2nd time of conversion ``` Author: Winston Chen <wchen@quid.com> Closes #4146 from wingchen/master and squashes the following commits: 903df7d [Winston Chen] SPARK-5361, update to toSeq based on the PR 5d90a83 [Winston Chen] SPARK-5361, make python pretty, so to pass PEP 8 checks 126be6b [Winston Chen] SPARK-5361, add in test case 4cf1187 [Winston Chen] SPARK-5361, add in test case 9f1a097 [Winston Chen] add in tuple handling while converting form python RDD back to JavaRDD	2015-01-28 11:08:44 -08:00
Kousuke Saruta	0b35fcd7f0	[SPARK-5291][CORE] Add timestamp and reason why an executor is removed to SparkListenerExecutorAdded and SparkListenerExecutorRemoved Recently `SparkListenerExecutorAdded` and `SparkListenerExecutorRemoved` are added. I think it's useful if they have timestamp and the reason why an executor is removed. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4082 from sarutak/SPARK-5291 and squashes the following commits: a026ff2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291 979dfe1 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291 cf9f9080 [Kousuke Saruta] Fixed test case 1f2a89b [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5291 243f2a60 [Kousuke Saruta] Modified MesosSchedulerBackendSuite a527c35 [Kousuke Saruta] Added timestamp to SparkListenerExecutorAdded	2015-01-28 11:02:51 -08:00
Marcelo Vanzin	37a5e272f8	[SPARK-4809] Rework Guava library shading. The current way of shading Guava is a little problematic. Code that depends on "spark-core" does not see the transitive dependency, yet classes in "spark-core" actually depend on Guava. So it's a little tricky to run unit tests that use spark-core classes, since you need a compatible version of Guava in your dependencies when running the tests. This can become a little tricky, and is kind of a bad user experience. This change modifies the way Guava is shaded so that it's applied uniformly across the Spark build. This means Guava is shaded inside spark-core itself, so that the dependency issues above are solved. Aside from that, all Spark sub-modules have their Guava references relocated, so that they refer to the relocated classes now packaged inside spark-core. Before, this was only done by the time the assembly was built, so projects that did not end up inside the assembly (such as streaming backends) could still reference the original location of Guava classes. The Guava classes are added to the "first" artifact Spark generates (network-common), so that all downstream modules have the needed classes available. Since "network-common" is a dependency of spark-core, all Spark apps should get the relocated classes automatically. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3658 from vanzin/SPARK-4809 and squashes the following commits: 3c93e42 [Marcelo Vanzin] Shade Guava in the network-common artifact. 5d69ec9 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809 b3104fc [Marcelo Vanzin] Add comment. 941848f [Marcelo Vanzin] Merge branch 'master' into SPARK-4809 f78c48a [Marcelo Vanzin] Merge branch 'master' into SPARK-4809 8053dd4 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809 107d7da [Marcelo Vanzin] Add fix for SPARK-5052 (PR #3874). 40b8723 [Marcelo Vanzin] Merge branch 'master' into SPARK-4809 4a4ed42 [Marcelo Vanzin] [SPARK-4809] Rework Guava library shading.	2015-01-28 00:29:29 -08:00
Sandy Ryza	b1b35ca2e4	SPARK-5199. FS read metrics should support CombineFileSplits and track bytes from all FSs ...mbineFileSplits Author: Sandy Ryza <sandy@cloudera.com> Closes #4050 from sryza/sandy-spark-5199 and squashes the following commits: 864514b [Sandy Ryza] Add tests and fix bug 0d504f1 [Sandy Ryza] Prettify 915c7e6 [Sandy Ryza] Get metrics from all filesystems cdbc3e8 [Sandy Ryza] SPARK-5199. Input metrics should show up for InputFormats that return CombineFileSplits	2015-01-27 15:42:55 -08:00
Elmer Garduno	661e0fca5d	[SPARK-5052] Add common/base classes to fix guava methods signatures. Fixes problems with incorrect method signatures related to shaded classes. For discussion see the jira issue. Author: Elmer Garduno <elmerg@google.com> Closes #3874 from elmer-garduno/fix_guava_signatures and squashes the following commits: aa5d8e0 [Elmer Garduno] Unshade common/base[Function\|Supplier] classes to fix guava methods signatures.	2015-01-26 17:40:48 -08:00
Sean Owen	0497ea51ac	SPARK-960 [CORE] [TEST] JobCancellationSuite "two jobs sharing the same stage" is broken This reenables and fixes this test, after addressing two issues: - The Semaphore that was intended to be shared locally was being serialized and copied; it's now a static member in the companion object as in other tests - Later changes to Spark means that cancelling the first task will not cancel the shared stage and therefore the second task should succeed Author: Sean Owen <sowen@cloudera.com> Closes #4180 from srowen/SPARK-960 and squashes the following commits: 43da66f [Sean Owen] Fix 'two jobs sharing the same stage' test and reenable it: truly share a Semaphore locally as intended, and update expectation of failure in non-cancelled task	2015-01-26 14:32:27 -08:00
Sean Owen	54e7b456dd	SPARK-4147 [CORE] Reduce log4j dependency Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework. The only change is to push one half of the check in the original `if` condition inside. This is a trivial change, may or may not actually solve a problem, but I think it's all that makes sense to do for SPARK-4147. Author: Sean Owen <sowen@cloudera.com> Closes #4190 from srowen/SPARK-4147 and squashes the following commits: 4e99942 [Sean Owen] Defer use of log4j class until it's known that log4j 1.2 is being used. This may avoid dealing with log4j dependencies for callers that reroute slf4j to another logging framework.	2015-01-26 14:23:42 -08:00
Davies Liu	142093179a	[SPARK-5355] use j.u.c.ConcurrentHashMap instead of TrieMap j.u.c.ConcurrentHashMap is more battle tested. cc rxin JoshRosen pwendell Author: Davies Liu <davies@databricks.com> Closes #4208 from davies/safe-conf and squashes the following commits: c2182dc [Davies Liu] address comments, fix tests 3a1d821 [Davies Liu] fix test da14ced [Davies Liu] Merge branch 'master' of github.com:apache/spark into safe-conf ae4d305 [Davies Liu] change to j.u.c.ConcurrentMap f8fa1cf [Davies Liu] change to TrieMap a1d769a [Davies Liu] make SparkConf thread-safe	2015-01-26 12:51:32 -08:00
CodingCat	8df9435512	[SPARK-5268] don't stop CoarseGrainedExecutorBackend for irrelevant DisassociatedEvent https://issues.apache.org/jira/browse/SPARK-5268 In CoarseGrainedExecutorBackend, we subscribe DisassociatedEvent in executor backend actor and exit the program upon receive such event... let's consider the following case The user may develop an Akka-based program which starts the actor with Spark's actor system and communicate with an external actor system (e.g. an Akka-based receiver in spark streaming which communicates with an external system) If the external actor system fails or disassociates with the actor within spark's system with purpose, we may receive DisassociatedEvent and the executor is restarted. This is not the expected behavior..... ---- This is a simple fix to check the event before making the quit decision Author: CodingCat <zhunansjtu@gmail.com> Closes #4063 from CodingCat/SPARK-5268 and squashes the following commits: 4d7d48e [CodingCat] simplify the log 18c36f4 [CodingCat] more descriptive log f299e0b [CodingCat] clean log 1632e79 [CodingCat] check whether DisassociatedEvent is relevant before quit	2015-01-25 19:28:53 -08:00
Kay Ousterhout	fc2168f04e	[SPARK-5326] Show fetch wait time as optional metric in the UI With this change, here's what the UI looks like: ![image](https://cloud.githubusercontent.com/assets/1108612/5809994/1ec8a904-9ff4-11e4-8f24-6a59a1a858f7.png) If you want to locally test this, you need to spin up multiple executors, because the shuffle read metrics are only shown for data read remotely. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #4110 from kayousterhout/SPARK-5326 and squashes the following commits: 610051e [Kay Ousterhout] Josh style comments 5feaa28 [Kay Ousterhout] What is the difference here?? aa129cb [Kay Ousterhout] Removed inadvertent change 721c742 [Kay Ousterhout] Improved tooltip f3a7111 [Kay Ousterhout] Style fix 679b4e9 [Kay Ousterhout] [SPARK-5326] Show fetch wait time as optional metric in the UI	2015-01-25 16:48:26 -08:00
Kousuke Saruta	8f5c827b01	[SPARK-5344][WebUI] HistoryServer cannot recognize that inprogress file was renamed to completed file `FsHistoryProvider` tries to update application status but if `checkForLogs` is called before `.inprogress` file is renamed to completed file, the file is not recognized as completed. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4132 from sarutak/SPARK-5344 and squashes the following commits: 9658008 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5344 d2c72b6 [Kousuke Saruta] Fixed update issue of FsHistoryProvider	2015-01-25 15:34:20 -08:00
Ryan Williams	aea25482c3	[SPARK-5402] log executor ID at executor-construction time also rename "slaveHostname" to "executorHostname" Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4195 from ryan-williams/exec and squashes the following commits: e60a7bb [Ryan Williams] log executor ID at executor-construction time	2015-01-25 14:20:02 -08:00
Ryan Williams	2d9887bae5	[SPARK-5401] set executor ID before creating MetricsSystem Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #4194 from ryan-williams/metrics and squashes the following commits: 7c5a33f [Ryan Williams] set executor ID before creating MetricsSystem	2015-01-25 14:17:59 -08:00
Idan Zalzberg	412a58e118	Add comment about defaultMinPartitions Added a comment about using math.min for choosing default partition count Author: Idan Zalzberg <idanzalz@gmail.com> Closes #4102 from idanz/patch-2 and squashes the following commits: 50e9d58 [Idan Zalzberg] Update SparkContext.scala	2015-01-25 11:28:05 -08:00
zsxwing	0d1e67ee9b	[SPARK-5214][Test] Add a test to demonstrate EventLoop can be stopped in the event thread Author: zsxwing <zsxwing@gmail.com> Closes #4174 from zsxwing/SPARK-5214-unittest and squashes the following commits: 443e564 [zsxwing] Change the check interval to 5ms 7aaa2d7 [zsxwing] Add a test to demonstrate EventLoop can be stopped in the event thread	2015-01-24 11:00:35 -08:00
Josh Rosen	cef1f092a6	[SPARK-5063] More helpful error messages for several invalid operations This patch adds more helpful error messages for invalid programs that define nested RDDs, broadcast RDDs, perform actions inside of transformations (e.g. calling `count()` from inside of `map()`), and call certain methods on stopped SparkContexts. Currently, these invalid programs lead to confusing NullPointerExceptions at runtime and have been a major source of questions on the mailing list and StackOverflow. In a few cases, I chose to log warnings instead of throwing exceptions in order to avoid any chance that this patch breaks programs that worked "by accident" in earlier Spark releases (e.g. programs that define nested RDDs but never run any jobs with them). In SparkContext, the new `assertNotStopped()` method is used to check whether methods are being invoked on a stopped SparkContext. In some cases, user programs will not crash in spite of calling methods on stopped SparkContexts, so I've only added `assertNotStopped()` calls to methods that always throw exceptions when called on stopped contexts (e.g. by dereferencing a null `dagScheduler` pointer). Author: Josh Rosen <joshrosen@databricks.com> Closes #3884 from JoshRosen/SPARK-5063 and squashes the following commits: a38774b [Josh Rosen] Fix spelling typo a943e00 [Josh Rosen] Convert two exceptions into warnings in order to avoid breaking user programs in some edge-cases. 2d0d7f7 [Josh Rosen] Fix test to reflect 1.2.1 compatibility 3f0ea0c [Josh Rosen] Revert two unintentional formatting changes 8e5da69 [Josh Rosen] Remove assertNotStopped() calls for methods that were sometimes safe to call on stopped SC's in Spark 1.2 8cff41a [Josh Rosen] IllegalStateException fix 6ef68d0 [Josh Rosen] Fix Python line length issues. 9f6a0b8 [Josh Rosen] Add improved error messages to PySpark. 13afd0f [Josh Rosen] SparkException -> IllegalStateException 8d404f3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5063 b39e041 [Josh Rosen] Fix BroadcastSuite test which broadcasted an RDD 99cc09f [Josh Rosen] Guard against calling methods on stopped SparkContexts. 34833e8 [Josh Rosen] Add more descriptive error message. 57cc8a1 [Josh Rosen] Add error message when directly broadcasting RDD. 15b2e6b [Josh Rosen] [SPARK-5063] Useful error messages for nested RDDs and actions inside of transformations	2015-01-23 17:53:15 -08:00
Davies Liu	9bad062268	[SPARK-5355] make SparkConf thread-safe The SparkConf is not thread-safe, but is accessed by many threads. The getAll() could return parts of the configs if another thread is access it. This PR changes SparkConf.settings to a thread-safe TrieMap. Author: Davies Liu <davies@databricks.com> Closes #4143 from davies/safe-conf and squashes the following commits: f8fa1cf [Davies Liu] change to TrieMap a1d769a [Davies Liu] make SparkConf thread-safe	2015-01-21 16:51:54 -08:00
wangfei	3be2a887bf	[SPARK-4984][CORE][WEBUI] Adding a pop-up containing the full job description when it is very long In some case the job description will be very long, such as a long sql. refer to #3718 This PR add a pop-up for job description when it is long. ![image](https://cloud.githubusercontent.com/assets/7018048/5847400/c757cbbc-a207-11e4-891f-528821c2e68d.png) ![image](https://cloud.githubusercontent.com/assets/7018048/5847409/d434b2b4-a207-11e4-8813-03a74b43d766.png) Author: wangfei <wangfei1@huawei.com> Closes #3819 from scwf/popup-descrip-ui and squashes the following commits: ba02b83 [wangfei] address comments a7c5e7b [wangfei] spot that it's been truncated fbf6162 [wangfei] Merge branch 'master' into popup-descrip-ui 0bca96d [wangfei] remove no use val 4b55c3b [wangfei] fix style issue 353c6f4 [wangfei] pop up the description of job with a styled read-only text form field	2015-01-21 15:27:42 -08:00
Sandy Ryza	2eeada373e	SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnA... ...llocator The goal of this PR is to simplify YarnAllocator as much as possible and get it up to the level of code quality we see in the rest of Spark. In service of this, it does a few things: * Uses AMRMClient APIs for matching containers to requests. * Adds calls to AMRMClient.removeContainerRequest so that, when we use a container, we don't end up requesting it again. * Removes YarnAllocator's host->rack cache. YARN's RackResolver already does this caching, so this is redundant. * Adds tests for basic YarnAllocator functionality. * Breaks up the allocateResources method, which was previously nearly 300 lines. * A little bit of stylistic cleanup. * Fixes a bug that causes three times the requests to be filed when preferred host locations are given. The patch is lossy. In particular, it loses the logic for trying to avoid containers bunching up on nodes. As I understand it, the logic that's gone is: * If, in a single response from the RM, we receive a set of containers on a node, and prefer some number of containers on that node greater than 0 but less than the number we received, give back the delta between what we preferred and what we received. This seems like a weird way to avoid bunching E.g. it does nothing to avoid bunching when we don't request containers on particular nodes. Author: Sandy Ryza <sandy@cloudera.com> Closes #3765 from sryza/sandy-spark-1714 and squashes the following commits: 32a5942 [Sandy Ryza] Muffle RackResolver logs 74f56dd [Sandy Ryza] Fix a couple comments and simplify requestTotalExecutors 60ea4bd [Sandy Ryza] Fix scalastyle ca35b53 [Sandy Ryza] Simplify further e9cf8a6 [Sandy Ryza] Fix YarnClusterSuite 257acf3 [Sandy Ryza] Remove locality stuff and more cleanup 59a3c5e [Sandy Ryza] Take out rack stuff 5f72fd5 [Sandy Ryza] Further documentation and cleanup 89edd68 [Sandy Ryza] SPARK-1714. Take advantage of AMRMClient APIs to simplify logic in YarnAllocator	2015-01-21 10:31:54 -06:00
WangTao	8c06a5faac	[SPARK-5336][YARN]spark.executor.cores must not be less than spark.task.cpus https://issues.apache.org/jira/browse/SPARK-5336 Author: WangTao <barneystinson@aliyun.com> Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #4123 from WangTaoTheTonic/SPARK-5336 and squashes the following commits: 6c9676a [WangTao] Update ClientArguments.scala 9632d3a [WangTaoTheTonic] minor comment fix d03d6fa [WangTaoTheTonic] import ordering should be alphabetical' 3112af9 [WangTao] spark.executor.cores must not be less than spark.task.cpus	2015-01-21 09:42:30 -06:00
Kousuke Saruta	9a151ce58b	[SPARK-5294][WebUI] Hide tables in AllStagePages for "Active Stages, Completed Stages and Failed Stages" when they are empty Related to SPARK-5228 and #4028, `AllStagesPage` also should hide the table for `ActiveStages`, `CompleteStages` and `FailedStages` when they are empty. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4083 from sarutak/SPARK-5294 and squashes the following commits: a7625c1 [Kousuke Saruta] Fixed conflicts	2015-01-20 16:40:46 -08:00
Kousuke Saruta	769aced9e7	[SPARK-5329][WebUI] UIWorkloadGenerator should stop SparkContext. UIWorkloadGenerator don't stop SparkContext. I ran UIWorkloadGenerator and try to watch the result at WebUI but Jobs are marked as finished. It's because SparkContext is not stopped. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4112 from sarutak/SPARK-5329 and squashes the following commits: bcc0fa9 [Kousuke Saruta] Disabled scalastyle for a bock comment 86a3b95 [Kousuke Saruta] Fixed UIWorkloadGenerator to stop SparkContext in it	2015-01-20 12:40:55 -08:00
Jacek Lewandowski	c93a57f0d6	SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840... ... by Piotr Kolaczkowski) Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #4113 from jacek-lewandowski/SPARK-4660-master and squashes the following commits: a5e84ca [Jacek Lewandowski] SPARK-4660: Use correct class loader in JavaSerializer (copy of PR #3840 by Piotr Kolaczkowski)	2015-01-20 12:38:01 -08:00
Jongyoul Lee	9d9294aebf	[SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString (This fixes a bug introduced in #3849 / SPARK-4014) Author: Jongyoul Lee <jongyoul@gmail.com> Closes #4119 from jongyoul/SPARK-5333 and squashes the following commits: c6693a8 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - changed logDebug location 4141f58 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Added license information 2190606 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Adjusted imported libraries b7f5517 [Jongyoul Lee] [SPARK-5333][Mesos] MesosTaskLaunchData occurs BufferUnderflowException - Rewind ByteBuffer before making ByteString	2015-01-20 10:18:10 -08:00
Sean Owen	306ff187af	SPARK-5270 [CORE] Provide isEmpty() function in RDD API Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know. Author: Sean Owen <sowen@cloudera.com> Closes #4074 from srowen/SPARK-5270 and squashes the following commits: 66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code 2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike 28395ff [Sean Owen] Add isEmpty to Java, Python 7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()	2015-01-19 22:50:45 -08:00
zsxwing	e69fb8c75a	[SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop This PR adds a simple `EventLoop` and use it to replace Actor in DAGScheduler. `EventLoop` is a general class to support that posting events in multiple threads and handling events in a single event thread. Author: zsxwing <zsxwing@gmail.com> Closes #4016 from zsxwing/event-loop and squashes the following commits: aefa1ce [zsxwing] Add protected to on*** methods 5cfac83 [zsxwing] Remove null check of eventProcessLoop dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException 460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests 227bf33 [zsxwing] Add a stop flag and some tests 37f79c6 [zsxwing] Fix docs 55fb6f6 [zsxwing] Add private[spark] to EventLoop 1f73eac [zsxwing] Fix the import order 3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop	2015-01-19 18:15:51 -08:00
Jongyoul Lee	4a4f9ccba2	[SPARK-5088] Use spark-class for running executors directly Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3897 from jongyoul/SPARK-5088 and squashes the following commits: 8232aa8 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Added a listenerBus for fixing test cases 932289f [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Rebased from master 613cb47 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Fixed code if spark.executor.uri doesn't have any value - Added test cases ff57bda [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Adjusted orders of import 97e4bd4 [Jongyoul Lee] [SPARK-5088] Use spark-class for running executors directly - Changed command for using spark-class directly - Delete sbin/spark-executor and moved some codes into spark-class' case statement	2015-01-19 02:01:56 -08:00
Ilya Ganelin	3453d578ad	[SPARK-3288] All fields in TaskMetrics should be private and use getters/setters I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #4020 from ilganeli/SPARK-3288 and squashes the following commits: 39f3810 [Ilya Ganelin] resolved merge issues e446287 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 b8c05cb [Ilya Ganelin] Missed making a variable private 6444391 [Ilya Ganelin] Made inc/dec functions private[spark] 1149e78 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 26b312b [Ilya Ganelin] Debugging tests 17146c2 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-3288 5525c20 [Ilya Ganelin] Completed refactoring to make vars in TaskMetrics class private c64da4f [Ilya Ganelin] Partially updated task metrics to make some vars private	2015-01-19 01:32:36 -08:00
Prashant Sharma	851b6a9bba	SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage. ![screenshot from 2015-01-16 13 43 25](https://cloud.githubusercontent.com/assets/992952/5773256/d61df300-9d85-11e4-9b5a-6730058839fa.png) This is a first step towards having time remaining estimates for queued and running jobs. See SPARK-5216 Author: Prashant Sharma <prashant.s@imaginea.com> Closes #4043 from ScrapCodes/SPARK-5216/5217-show-waiting-stages and squashes the following commits: 3b11803 [Prashant Sharma] Review feedback. 0992842 [Prashant Sharma] Switched to Linked hashmap, changed the order to active->pending->completed->failed. And changed pending stages to not reverse sort. c19d82a [Prashant Sharma] SPARK-5217 Spark UI should report pending stages during job execution on AllStagesPage.	2015-01-19 01:28:42 -08:00
Kousuke Saruta	ecf943d353	[WebUI] Fix collapse of WebUI layout When we decrease the width of browsers, the header of WebUI wraps and collapses like as following image. ![2015-01-11 19 49 37](https://cloud.githubusercontent.com/assets/4736016/5698887/b0b9aeee-99cd-11e4-9020-08f3f0014de0.png) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3995 from sarutak/fixed-collapse-webui-layout and squashes the following commits: 3e60b5b [Kousuke Saruta] Modified line-height property in webui.css 7bfb5fb [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fixed-collapse-webui-layout 5d83e18 [Kousuke Saruta] Fixed collapse of WebUI layout	2015-01-16 12:19:08 -08:00
Kousuke Saruta	e8422c521b	[SPARK-5231][WebUI] History Server shows wrong job submission time. History Server doesn't show collect job submission time. It's because `JobProgressListener` updates job submission time every time `onJobStart` method is invoked from `ReplayListenerBus`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4029 from sarutak/SPARK-5231 and squashes the following commits: 0af9e22 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231 da8bd14 [Kousuke Saruta] Made submissionTime in SparkListenerJobStartas and completionTime in SparkListenerJobEnd as regular Long 0412a6a [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5231 26b9b99 [Kousuke Saruta] Fixed the test cases 2d47bd3 [Kousuke Saruta] Fixed to record job submission time and completion time collectly	2015-01-16 10:05:11 -08:00
Ye Xianjin	e200ac8e53	[SPARK-5201][CORE] deal with int overflow in the ParallelCollectionRDD.slice method There is an int overflow in the ParallelCollectionRDD.slice method. That's originally reported by SaintBacchus. ``` sc.makeRDD(1 to (Int.MaxValue)).count // result = 0 sc.makeRDD(1 to (Int.MaxValue - 1)).count // result = 2147483646 = Int.MaxValue - 1 sc.makeRDD(1 until (Int.MaxValue)).count // result = 2147483646 = Int.MaxValue - 1 ``` see https://github.com/apache/spark/pull/2874 for more details. This pr try to fix the overflow. However, There's another issue I don't address. ``` val largeRange = Int.MinValue to Int.MaxValue largeRange.length // throws java.lang.IllegalArgumentException: -2147483648 to 2147483647 by 1: seqs cannot contain more than Int.MaxValue elements. ``` So, the range we feed to sc.makeRDD cannot contain more than Int.MaxValue elements. This is the limitation of Scala. However I think we may want to support that kind of range. But the fix is beyond this pr. srowen andrewor14 would you mind take a look at this pr? Author: Ye Xianjin <advancedxy@gmail.com> Closes #4002 from advancedxy/SPARk-5201 and squashes the following commits: 96265a1 [Ye Xianjin] Update slice method comment and some responding docs. e143d7a [Ye Xianjin] Update inclusive range check for splitting inclusive range. b3f5577 [Ye Xianjin] We can include the last element in the last slice in general for inclusive range, hence eliminate the need to check Int.MaxValue or Int.MinValue. 7d39b9e [Ye Xianjin] Convert the two cases pattern matching to one case. 651c959 [Ye Xianjin] rename sign to needsInclusiveRange. add some comments 196f8a8 [Ye Xianjin] Add test cases for ranges end with Int.MaxValue or Int.MinValue e66e60a [Ye Xianjin] Deal with inclusive and exclusive ranges in one case. If the range is inclusive and the end of the range is (Int.MaxValue or Int.MinValue), we should use inclusive range instead of exclusive	2015-01-16 09:21:33 -08:00
WangTaoTheTonic	2be82b1e66	[SPARK-1507][YARN]specify # cores for ApplicationMaster Based on top of changes in https://github.com/apache/spark/pull/3806. https://issues.apache.org/jira/browse/SPARK-1507 `--driver-cores` and `spark.driver.cores` for all cluster modes and `spark.yarn.am.cores` for yarn client mode. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #4018 from WangTaoTheTonic/SPARK-1507 and squashes the following commits: 01419d3 [WangTaoTheTonic] amend the args name b255795 [WangTaoTheTonic] indet thing d86557c [WangTaoTheTonic] some comments amend 43c9392 [WangTao] fix compile error b39a100 [WangTao] specify # cores for ApplicationMaster	2015-01-16 09:16:56 -08:00
Kostas Sakellis	a79a9f923c	[SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds When calculating the input metrics there was an assumption that one task only reads from one block - this is not true for some operations including coalesce. This patch simply increments the task's input metrics if previous ones existed of the same read method. A limitation to this patch is that if a task reads from two different blocks of different read methods, one will override the other. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3120 from ksakellis/kostas-spark-4092 and squashes the following commits: 54e6658 [Kostas Sakellis] Drops metrics if conflicting read methods exist f0e0cc5 [Kostas Sakellis] Add bytesReadCallback to InputMetrics a2a36d4 [Kostas Sakellis] CR feedback 5a0c770 [Kostas Sakellis] [SPARK-4092] [CORE] Fix InputMetrics for coalesce'd Rdds	2015-01-15 18:48:39 -08:00
Kostas Sakellis	96c2c714f4	[SPARK-4857] [CORE] Adds Executor membership events to SparkListener Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming. In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations to the SparkListener. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter moving forward will prevent breakage in java when we add new events to SparkListener. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3711 from ksakellis/kostas-spark-4857 and squashes the following commits: 946d2c5 [Kostas Sakellis] Added executorAdded/Removed events to MesosSchedulerBackend b1d054a [Kostas Sakellis] Remove executorInfo from ExecutorRemoved event 1727b38 [Kostas Sakellis] Renamed ExecutorDetails back to ExecutorInfo and other CR feedback 14fe78d [Kostas Sakellis] Added executor added/removed events to json protocol 93d087b [Kostas Sakellis] [SPARK-4857] [CORE] Adds Executor membership events to SparkListener	2015-01-15 17:53:42 -08:00
Kousuke Saruta	65858ba555	[Minor] Fix tiny typo in BlockManager In BlockManager, there is a word `BlockTranserService` but I think it's typo for `BlockTransferService`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4046 from sarutak/fix-tiny-typo and squashes the following commits: a3e2a2f [Kousuke Saruta] Fixed tiny typo in BlockManager	2015-01-15 17:07:44 -08:00
Josh Rosen	259936be71	[SPARK-4014] Add TaskContext.attemptNumber and deprecate TaskContext.attemptId `TaskContext.attemptId` is misleadingly-named, since it currently returns a taskId, which uniquely identifies a particular task attempt within a particular SparkContext, instead of an attempt number, which conveys how many times a task has been attempted. This patch deprecates `TaskContext.attemptId` and add `TaskContext.taskId` and `TaskContext.attemptNumber` fields. Prior to this change, it was impossible to determine whether a task was being re-attempted (or was a speculative copy), which made it difficult to write unit tests for tasks that fail on early attempts or speculative tasks that complete faster than original tasks. Earlier versions of the TaskContext docs suggest that `attemptId` behaves like `attemptNumber`, so there's an argument to be made in favor of changing this method's implementation. Since we've decided against making that change in maintenance branches, I think it's simpler to add better-named methods and retain the old behavior for `attemptId`; if `attemptId` behaved differently in different branches, then this would cause confusing build-breaks when backporting regression tests that rely on the new `attemptId` behavior. Most of this patch is fairly straightforward, but there is a bit of trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to encode the attemptId, I packed it into the `data` field alongside the task binary. Author: Josh Rosen <joshrosen@databricks.com> Closes #3849 from JoshRosen/SPARK-4014 and squashes the following commits: 89d03e0 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014 5cfff05 [Josh Rosen] Introduce wrapper for serializing Mesos task launch data. 38574d4 [Josh Rosen] attemptId -> taskAttemptId in PairRDDFunctions a180b88 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014 1d43aa6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014 eee6a45 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4014 0b10526 [Josh Rosen] Use putInt instead of putLong (silly mistake) 8c387ce [Josh Rosen] Use local with maxRetries instead of local-cluster. cbe4d76 [Josh Rosen] Preserve attemptId behavior and deprecate it: b2dffa3 [Josh Rosen] Address some of Reynold's minor comments 9d8d4d1 [Josh Rosen] Doc typo 1e7a933 [Josh Rosen] [SPARK-4014] Change TaskContext.attemptId to return attempt number instead of task ID. fd515a5 [Josh Rosen] Add failing test for SPARK-4014	2015-01-14 11:45:40 -08:00
Kousuke Saruta	9d4449c4b3	[SPARK-5228][WebUI] Hide tables for "Active Jobs/Completed Jobs/Failed Jobs" when they are empty In current WebUI, tables for Active Stages, Completed Stages, Skipped Stages and Failed Stages are hidden when they are empty while tables for Active Jobs, Completed Jobs and Failed Jobs are not hidden though they are empty. This is before my patch is applied. ![2015-01-13 14 13 03](https://cloud.githubusercontent.com/assets/4736016/5730793/2b73d6f4-9b32-11e4-9a24-1784d758c644.png) And this is after my patch is applied. ![2015-01-13 14 38 13](https://cloud.githubusercontent.com/assets/4736016/5730797/359ea2da-9b32-11e4-97b0-544739ddbf4c.png) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #4028 from sarutak/SPARK-5228 and squashes the following commits: b1e6e8b [Kousuke Saruta] Fixed a small typo daab563 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5228 9493a1d [Kousuke Saruta] Modified AllJobPage.scala so that hide Active Jobs/Completed Jobs/Failed Jobs when they are empty	2015-01-14 11:10:29 -08:00
WangTaoTheTonic	9dea64e53a	[SPARK-4697][YARN]System properties should override environment variables I found some arguments in yarn module take environment variables before system properties while the latter override the former in core module. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #3557 from WangTaoTheTonic/SPARK4697 and squashes the following commits: 836b9ef [WangTaoTheTonic] fix type mismatch e3e486a [WangTaoTheTonic] remove the comma 1262d57 [WangTaoTheTonic] handle spark.app.name and SPARK_YARN_APP_NAME in SparkSubmitArguments bee9447 [WangTaoTheTonic] wrong brace 81833bb [WangTaoTheTonic] rebase 40934b4 [WangTaoTheTonic] just switch blocks 5f43f45 [WangTao] System property can override environment variable	2015-01-13 09:50:14 -08:00
WangTaoTheTonic	f7741a9a72	[SPARK-5006][Deploy]spark.port.maxRetries doesn't work https://issues.apache.org/jira/browse/SPARK-5006 I think the issue is produced in https://github.com/apache/spark/pull/1777. Not digging mesos's backend yet. Maybe should add same logic either. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits: 8cdf96d [WangTao] indent thing 2d86d65 [WangTaoTheTonic] fix line length 7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor 61a370d [WangTaoTheTonic] some minor fixes bc6e1ec [WangTaoTheTonic] rebase 67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg 29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable 396c226 [WangTaoTheTonic] make the grammar more like scala 191face [WangTaoTheTonic] invalid value name 62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work	2015-01-13 09:29:25 -08:00
Michael Armbrust	a3978f3e15	[SPARK-5078] Optionally read from SPARK_LOCAL_HOSTNAME Current spark lets you set the ip address using SPARK_LOCAL_IP, but then this is given to akka after doing a reverse DNS lookup. This makes it difficult to run spark in Docker. You can already change the hostname that is used programmatically, but it would be nice to be able to do this with an environment variable as well. Author: Michael Armbrust <michael@databricks.com> Closes #3893 from marmbrus/localHostnameEnv and squashes the following commits: 85045b6 [Michael Armbrust] Optionally read from SPARK_LOCAL_HOSTNAME	2015-01-12 11:57:59 -08:00
lianhuiwang	ef9224e080	[SPARK-5102][Core]subclass of MapStatus needs to be registered with Kryo CompressedMapStatus and HighlyCompressedMapStatus needs to be registered with Kryo, because they are subclass of MapStatus. Author: lianhuiwang <lianhuiwang09@gmail.com> Closes #4007 from lianhuiwang/SPARK-5102 and squashes the following commits: 9d2238a [lianhuiwang] remove register of MapStatus 05a285d [lianhuiwang] subclass of MapStatus needs to be registered with Kryo	2015-01-12 10:57:12 -08:00
zsxwing	6942b974ad	[SPARK-4951][Core] Fix the issue that a busy executor may be killed A few changes to fix this issue: 1. Handle the case that receiving `SparkListenerTaskStart` before `SparkListenerBlockManagerAdded`. 2. Don't add `executorId` to `removeTimes` when the executor is busy. 3. Use `HashMap.retain` to safely traverse the HashMap and remove items. 4. Use the same lock in ExecutorAllocationManager and ExecutorAllocationListener to fix the race condition in `totalPendingTasks`. 5. Move the blocking codes out of the message processing code in YarnSchedulerActor. Author: zsxwing <zsxwing@gmail.com> Closes #3783 from zsxwing/SPARK-4951 and squashes the following commits: d51fa0d [zsxwing] Add comments 2e365ce [zsxwing] Remove expired executors from 'removeTimes' and add idle executors back when a new executor joins 49f61a9 [zsxwing] Eliminate duplicate executor registered warnings d4c4e9a [zsxwing] Minor fixes for the code style 05f6238 [zsxwing] Move the blocking codes out of the message processing code 105ba3a [zsxwing] Fix the race condition in totalPendingTasks d5c615d [zsxwing] Fix the issue that a busy executor may be killed	2015-01-11 16:23:28 -08:00
lewuathe	1656aae2b4	[SPARK-5073] spark.storage.memoryMapThreshold have two default value Because major OS page sizes is about 4KB, the default value of spark.storage.memoryMapThreshold is integrated to 2 * 4096 Author: lewuathe <lewuathe@me.com> Closes #3900 from Lewuathe/integrate-memoryMapThreshold and squashes the following commits: e417acd [lewuathe] [SPARK-5073] Update docs/configuration 834aba4 [lewuathe] [SPARK-5073] Fix style adcea33 [lewuathe] [SPARK-5073] Integrate memory map threshold to 2MB fcce2e5 [lewuathe] [SPARK-5073] spark.storage.memoryMapThreshold have two default value	2015-01-11 13:50:42 -08:00
wangfei	92d9a704ce	[SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sql Author: wangfei <wangfei1@huawei.com> Closes #3718 from scwf/sparksqlui and squashes the following commits: e0d6b5d [wangfei] format fix 383b505 [wangfei] fix conflicts 4d2038a [wangfei] using setJobDescription df79837 [wangfei] fix compile error 92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql	2015-01-10 17:04:56 -08:00
mcheah	e0f28e010c	[SPARK-4737] Task set manager properly handles serialization errors Dealing with [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set manager now catches the error and aborts the stage. If the TaskSetManager throws a TaskNotSerializableException, the TaskSchedulerImpl will return an empty list of task descriptions, because no tasks were started. The scheduler should abort the stage gracefully. Note that I'm not too familiar with this part of the codebase and its place in the overall architecture of the Spark stack. If implementing it this way will have any averse side effects please voice that loudly. Author: mcheah <mcheah@palantir.com> Closes #3638 from mccheah/task-set-manager-properly-handle-ser-err and squashes the following commits: 1545984 [mcheah] Some more style fixes from Andrew Or. 5267929 [mcheah] Fixing style suggestions from Andrew Or. dfa145b [mcheah] Fixing style from Josh Rosen's feedback b2a430d [mcheah] Not returning empty seq when a task set cannot be serialized. 94844d7 [mcheah] Fixing compilation error, one brace too many 5f486f4 [mcheah] Adding license header for fake task class bf5e706 [mcheah] Fixing indentation. 097e7a2 [mcheah] [SPARK-4737] Catching task serialization exception in TaskSetManager	2015-01-09 14:16:20 -08:00
WangTaoTheTonic	e966452060	[SPARK-1953][YARN]yarn client mode Application Master memory size is same as driver memory... ... size Ways to set Application Master's memory on yarn-client mode: 1. `spark.yarn.am.memory` in SparkConf or System Properties 2. default value 512m Note: this arguments is only available in yarn-client mode. Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following commits: d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes 6c1b264 [WangTaoTheTonic] rebase b8410c0 [WangTaoTheTonic] minor optiminzation ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some improvements 3bf70cc [WangTaoTheTonic] rebase and give proper hint 987b99d [WangTaoTheTonic] disable --driver-memory in client mode 2b27928 [WangTaoTheTonic] inaccurate description b7acbb2 [WangTaoTheTonic] incorrect method invoked 2557c5e [WangTaoTheTonic] missing a single blank 42075b0 [WangTaoTheTonic] arrange the args and warn logging 69c7dba [WangTaoTheTonic] rebase 1960d16 [WangTaoTheTonic] fix wrong comment 7fa9e2e [WangTaoTheTonic] log a warning f6bee0e [WangTaoTheTonic] docs issue d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181 b09c309 [WangTaoTheTonic] use code format ab16bb5 [WangTaoTheTonic] fix bug and add comments 44e48c2 [WangTaoTheTonic] minor fix 6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs 0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size is same as driver memory size	2015-01-09 13:23:13 -08:00
Kay Ousterhout	b6aa557300	[SPARK-1143] Separate pool tests into their own suite. The current TaskSchedulerImplSuite includes some tests that are actually for the TaskSchedulerImpl, but the remainder of the tests avoid using the TaskSchedulerImpl entirely, and actually test the pool and scheduling algorithm mechanisms. This commit separates the pool/scheduling algorithm tests into their own suite, and also simplifies those tests. The pull request replaces #339. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3967 from kayousterhout/SPARK-1143 and squashes the following commits: 8a898c4 [Kay Ousterhout] [SPARK-1143] Separate pool tests into their own suite.	2015-01-09 09:47:06 -08:00
Marcelo Vanzin	48cecf673c	[SPARK-4048] Enhance and extend hadoop-provided profile. This change does a few things to make the hadoop-provided profile more useful: - Create new profiles for other libraries / services that might be provided by the infrastructure - Simplify and fix the poms so that the profiles are only activated while building assemblies. - Fix tests so that they're able to run when the profiles are activated - Add a new env variable to be used by distributions that use these profiles to provide the runtime classpath for Spark jobs and daemons. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #2982 from vanzin/SPARK-4048 and squashes the following commits: 82eb688 [Marcelo Vanzin] Add a comment. eb228c0 [Marcelo Vanzin] Fix borked merge. 4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to child processes. 371ebee [Marcelo Vanzin] Review feedback. 52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 322f882 [Marcelo Vanzin] Fix merge fail. f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048 9640503 [Marcelo Vanzin] Cleanup child process log message. 115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with another pom). e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile. 7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles. 1be73d4 [Marcelo Vanzin] Restore flume-provided profile. d1399ed [Marcelo Vanzin] Restore jetty dependency. 82a54b9 [Marcelo Vanzin] Remove unused profile. 5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided profiles. 1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver. f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list. 9e4e001 [Marcelo Vanzin] Remove duplicate hive profile. d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log. 4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn. 417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH". 2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during testing. 1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects. 284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.	2015-01-08 17:15:13 -08:00
Kousuke Saruta	a00af6bec5	[SPARK-4973][CORE] Local directory in the driver of client-mode continues remaining even if application finished when external shuffle is enabled When we enables external shuffle service, local directories in the driver of client-mode continue remaining even if application has finished. I think local directories for drivers should be deleted. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3811 from sarutak/SPARK-4973 and squashes the following commits: ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory if it's the driver 43770da [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973 88feecd [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4973 d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local directories of the driver of local-mode when external shuffle service is enabled	2015-01-08 13:43:09 -08:00
Eric Moyer	538f221627	Document that groupByKey will OOM for large keys This pull request is my own work and I license it under Spark's open-source license. This contribution is an improvement to the documentation. I documented that the maximum number of values per key for groupByKey is limited by available RAM (see [Datablox][datablox link] and [the spark mailing list][list link]). Just saying that better performance is available is not sufficient. Sometimes you need to do a group-by - your operation needs all the items available in order to complete. This warning explains the problem. [datablox link]: http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html [list link]: http://apache-spark-user-list.1001560.n3.nabble.com/Understanding-RDD-GroupBy-OutOfMemory-Exceptions-tp11427p11466.html Author: Eric Moyer <eric_moyer@yahoo.com> Closes #3936 from RadixSeven/better-group-by-docs and squashes the following commits: 5b6f4e9 [Eric Moyer] groupByKey docs naming updates 238e81b [Eric Moyer] Doc that groupByKey will OOM for large keys	2015-01-08 11:55:23 -08:00
Kousuke Saruta	0a597276db	[Minor] Fix the value represented by spark.executor.id for consistency. The property `spark.executor.id` can represent both `driver` and `<driver>` for one driver. It's inconsistent. This issue is minor so I didn't file this in JIRA. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3812 from sarutak/fix-driver-identifier and squashes the following commits: d885498 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-driver-identifier 4275663 [Kousuke Saruta] Fixed the value represented by spark.executor.id of local mode	2015-01-08 11:35:56 -08:00
Zhang, Liye	06dc4b5206	[SPARK-4989][CORE] avoid wrong eventlog conf cause cluster down in standalone mode when enabling eventlog in standalone mode, if give the wrong configuration, the standalone cluster will down (cause master restart, lose connection with workers). How to reproduce: just give an invalid value to "spark.eventLog.dir", for example: spark.eventLog.dir=hdfs://tmp/logdir1, hdfs://tmp/logdir2. This will throw illegalArgumentException, which will cause the Master restart. And the whole cluster is not available. Author: Zhang, Liye <liye.zhang@intel.com> Closes #3824 from liyezhang556520/wrongConf4Cluster and squashes the following commits: 3c24d98 [Zhang, Liye] revert change with logwarning and excetption for FileNotFoundException 3c1ac2e [Zhang, Liye] change var to val a49c52f [Zhang, Liye] revert wrong modification 12eee85 [Zhang, Liye] add more message in log and on webUI 5c1fa33 [Zhang, Liye] cache exceptions when eventlog with wrong conf	2015-01-08 10:40:26 -08:00
zsxwing	2b729d2250	[SPARK-5126][Core] Verify Spark urls before creating Actors so that invalid urls can crash the process. Because `actorSelection` will return `deadLetters` for an invalid path, Worker keeps quiet for an invalid master url. It's better to log an error so that people can find such problem quickly. This PR will check the url before sending to `actorSelection`, throw and log a SparkException for an invalid url. Author: zsxwing <zsxwing@gmail.com> Closes #3927 from zsxwing/SPARK-5126 and squashes the following commits: 9d429ee [zsxwing] Create a utility method in Utils to parse Spark url; verify urls before creating Actors so that invalid urls can crash the process. 8286e51 [zsxwing] Check the url before sending to Akka and log the error if the url is invalid	2015-01-07 23:01:30 -08:00
hushan[胡珊]	d345ebebd5	[SPARK-5132][Core]Correct stage Attempt Id key in stageInfofromJson SPARK-5132: stageInfoToJson: Stage Attempt Id stageInfoFromJson: Attempt Id Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3932 from suyanNone/json-stage and squashes the following commits: 41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson	2015-01-07 12:09:12 -08:00
Masayoshi TSUZUKI	6e74edeca3	[SPARK-2458] Make failed application log visible on History Server Enabled HistoryServer to show incomplete applications. We can see the log for incomplete applications by clicking the bottom link. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #3467 from tsudukim/feature/SPARK-2458-2 and squashes the following commits: 76205d2 [Masayoshi TSUZUKI] Fixed and added test code. 29a04a9 [Masayoshi TSUZUKI] Merge branch 'master' of github.com:tsudukim/spark into feature/SPARK-2458-2 f9ef854 [Masayoshi TSUZUKI] Added space between "if" and "(". Fixed "Incomplete" as capitalized in the web UI. Modified double negative variable name. 9b465b0 [Masayoshi TSUZUKI] Modified typo and better implementation. 3ed8a41 [Masayoshi TSUZUKI] Modified too long lines. 08ea14d [Masayoshi TSUZUKI] [SPARK-2458] Make failed application log visible on History Server	2015-01-07 07:32:53 -08:00
Sean Owen	4cba6eb420	SPARK-4159 [CORE] Maven build doesn't run JUnit test suites This PR: - Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar) - Tells `surefire` to test only Java tests - Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication. For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run. Author: Sean Owen <sowen@cloudera.com> Closes #3651 from srowen/SPARK-4159 and squashes the following commits: 2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete 12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit. e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent	2015-01-06 12:02:08 -08:00
Reynold Xin	bbcba3a943	[SPARK-5093] Set spark.network.timeout to 120s consistently. Author: Reynold Xin <rxin@databricks.com> Closes #3903 from rxin/timeout-120 and squashes the following commits: 7c2138e [Reynold Xin] [SPARK-5093] Set spark.network.timeout to 120s consistently.	2015-01-05 15:19:53 -08:00
Jongyoul Lee	1c0e7ce056	[SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environme... ...nt at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask - See the Jira Issue for more details. Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3741 from jongyoul/SPARK-4465 and squashes the following commits: 46ad71e [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed unused import 3d6631f [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - Removed comments and adjusted indentations 2343f13 [Jongyoul Lee] [SPARK-4465] runAsSparkUser doesn't affect TaskRunner in Mesos environment at all. - fixed a scope of runAsSparkUser from MesosExecutorDriver.run to MesosExecutorBackend.launchTask	2015-01-05 12:05:09 -08:00
WangTao	ce39b34404	[SPARK-5057] Log message in failed askWithReply attempts https://issues.apache.org/jira/browse/SPARK-5057 Author: WangTao <barneystinson@aliyun.com> Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #3875 from WangTaoTheTonic/SPARK-5057 and squashes the following commits: 1503487 [WangTao] use string interpolation 706c8a7 [WangTaoTheTonic] log more messages	2015-01-05 12:00:02 -08:00
Varun Saxena	d3f07fd23c	[SPARK-4688] Have a single shared network timeout in Spark [SPARK-4688] Have a single shared network timeout in Spark Author: Varun Saxena <vsaxena.varun@gmail.com> Author: varunsaxena <vsaxena.varun@gmail.com> Closes #3562 from varunsaxena/SPARK-4688 and squashes the following commits: 6e97f72 [Varun Saxena] [SPARK-4688] Single shared network timeout cd783a2 [Varun Saxena] SPARK-4688 d6f8c29 [Varun Saxena] SCALA-4688 9562b15 [Varun Saxena] SPARK-4688 a75f014 [varunsaxena] SPARK-4688 594226c [varunsaxena] SPARK-4688	2015-01-05 10:32:37 -08:00
zsxwing	5c506cecb9	[SPARK-5074][Core] Fix a non-deterministic test failure Add `assert(sc.listenerBus.waitUntilEmpty(WAIT_TIMEOUT_MILLIS))` to make sure `sparkListener` receive the message. Author: zsxwing <zsxwing@gmail.com> Closes #3889 from zsxwing/SPARK-5074 and squashes the following commits: e61c198 [zsxwing] Fix a non-deterministic test failure	2015-01-04 21:18:33 -08:00
zsxwing	27e7f5a723	[SPARK-5083][Core] Fix a flaky test in TaskResultGetterSuite Because `sparkEnv.blockManager.master.removeBlock` is asynchronous, we need to make sure the block has already been removed before calling `super.enqueueSuccessfulTask`. Author: zsxwing <zsxwing@gmail.com> Closes #3894 from zsxwing/SPARK-5083 and squashes the following commits: d97c03d [zsxwing] Fix a flaky test in TaskResultGetterSuite	2015-01-04 21:09:21 -08:00
zsxwing	6c726a3fbd	[SPARK-5069][Core] Fix the race condition of TaskSchedulerImpl.dagScheduler It's not necessary to set `TaskSchedulerImpl.dagScheduler` in preStart. It's safe to set it after `initializeEventProcessActor()`. Author: zsxwing <zsxwing@gmail.com> Closes #3887 from zsxwing/SPARK-5069 and squashes the following commits: d95894f [zsxwing] Fix the race condition of TaskSchedulerImpl.dagScheduler	2015-01-04 21:06:04 -08:00
zsxwing	72396522bc	[SPARK-5067][Core] Use '===' to compare well-defined case class A simple fix would be adding `assert(e1.appId == e2.appId)` for `SparkListenerApplicationStart`. But actually we can use `===` for well-defined case class directly. Therefore, instead of fixing this issue, I use `===` to compare those well-defined case classes (all fields have implemented a correct `equals` method, such as primitive types) Author: zsxwing <zsxwing@gmail.com> Closes #3886 from zsxwing/SPARK-5067 and squashes the following commits: 0a51711 [zsxwing] Use '===' to compare well-defined case class	2015-01-04 21:03:17 -08:00
Josh Rosen	939ba1f8f6	[SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs This patch disables output spec. validation for jobs launched through Spark Streaming, since this interferes with checkpoint recovery. Hadoop OutputFormats have a `checkOutputSpecs` method which performs certain checks prior to writing output, such as checking whether the output directory already exists. SPARK-1100 added checks for FileOutputFormat, SPARK-1677 (#947) added a SparkConf configuration to disable these checks, and SPARK-2309 (#1088) extended these checks to run for all OutputFormats, not just FileOutputFormat. In Spark Streaming, we might have to re-process a batch during checkpoint recovery, so `save` actions may be called multiple times. In addition to `DStream`'s own save actions, users might use `transform` or `foreachRDD` and call the `RDD` and `PairRDD` save actions. When output spec. validation is enabled, the second calls to these actions will fail due to existing output. This patch automatically disables output spec. validation for jobs submitted by the Spark Streaming scheduler. This is done by using Scala's `DynamicVariable` to propagate the bypass setting without having to mutate SparkConf or introduce a global variable. Author: Josh Rosen <joshrosen@databricks.com> Closes #3832 from JoshRosen/SPARK-4835 and squashes the following commits: 36eaf35 [Josh Rosen] Add comment explaining use of transform() in test. 6485cf8 [Josh Rosen] Add test case in Streaming; fix bug for transform() 7b3e06a [Josh Rosen] Remove Streaming-specific setting to undo this change; update conf. guide bf9094d [Josh Rosen] Revise disableOutputSpecValidation() comment to not refer to Spark Streaming. e581d17 [Josh Rosen] Deduplicate isOutputSpecValidationEnabled logic. 762e473 [Josh Rosen] [SPARK-4835] Disable validateOutputSpecs for Spark Streaming jobs.	2015-01-04 20:26:18 -08:00
Dale	3fddc9468f	[SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs Author: Dale <tigerquoll@outlook.com> Closes #3809 from tigerquoll/SPARK-4787 and squashes the following commits: 5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block. 2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.	2015-01-04 13:29:13 -08:00
Brennon York	b96008d552	[SPARK-794][Core] Remove sleep() in ClusterScheduler.stop Removed `sleep()` from the `stop()` method of the `TaskSchedulerImpl` class which, from the JIRA ticket, is believed to be a legacy artifact slowing down testing originally introduced in the `ClusterScheduler` class. Author: Brennon York <brennon.york@capitalone.com> Closes #3851 from brennonyork/SPARK-794 and squashes the following commits: 04c3e64 [Brennon York] Removed sleep() from the stop() method	2015-01-04 12:40:39 -08:00
Josh Rosen	012839807c	[HOTFIX] Bind web UI to ephemeral port in DriverSuite The job launched by DriverSuite should bind the web UI to an ephemeral port, since it looks like port contention in this test has caused a large number of Jenkins failures when many builds are started simultaneously. Our tests already disable the web UI, but this doesn't affect subprocesses launched by our tests. In this case, I've opted to bind to an ephemeral port instead of disabling the UI because disabling features in this test may mask its ability to catch certain bugs. See also: `e24d3a9` Author: Josh Rosen <joshrosen@databricks.com> Closes #3873 from JoshRosen/driversuite-webui-port and squashes the following commits: 48cd05c [Josh Rosen] [HOTFIX] Bind web UI to ephemeral port in DriverSuite.	2015-01-01 15:03:54 -08:00
Reynold Xin	7749dd6c36	[SPARK-5038] Add explicit return type for implicit functions. As we learned in #3580, not explicitly typing implicit functions can lead to compiler bugs and potentially unexpected runtime behavior. This is a follow up PR for rest of Spark (outside Spark SQL). The original PR for Spark SQL can be found at https://github.com/apache/spark/pull/3859 Author: Reynold Xin <rxin@databricks.com> Closes #3860 from rxin/implicit and squashes the following commits: 73702f9 [Reynold Xin] [SPARK-5038] Add explicit return type for implicit functions.	2014-12-31 17:07:47 -08:00
Josh Rosen	e24d3a9a29	[HOTFIX] Disable Spark UI in SparkSubmitSuite tests This should fix a major cause of build breaks when running many parallel tests.	2014-12-31 14:13:09 -08:00
Brennon York	8e14c5eb55	[SPARK-4298][Core] - The spark-submit cannot read Main-Class from Manifest. Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object. Author: Brennon York <brennon.york@capitalone.com> Closes #3561 from brennonyork/SPARK-4298 and squashes the following commits: 5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment 14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class 8d20936 [Brennon York] updated to reset the error message back to the default a043039 [Brennon York] updated to split the uri and jar vals 8da7cbf [Brennon York] fixes SPARK-4298	2014-12-31 11:54:10 -08:00
Josh Rosen	352ed6bbe3	[SPARK-1010] Clean up uses of System.setProperty in unit tests Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures. This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself). For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and to automatically restores them on test completion / failure. See the block comment at the top of the ResetSystemProperties class for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits: 0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools 3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext 4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties 4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering. 0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite. 7a3d224 [Josh Rosen] Fix trait ordering 3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite 655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite 3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite 8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait. 633a84a [Josh Rosen] Remove use of system properties in FileServerSuite 25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite 1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite 5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite 0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite 51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite 60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite 14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite 628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite 9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite. 4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.	2014-12-30 18:12:20 -08:00
Josh Rosen	efa80a531e	[SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works with KryoSerializer This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used. The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer. Author: Josh Rosen <joshrosen@databricks.com> Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits: 0466c7a [Josh Rosen] Register PythonBroadcast with Kryo. d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug. 069d8a7 [Josh Rosen] Add failing test for SPARK-4882	2014-12-30 09:29:52 -08:00
Zhang, Liye	9077e721cd	[SPARK-4920][UI] add version on master and worker page for standalone mode Author: Zhang, Liye <liye.zhang@intel.com> Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits: 3bb7e0d [Zhang, Liye] add version on master and worker page	2014-12-30 09:19:47 -08:00
Yash Datta	9bc0df6804	SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions takeOrdered should skip reduce step in case mapped RDDs have no partitions. This prevents the mentioned exception : 4. run query SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100; Error trace java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.reduce(RDD.scala:863) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136) Author: Yash Datta <Yash.Datta@guavus.com> Closes #3830 from saucam/fix_takeorder and squashes the following commits: 5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions	2014-12-29 13:49:45 -08:00
Kousuke Saruta	8d72341ab7	[Minor] Fix a typo of type parameter in JavaUtils.scala In JavaUtils.scala, thare is a typo of type parameter. In addition, the type information is removed at the time of compile by erasure. This issue is really minor so I don't file in JIRA. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits: e20193d [Kousuke Saruta] Fixed a typo of type parameter 82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-typo-in-javautils 99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala	2014-12-29 12:05:08 -08:00
YanTangZhai	815de54002	[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits: 9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master 718afeb [YanTangZhai] Merge pull request #12 from apache/master 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master	2014-12-29 11:30:54 -08:00
GuoQiang Li	080ceb771a	[SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails Author: GuoQiang Li <witgo@qq.com> Closes #3788 from witgo/SPARK-4952 and squashes the following commits: d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails	2014-12-26 23:31:29 -08:00
Zhang, Liye	786808abfd	[SPARK-4954][Core] add spark version infomation in log for standalone mode The master and worker spark version may be not the same with Driver spark version. That is because spark Jar file might be replaced for new application without restarting the spark cluster. So there shall log out the spark-version in both Mater and Worker log. Author: Zhang, Liye <liye.zhang@intel.com> Closes #3790 from liyezhang556520/version4Standalone and squashes the following commits: e05e1e3 [Zhang, Liye] add spark version infomation in log for standalone mode	2014-12-26 23:24:22 -08:00
Sean Owen	29fabb1b52	SPARK-4297 [BUILD] Build warning fixes omnibus There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. The forthcoming PR fixes about 90% of the build warnings I see now. Author: Sean Owen <sowen@cloudera.com> Closes #3157 from srowen/SPARK-4297 and squashes the following commits: 8c9e469 [Sean Owen] Suppress unchecked cast warnings, and several other build warning fixes	2014-12-24 13:32:51 -08:00
Kousuke Saruta	199e59aacd	[SPARK-4881][Minor] Use SparkConf#getBoolean instead of get().toBoolean It's really a minor issue. In ApplicationMaster, there is code like as follows. val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean I think, the code can be simplified like as follows. val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3733 from sarutak/SPARK-4881 and squashes the following commits: 1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...) c63daa0 [Kousuke Saruta] Simplified code	2014-12-23 19:14:34 -08:00
Marcelo Vanzin	7e2deb71c4	[SPARK-4606] Send EOF to child JVM when there's no more data to read. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3460 from vanzin/SPARK-4606 and squashes the following commits: 031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read.	2014-12-23 16:07:59 -08:00
Liang-Chi Hsieh	96281cd0c3	[SPARK-4913] Fix incorrect event log path SPARK-2261 uses a single file to log events for an app. `eventLogDir` in `ApplicationDescription` is replaced with `eventLogFile`. However, `ApplicationDescription` in `SparkDeploySchedulerBackend` is initialized with `SparkContext`'s `eventLogDir`. It is just the log directory, not the actual log file path. `Master.rebuildSparkUI` can not correctly rebuild a new SparkUI for the app. Because the `ApplicationDescription` is remotely registered with `Master` and the app's id is then generated in `Master`, we can not get the app id in advance before registration. So the received description needs to be modified with correct `eventLogFile` value. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3755 from viirya/fix_app_logdir and squashes the following commits: 5e0ea35 [Liang-Chi Hsieh] Revision for comment. b5730a1 [Liang-Chi Hsieh] Fix incorrect event log path. Closes #3777 (a duplicate PR for the same JIRA)	2014-12-23 14:58:44 -08:00
Andrew Or	27c5399f4d	[SPARK-4730][YARN] Warn against deprecated YARN settings See https://issues.apache.org/jira/browse/SPARK-4730. Author: Andrew Or <andrew@databricks.com> Closes #3590 from andrewor14/yarn-settings and squashes the following commits: 36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings dcd1316 [Andrew Or] Warn against deprecated YARN settings	2014-12-23 14:28:36 -08:00
Marcelo Vanzin	dd155369a0	[SPARK-4834] [standalone] Clean up application files after app finishes. Commit `7aacb7bfa` added support for sharing downloaded files among multiple executors of the same app. That works great in Yarn, since the app's directory is cleaned up after the app is done. But Spark standalone mode didn't do that, so the lock/cache files created by that change were left around and could eventually fill up the disk hosting /tmp. To solve that, create app-specific directories under the local dirs when launching executors. Multiple executors launched by the same Worker will use the same app directories, so they should be able to share the downloaded files. When the application finishes, a new message is sent to all workers telling them the application has finished; once that message has been received, and all executors registered for the application shut down, then those directories will be cleaned up by the Worker. Note: Unit testing this is hard (if even possible), since local-cluster mode doesn't seem to leave the Master/Worker daemons running long enough after `sc.stop()` is called for the clean up protocol to take effect. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3705 from vanzin/SPARK-4834 and squashes the following commits: b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization. 50eb4b9 [Marcelo Vanzin] Review feedback. c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes.	2014-12-23 12:02:08 -08:00
zsxwing	c233ab3d8d	[SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join In Scala, `map` and `flatMap` of `Iterable` will copy the contents of `Iterable` to a new `Seq`. Such as, ```Scala val iterable = Seq(1, 2, 3).map(v => { println(v) v }) println("Iterable map done") val iterator = Seq(1, 2, 3).iterator.map(v => { println(v) v }) println("Iterator map done") ``` outputed ``` 1 2 3 Iterable map done Iterator map done ``` So we should use 'iterator' to reduce memory consumed by join. Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E Author: zsxwing <zsxwing@gmail.com> Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits: 48ee7b9 [zsxwing] Remove the explicit types 95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join	2014-12-22 14:26:28 -08:00
genmao.ygm	de9d7d2b5b	[SPARK-4920][UI]:current spark version in UI is not striking. It is not convenient to see the Spark version. We can keep the same style with Spark website. ![spark_version](https://cloud.githubusercontent.com/assets/7402327/5527025/1c8c721c-8a35-11e4-8d6a-2734f3c6bdf8.jpg) Author: genmao.ygm <genmao.ygm@alibaba-inc.com> Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits: 0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking.	2014-12-22 14:14:39 -08:00
Kostas Sakellis	7c0ed13d29	[SPARK-4079] [CORE] Consolidates Errors if a CompressionCodec is not available This commit consolidates some of the exceptions thrown if compression codecs are not available. If a bad configuration string was passed in, a ClassNotFoundException was through. Also, if Snappy was not available, it would throw an InvocationTargetException when the codec was being used (not when it was being initialized). Now, an IllegalArgumentException is thrown when a codec is not available at creation time - either because the class does not exist or the codec itself is not available in the system. This will allow us to have a better message and fail faster. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3119 from ksakellis/kostas-spark-4079 and squashes the following commits: 9709c7c [Kostas Sakellis] Removed unnecessary Logging class 63bfdd0 [Kostas Sakellis] Removed isAvailable to preserve binary compatibility 1d0ef2f [Kostas Sakellis] [SPARK-4079] [CORE] Added more information to exception 64f3d27 [Kostas Sakellis] [SPARK-4079] [CORE] Code review feedback 52dfa8f [Kostas Sakellis] [SPARK-4079] [CORE] Default to LZF if Snappy not available	2014-12-22 13:07:01 -08:00
Takeshi Yamamuro	fb8e85e80e	[SPARK-4733] Add missing prameter comments in ShuffleDependency Add missing Javadoc comments in ShuffleDependency. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> Closes #3594 from maropu/DependencyJavadocFix and squashes the following commits: 32129b4 [Takeshi Yamamuro] Fix comments in @aggregator and @mapSideCombine 303c75d [Takeshi Yamamuro] [SPARK-4733] Add missing prameter comments in ShuffleDependency	2014-12-22 12:19:23 -08:00
Zhang, Liye	39272c8cdb	[SPARK-4870] Add spark version to driver log Author: Zhang, Liye <liye.zhang@intel.com> Closes #3717 from liyezhang556520/version2Log and squashes the following commits: ccd30d7 [Zhang, Liye] delete log in sparkConf 330f70c [Zhang, Liye] move the log from SaprkConf to SparkContext 96dc115 [Zhang, Liye] remove curly brace e833330 [Zhang, Liye] add spark version to driver log	2014-12-22 11:38:28 -08:00
zsxwing	93b2f3a882	[SPARK-4918][Core] Reuse Text in saveAsTextFile Reuse Text in saveAsTextFile to reduce GC. /cc rxin Author: zsxwing <zsxwing@gmail.com> Closes #3762 from zsxwing/SPARK-4918 and squashes the following commits: 59f03eb [zsxwing] Reuse Text in saveAsTextFile	2014-12-22 11:20:00 -08:00
zsxwing	6ee6aa70b7	[SPARK-2075][Core] Make the compiler generate same bytes code for Hadoop 1.+ and Hadoop 2.+ `NullWritable` is a `Comparable` rather than `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it. It will generate different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the compiler will generate same codes. I used the following commands to confirm the generated byte codes are some. ``` mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt diff ~/hadoop1.txt ~/hadoop2.txt ``` However, the compiler will generate different codes for the classes which call methods of `JobContext/TaskAttemptContext`. `JobContext/TaskAttemptContext` is a class in Hadoop 1.+, and calling its method will use `invokevirtual`, while it's an interface in Hadoop 2.+, and will use `invokeinterface`. To fix it, we can use reflection to call `JobContext/TaskAttemptContext.getConfiguration`. Author: zsxwing <zsxwing@gmail.com> Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits: 39d9df2 [zsxwing] Fix the code style e4ad8b5 [zsxwing] Use null for the implicit Ordering 734bac9 [zsxwing] Explicitly set the implicit parameters ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD	2014-12-21 22:10:19 -08:00
Sean Owen	c6a3c0d505	SPARK-4910 [CORE] build failed (use of FileStatus.isFile in Hadoop 1.x) Fix small Hadoop 1 compile error from SPARK-2261. In Hadoop 1.x, all we have is FileStatus.isDir, so these "is file" assertions are changed to "is not a dir". This is how similar checks are done so far in the code base. Author: Sean Owen <sowen@cloudera.com> Closes #3754 from srowen/SPARK-4910 and squashes the following commits: 52c5e4e [Sean Owen] Fix small Hadoop 1 compile error from SPARK-2261	2014-12-21 13:16:57 -08:00
huangzhaowei	a764960b3b	[Minor] Build Failed: value defaultProperties not found Mvn Build Failed: value defaultProperties not found .Maybe related to this pr: `1d648123a7` andrewor14 can you look at this problem? Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits: 8e2917c [huangzhaowei] Build Failed: value defaultProperties not found	2014-12-19 23:32:56 -08:00
Kanwaljit Singh	1d648123a7	SPARK-2641: Passing num executors to spark arguments from properties file Since we can set spark executor memory and executor cores using property file, we must also be allowed to set the executor instances. Author: Kanwaljit Singh <kanwaljit.singh@guavus.com> Closes #1657 from kjsingh/branch-1.0 and squashes the following commits: d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala	2014-12-19 19:27:23 -08:00
Marcelo Vanzin	456451911d	[SPARK-2261] Make event logger use a single file. Currently the event logger uses a directory and several files to describe an app's event log, all but one of which are empty. This is not very HDFS-friendly, since creating lots of nodes in HDFS (especially when they don't contain any data) is frowned upon due to the node metadata being kept in the NameNode's memory. Instead, add a header section to the event log file that contains metadata needed to read the events. This metadata includes things like the Spark version (for future code that may need it for backwards compatibility) and the compression codec used for the event data. With the new approach, aside from reducing the load on the NN, there's also a lot less remote calls needed when reading the log directory. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #1222 from vanzin/hist-server-single-log and squashes the following commits: cc8f5de [Marcelo Vanzin] Store header in plain text. c7e6123 [Marcelo Vanzin] Update comment. 59c561c [Marcelo Vanzin] Review feedback. 216c5a3 [Marcelo Vanzin] Review comments. dce28e9 [Marcelo Vanzin] Fix log overwrite test. f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test. 346f0b4 [Marcelo Vanzin] Review feedback. ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 3f4500f [Marcelo Vanzin] Unit test for SPARK-3697. 45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch. b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 16fd491 [Marcelo Vanzin] Use unique log directory for each codec. 0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log d93c44a [Marcelo Vanzin] Add a newline to make the header more readable. 9e928ba [Marcelo Vanzin] Add types. bd6ba8c [Marcelo Vanzin] Review feedback. a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning. 16661a3 [Marcelo Vanzin] Simplify some internal code. cc6bce4 [Marcelo Vanzin] Some review feedback. a722184 [Marcelo Vanzin] Do not encode metadata in log file name. 3700586 [Marcelo Vanzin] Restore log flushing. f677930 [Marcelo Vanzin] Fix botched rebase. ae571fa [Marcelo Vanzin] Fix end-to-end event logger test. 9db0efd [Marcelo Vanzin] Show prettier name in UI. 8f42274 [Marcelo Vanzin] Make history server parse old-style log directories. 6251dd7 [Marcelo Vanzin] Make event logger use a single file.	2014-12-19 18:23:42 -08:00
Ryan Williams	7981f96976	[SPARK-4896] don’t redundantly overwrite executor JAR deps Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #2848 from ryan-williams/fetch-file and squashes the following commits: c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently 8e39c16 [Ryan Williams] code review feedback 788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps	2014-12-19 15:24:41 -08:00
Ryan Williams	cdb2c645ab	[SPARK-4889] update history server example cmds Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3736 from ryan-williams/hist and squashes the following commits: 421d8ff [Ryan Williams] add another random typo fix 76d6a4c [Ryan Williams] remove hdfs example a2d0f82 [Ryan Williams] code review feedback 9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds	2014-12-19 13:56:04 -08:00
Reynold Xin	336cd341ee	Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor. This consolidates some code path and makes constructor arguments simpler for a few classes. Author: Reynold Xin <rxin@databricks.com> Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits: 82e02cc [Reynold Xin] Fixed couple bugs. 217062a [Reynold Xin] Code review feedback. bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.	2014-12-19 12:51:12 -08:00
Sandy Ryza	283263ffaa	SPARK-3428. TaskMetrics for running tasks is missing GC time metrics Author: Sandy Ryza <sandy@cloudera.com> Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits: cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics	2014-12-18 22:40:44 -08:00
Liang-Chi Hsieh	d7fc69a8b5	[SPARK-4674] Refactor getCallSite The current version of `getCallSite` visits the collection of `StackTraceElement` twice. However, it is unnecessary since we can perform our work with a single visit. We also do not need to keep filtered `StackTraceElement`. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3532 from viirya/refactor_getCallSite and squashes the following commits: 62aa124 [Liang-Chi Hsieh] Fix style. e741017 [Liang-Chi Hsieh] Refactor getCallSite.	2014-12-18 21:41:02 -08:00
Andrew Or	9804a759b6	[SPARK-4754] Refactor SparkContext into ExecutorAllocationClient This is such that the `ExecutorAllocationManager` does not take in the `SparkContext` with all of its dependencies as an argument. This prevents future developers of this class to tie down this class further with the `SparkContext`, which has really become quite a monstrous object. cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`. Author: Andrew Or <andrew@databricks.com> Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits: 187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient	2014-12-18 17:38:33 -08:00
Aaron Davidson	105293a7d0	[SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config This is used in NioBlockTransferService here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66 Author: Aaron Davidson <aaron@databricks.com> Closes #3688 from aarondav/SPARK-4837 and squashes the following commits: ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config	2014-12-18 16:43:16 -08:00
Ivan Vergiliev	f9f58b9a01	SPARK-4743 - Use SparkEnv.serializer instead of closureSerializer in aggregateByKey and foldByKey Author: Ivan Vergiliev <ivan@leanplum.com> Closes #3605 from IvanVergiliev/change-serializer and squashes the following commits: a49b7cf [Ivan Vergiliev] Use serializer instead of closureSerializer in aggregate/foldByKey.	2014-12-18 16:29:36 -08:00
Madhu Siddalingaiah	d5a596d418	[SPARK-4884]: Improve Partition docs Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884 Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3722 from msiddalingaiah/master and squashes the following commits: 79e679f [Madhu Siddalingaiah] [DOC]: improve documentation 51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions	2014-12-18 16:00:53 -08:00
Ilya Ganelin	3720057b8e	[SPARK-3607] ConnectionManager threads.max configs on the thread pools don't work Hi all - cleaned up the code to get rid of the unused parameter and added some discussion of the ThreadPoolExecutor parameters to explain why we can use a single threadCount instead of providing a min/max. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3664 from ilganeli/SPARK-3607C and squashes the following commits: 3c05690 [Ilya Ganelin] Updated documentation and refactored code to extract shared variables	2014-12-18 12:53:18 -08:00
Saisai Shao	cf50631a66	[SPARK-4595][Core] Fix MetricsServlet not work issue `MetricsServlet` handler should be added to the web UI after initialized by `MetricsSystem`, otherwise servlet handler cannot be attached. Author: Saisai Shao <saisai.shao@intel.com> Author: Josh Rosen <joshrosen@databricks.com> Author: jerryshao <saisai.shao@intel.com> Closes #3444 from jerryshao/SPARK-4595 and squashes the following commits: 434d17e [Saisai Shao] Merge pull request #10 from JoshRosen/metrics-system-cleanup 87a2292 [Josh Rosen] Guard against misuse of MetricsSystem methods. f779fe0 [jerryshao] Fix MetricsServlet not work issue	2014-12-17 11:47:44 -08:00
Davies Liu	ed362008f0	[SPARK-4437] update doc for WholeCombineFileRecordReader update doc for WholeCombineFileRecordReader Author: Davies Liu <davies@databricks.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #3301 from davies/fix_doc and squashes the following commits: 1d7422f [Davies Liu] Merge pull request #2 from JoshRosen/whole-text-file-cleanup dc3d21a [Josh Rosen] More genericization in ConfigurableCombineFileRecordReader. 95d13eb [Davies Liu] address comment bf800b9 [Davies Liu] update doc for WholeCombineFileRecordReader	2014-12-16 11:19:36 -08:00
meiyoula	c7628771da	[SPARK-4792] Add error message when making local dir unsuccessfully Author: meiyoula <1039320815@qq.com> Closes #3635 from XuTingjun/master and squashes the following commits: dd1c66d [meiyoula] when old is deleted, it will throw an exception where call it 2a55bc2 [meiyoula] Update DiskBlockManager.scala `1483a4a` [meiyoula] Delete multiple retries to make dir `67f7902` [meiyoula] Try some times to make dir maybe more reasonable 1c51a0c [meiyoula] Update DiskBlockManager.scala	2014-12-15 22:30:18 -08:00
wangfei	5c24759ddc	[Minor][Core] fix comments in MapOutputTracker Using driver and executor in the comments of ```MapOutputTracker``` is more clear. Author: wangfei <wangfei1@huawei.com> Closes #3700 from scwf/commentFix and squashes the following commits: aa68524 [wangfei] master and worker should be driver and executor	2014-12-15 16:46:43 -08:00
Sean Owen	2a28bc6100	SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it. Author: Sean Owen <sowen@cloudera.com> Closes #3690 from srowen/SPARK-785 and squashes the following commits: 8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey	2014-12-15 16:06:15 -08:00
Ryan Williams	8176b7a02e	[SPARK-4668] Fix some documentation typos. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3523 from ryan-williams/tweaks and squashes the following commits: d2eddaa [Ryan Williams] code review feedback ce27fc1 [Ryan Williams] CoGroupedRDD comment nit c6cfad9 [Ryan Williams] remove unnecessary if statement b74ea35 [Ryan Williams] comment fix b0221f0 [Ryan Williams] fix a gendered pronoun c71ffed [Ryan Williams] use names on a few boolean parameters 89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io 83e8358 [Ryan Williams] fix pom.xml typo dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md	2014-12-15 14:52:17 -08:00
Ilya Ganelin	38703bbca8	[SPARK-1037] The name of findTaskFromList & findTask in TaskSetManager.scala is confusing Hi all - I've renamed the methods referenced in this JIRA to clarify that they modify the provided arrays (find vs. deque). Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3665 from ilganeli/SPARK-1037B and squashes the following commits: 64c177c [Ilya Ganelin] Renamed deque to dequeue f27d85e [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters 683482a [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters	2014-12-15 14:51:15 -08:00
Zhang, Liye	57d37f9c71	[CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala with other places Author: Zhang, Liye <liye.zhang@intel.com> Closes #2793 from liyezhang556520/uniformHashMap and squashes the following commits: 5884735 [Zhang, Liye] [CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala	2014-12-10 20:44:59 -08:00
Andrew Or	4f93d0cabe	[SPARK-4759] Fix driver hanging from coalescing partitions The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction. This is because our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly. Author: Andrew Or <andrew@databricks.com> Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits: e520d6b [Andrew Or] Oops 3ebf8bd [Andrew Or] A few comments f370a4e [Andrew Or] Fix tests 2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location	2014-12-10 14:27:53 -08:00
Ilya Ganelin	447ae2de5d	[SPARK-4569] Rename 'externalSorting' in Aggregator Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits: 1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator 5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator	2014-12-10 14:19:37 -08:00
Andrew Or	faa8fd8178	[SPARK-4215] Allow requesting / killing executors only in YARN mode Currently this doesn't do anything in other modes, so we might as well just disable it rather than having the user mistakenly rely on it. Author: Andrew Or <andrew@databricks.com> Closes #3615 from andrewor14/dynamic-allocation-yarn-only and squashes the following commits: ce6487a [Andrew Or] Allow requesting / killing executors only in YARN mode	2014-12-10 12:48:24 -08:00
Kousuke Saruta	0fc637b4c2	[SPARK-4329][WebUI] HistoryPage pagenation Current HistoryPage have links only to previous page or next page. I suggest to add index to access history pages easily. I implemented like following pics. If there are many pages, current page +/- N pages, head page and last page are indexed. ![2014-11-10 16 13 25](https://cloud.githubusercontent.com/assets/4736016/4986246/9c7bbac4-6937-11e4-8695-8634d039d5b6.png) ![2014-11-10 16 03 21](https://cloud.githubusercontent.com/assets/4736016/4986210/3951bb74-6937-11e4-8b4e-9f90d266d736.png) ![2014-11-10 16 03 39](https://cloud.githubusercontent.com/assets/4736016/4986211/3b196ad8-6937-11e4-9f81-74bc0a6dad5b.png) ![2014-11-10 16 03 49](https://cloud.githubusercontent.com/assets/4736016/4986213/40686138-6937-11e4-86c0-41100f0404f6.png) ![2014-11-10 16 04 04](https://cloud.githubusercontent.com/assets/4736016/4986215/4326c9b4-6937-11e4-87ac-0f30c86ec6e3.png) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3194 from sarutak/history-page-indexing and squashes the following commits: 15d3d2d [Kousuke Saruta] Simplified code c93932e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing 1c2f605 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing 76b05e3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing b2240f8 [Kousuke Saruta] Fixed style ec7922e [Kousuke Saruta] Simplified code 755a004 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing cfa242b [Kousuke Saruta] Added index to HistoryPage	2014-12-10 12:30:45 -08:00
Nathan Kronenfeld	94b377f944	[SPARK-4772] Clear local copies of accumulators as soon as we're done with them Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker. This PR clears the thread-local copies of accumulators at the end of each task, in the tasks finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up. Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com> Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits: a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark. 537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task. 39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them	2014-12-09 23:53:17 -08:00
Josh Rosen	f79c1cfc99	[Minor] Use <sup> tag for help icon in web UI page header This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing. Before: ![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png) After: ![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png) Author: Josh Rosen <joshrosen@databricks.com> Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits: bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header.	2014-12-09 23:47:05 -08:00
Sandy Ryza	5e4c06f8e5	SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable Author: Sandy Ryza <sandy@cloudera.com> Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits: cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable	2014-12-09 16:26:07 -08:00
hushan[胡珊]	30dca924df	[SPARK-4714] BlockManager.dropFromMemory() should check whether block has been removed after synchronizing on BlockInfo instance. After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already removed. The three methods have the same logic to get the `info` lock: ``` info = blockInfo.get(id) if (info != null) { info.synchronized { // do something } } ``` So, there is chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread who entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed. But in `dropFromMemory` it may be problematic since it may drop block data which already removed into the diskstore, and this calls data store operations that are not designed to handle missing blocks. This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn	2014-12-09 15:54:40 -08:00
Kay Ousterhout	1f5110630c	[SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task. cc pwendell Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3622 from kayousterhout/gc_time and squashes the following commits: 15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark] e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI.	2014-12-09 15:10:36 -08:00
maji2014	b31074466a	[SPARK-4691][shuffle] Restructure a few lines in shuffle code In HashShuffleReader.scala and HashShuffleWriter.scala, no need to judge "dep.aggregator.isEmpty" again as this is judged by "dep.aggregator.isDefined" In SortShuffleWriter.scala, "dep.aggregator.isEmpty" is better than "!dep.aggregator.isDefined" ? Author: maji2014 <maji3@asiainfo.com> Closes #3553 from maji2014/spark-4691 and squashes the following commits: bf7b14d [maji2014] change a elegant way for SortShuffleWriter.scala 10d0cf0 [maji2014] change a elegant way d8f52dc [maji2014] code optimization for judgement	2014-12-09 13:13:12 -08:00
Sean Owen	e829bfa1ab	SPARK-3926 [CORE] Reopened: result of JavaRDD collectAsMap() is not serializable My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter. Author: Sean Owen <sowen@cloudera.com> Closes #3587 from srowen/SPARK-3926 and squashes the following commits: 8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE 7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test	2014-12-08 16:13:03 -08:00
Andrew Or	65f929d5b3	[SPARK-4750] Dynamic allocation - synchronize kills Simple omission on my part. Author: Andrew Or <andrew@databricks.com> Closes #3612 from andrewor14/dynamic-allocation-synchronization and squashes the following commits: 1f03b60 [Andrew Or] Synchronize kills	2014-12-08 16:02:33 -08:00
Christophe Préaud	ab2abcb5ef	[SPARK-4764] Ensure that files are fetched atomically tempFile is created in the same directory than targetFile, so that the move from tempFile to targetFile is always atomic Author: Christophe Préaud <christophe.preaud@kelkoo.com> Closes #2855 from preaudc/master and squashes the following commits: 9ba89ca [Christophe Préaud] Ensure that files are fetched atomically 54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master' c6a5590 [Christophe Préaud] Revert commit 8ea871f8130b2490f1bad7374a819bf56f0ccbbd 7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master' 8ea871f [Christophe Préaud] Ensure that files are fetched atomically	2014-12-08 11:44:54 -08:00
Zhang, Liye	98a7d09978	[SPARK-4005][CORE] handle message replies in receive instead of in the individual private methods In BlockManagermasterActor, when handling message type UpdateBlockInfo, the message replies is in handled in individual private methods, should handle it in receive of Akka. Author: Zhang, Liye <liye.zhang@intel.com> Closes #2853 from liyezhang556520/akkaRecv and squashes the following commits: 9b06f0a [Zhang, Liye] remove the unreachable code bf518cd [Zhang, Liye] change the indent 242166b [Zhang, Liye] modified accroding to the comments d4b929b [Zhang, Liye] [SPARK-4005][CORE] handle message replies in receive instead of in the individual private methods	2014-12-05 12:00:32 -08:00
Reynold Xin	ed92b47e83	[SPARK-4397] Move object RDD to the front of RDD.scala. I ran into multiple cases that SBT/Scala compiler was confused by the implicits in continuous compilation mode. Adding explicit return types fixes the problem. Author: Reynold Xin <rxin@databricks.com> Closes #3580 from rxin/rdd-implicit and squashes the following commits: ee32fcd [Reynold Xin] Move object RDD to the end of the file. b8562c9 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into rdd-implicit d4e9f85 [Reynold Xin] Code review. a836a37 [Reynold Xin] Move object RDD to the front of RDD.scala.	2014-12-04 16:32:20 -08:00
Saldanha	743a889d27	[SPARK-4459] Change groupBy type parameter from K to U Please see https://issues.apache.org/jira/browse/SPARK-4459 Author: Saldanha <saldaal1@phusca-l24858.wlan.na.novartis.net> Closes #3327 from alokito/master and squashes the following commits: 54b1095 [Saldanha] [SPARK-4459] changed type parameter for keyBy from K to U d5f73c3 [Saldanha] [SPARK-4459] added keyBy test 316ad77 [Saldanha] SPARK-4459 changed type parameter for groupBy from K to U. 62ddd4b [Saldanha] SPARK-4459 added failing unit test	2014-12-04 14:57:41 -08:00
WangTaoTheTonic	8106b1e36b	[SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes In yarn-cluster and standalone-cluster modes, we don't know where driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf then this will lead to errors when the driver launches. This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode. Author: WangTaoTheTonic <barneystinson@aliyun.com> Author: WangTao <barneystinson@aliyun.com> Closes #3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits: ed1a25c [WangTaoTheTonic] revert unrelated formatting issue 02c4e49 [WangTao] add comment 32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext 667cf24 [WangTaoTheTonic] document fix ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode 2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode	2014-12-04 11:53:23 -08:00
Reynold Xin	c3ad486036	[SPARK-4719][API] Consolidate various narrow dep RDD classes with MapPartitionsRDD MappedRDD, MappedValuesRDD, FlatMappedValuesRDD, FilteredRDD, GlommedRDD, FlatMappedRDD are not necessary. They can be implemented trivially using MapPartitionsRDD. Author: Reynold Xin <rxin@databricks.com> Closes #3578 from rxin/SPARK-4719 and squashes the following commits: eed9853 [Reynold Xin] Preserve partitioning for filter. eb1a89b [Reynold Xin] [SPARK-4719][API] Consolidate various narrow dep RDD classes with MapPartitionsRDD.	2014-12-04 00:45:57 -08:00
Reynold Xin	1826372d0a	[SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file. cc aarondav kayousterhout pwendell This should go into 1.2? Author: Reynold Xin <rxin@databricks.com> Closes #3579 from rxin/SPARK-4085 and squashes the following commits: 255b4fd [Reynold Xin] Updated test. f9814d9 [Reynold Xin] Code review feedback. 2afaf35 [Reynold Xin] [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.	2014-12-03 16:28:24 -08:00
Mark Hamstra	96b27855c5	[SPARK-4498][core] Don't transition ExecutorInfo to RUNNING until Driver adds Executor The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged message to master. Else, appInfo.resetRetryCount() is never called and failing Executors will eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting. JoshRosen Author: Mark Hamstra <markhamstra@gmail.com> Closes #3550 from markhamstra/SPARK-4498 and squashes the following commits: 8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver	2014-12-03 15:08:01 -08:00
zsxwing	edd3cd477c	[SPARK-4715][Core] Make sure tryToAcquire won't return a negative value ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`. Author: zsxwing <zsxwing@gmail.com> Closes #3575 from zsxwing/SPARK-4715 and squashes the following commits: a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value	2014-12-03 12:19:40 -08:00
zsxwing	8af551f71d	[SPARK-4397][Core] Change the 'since' value of '@deprecated' to '1.3.0' As #3262 wasn't merged to branch 1.2, the `since` value of `deprecated` should be '1.3.0'. Author: zsxwing <zsxwing@gmail.com> Closes #3573 from zsxwing/SPARK-4397-version and squashes the following commits: 1daa03c [zsxwing] Change the 'since' value to '1.3.0'	2014-12-03 02:05:17 -08:00
JerryLead	77be8b986f	[SPARK-4672][Core]Checkpoint() should clear f to shorten the serialization chain The related JIRA is https://issues.apache.org/jira/browse/SPARK-4672 The f closure of `PartitionsRDD(ZippedPartitionsRDD2)` contains a `$outer` that references EdgeRDD/VertexRDD, which causes task's serialization chain become very long in iterative GraphX applications. As a result, StackOverflow error will occur. If we set "f = null" in `clearDependencies()`, checkpoint() can cut off the long serialization chain. More details and explanation can be found in the JIRA. Author: JerryLead <JerryLead@163.com> Author: Lijie Xu <csxulijie@gmail.com> Closes #3545 from JerryLead/my_core and squashes the following commits: f7faea5 [JerryLead] checkpoint() should clear the f to avoid StackOverflow error c0169da [JerryLead] Merge branch 'master' of https://github.com/apache/spark 52799e3 [Lijie Xu] Merge pull request #1 from apache/master	2014-12-02 23:53:29 -08:00
zsxwing	6dfe38a03a	[SPARK-4397][Core] Cleanup 'import SparkContext._' in core This PR cleans up `import SparkContext._` in core for SPARK-4397(#3262) to prove it really works well. Author: zsxwing <zsxwing@gmail.com> Closes #3530 from zsxwing/SPARK-4397-cleanup and squashes the following commits: 04e2273 [zsxwing] Cleanup 'import SparkContext._' in core	2014-12-02 00:18:41 -08:00
zsxwing	30a86acdef	[SPARK-4661][Core] Minor code and docs cleanup Author: zsxwing <zsxwing@gmail.com> Closes #3521 from zsxwing/SPARK-4661 and squashes the following commits: 03cbe3f [zsxwing] Minor code and docs cleanup	2014-12-01 00:35:01 -08:00
zsxwing	1d238f221c	[SPARK-4664][Core] Throw an exception when spark.akka.frameSize > 2047 If `spark.akka.frameSize` > 2047, it will overflow and become negative. Should have some assertion in `maxFrameSizeBytes` to warn people. Author: zsxwing <zsxwing@gmail.com> Closes #3527 from zsxwing/SPARK-4664 and squashes the following commits: 0089c7a [zsxwing] Throw an exception when spark.akka.frameSize > 2047	2014-12-01 00:32:54 -08:00
carlmartin	aea7a99761	[SPARK-4623]Add the some error infomation if using spark-sql in yarn-cluster mode If using spark-sql in yarn-cluster mode, print an error infomation just as the spark shell in yarn-cluster mode. Author: carlmartin <carlmartinmax@gmail.com> Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3479 from SaintBacchus/sparkSqlShell and squashes the following commits: 35829a9 [carlmartin] improve the description of comment e6c1eb7 [carlmartin] add a comment in bin/spark-sql to remind user who wants to change the class f1c5c8d [carlmartin] Merge branch 'master' into sparkSqlShell 8e112c5 [huangzhaowei] singular form ec957bc [carlmartin] Add the some error infomation if using spark-sql in yarn-cluster mode 7bcecc2 [carlmartin] Merge branch 'master' of https://github.com/apache/spark into codereview 4fad75a [carlmartin] Add the Error infomation using spark-sql in yarn-cluster mode	2014-11-30 16:19:41 -08:00
Sean Owen	048ecca625	SPARK-2143 [WEB UI] Add Spark version to UI footer This PR adds the Spark version number to the UI footer; this is how it looks: ![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png) Author: Sean Owen <sowen@cloudera.com> Closes #3410 from srowen/SPARK-2143 and squashes the following commits: e9b3a7a [Sean Owen] Add Spark version to footer	2014-11-30 11:41:38 -08:00
zsxwing	c06222427f	[SPARK-4505][Core] Add a ClassTag parameter to CompactBuffer[T] Added a ClassTag parameter to CompactBuffer. So CompactBuffer[T] can create primitive arrays for primitive types. It will reduce the memory usage for primitive types significantly and only pay minor performance lost. Here is my test code: ```Scala // Call org.apache.spark.util.SizeEstimator.estimate def estimateSize(obj: AnyRef): Long = { val c = Class.forName("org.apache.spark.util.SizeEstimator$") val f = c.getField("MODULE$") val o = f.get(c) val m = c.getMethod("estimate", classOf[Object]) m.setAccessible(true) m.invoke(o, obj).asInstanceOf[Long] } sc.parallelize(1 to 10000).groupBy(_ => 1).foreach { case (k, v) => println(v.getClass() + " size: " + estimateSize(v)) } ``` Using the previous CompactBuffer outputed ``` class org.apache.spark.util.collection.CompactBuffer size: 313358 ``` Using the new CompactBuffer outputed ``` class org.apache.spark.util.collection.CompactBuffer size: 65712 ``` In this case, the new `CompactBuffer` only used 20% memory of the previous one. It's really helpful for `groupByKey` when using a primitive value. Author: zsxwing <zsxwing@gmail.com> Closes #3378 from zsxwing/SPARK-4505 and squashes the following commits: 4abdbba [zsxwing] Add a ClassTag parameter to reduce the memory usage of CompactBuffer[T] when T is a primitive type	2014-11-29 20:23:08 -05:00
Stephen Haberman	95290bf4c4	Include the key name when failing on an invalid value. Admittedly a really small tweak. Author: Stephen Haberman <stephen@exigencecorp.com> Closes #3514 from stephenh/include-key-name-in-npe and squashes the following commits: 937740a [Stephen Haberman] Include the key name when failing on an invalid value.	2014-11-29 20:12:05 -05:00
Liang-Chi Hsieh	49fe8797e6	[SPARK-4597] Use proper exception and reset variable in Utils.createTempDir() `File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3449 from viirya/fix_createtempdir and squashes the following commits: 36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.	2014-11-28 18:04:15 -08:00
maji2014	ceb6281970	[SPARK-4619][Storage]delete redundant time suffix Time suffix exists in Utils.getUsedTimeMs(startTime), no need to append again, delete that Author: maji2014 <maji3@asiainfo.com> Closes #3475 from maji2014/SPARK-4619 and squashes the following commits: df0da4e [maji2014] delete redundant time suffix	2014-11-28 00:36:22 -08:00
Cheng Lian	120a350240	[SPARK-4613][Core] Java API for JdbcRDD This PR introduces a set of Java APIs for using `JdbcRDD`: 1. Trait (interface) `JdbcRDD.ConnectionFactory`: equivalent to the `getConnection: () => Connection` parameter in `JdbcRDD` constructor. 2. Two overloaded versions of `Jdbc.create`: used to create `JavaRDD` that wraps a `JdbcRDD`. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3478) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3478 from liancheng/japi-jdbc-rdd and squashes the following commits: 9a54625 [Cheng Lian] Only shutdowns a single DB rather than the whole Derby driver d4cedc5 [Cheng Lian] Moves Java JdbcRDD test case to a separate test suite ffcdf2e [Cheng Lian] Java API for JdbcRDD	2014-11-27 18:01:14 -08:00
roxchkplusony	84376d3139	[SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler Author: roxchkplusony <roxchkplusony@gmail.com> Closes #3483 from roxchkplusony/bugfix/4626 and squashes the following commits: aba9184 [roxchkplusony] replace warning message per review 5e7fdea [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler	2014-11-27 15:54:40 -08:00
Sean Owen	5d7fe178b3	SPARK-4170 [CORE] Closure problems when running Scala app that "extends App" Warn against subclassing scala.App, and remove one instance of this in examples Author: Sean Owen <sowen@cloudera.com> Closes #3497 from srowen/SPARK-4170 and squashes the following commits: 4a6131f [Sean Owen] Restore multiline string formatting a8ca895 [Sean Owen] Warn against subclassing scala.App, and remove one instance of this in examples	2014-11-27 09:03:17 -08:00
CodingCat	5af53ada65	[SPARK-732][SPARK-3628][CORE][RESUBMIT] eliminate duplicate update on accmulator https://issues.apache.org/jira/browse/SPARK-3628 In current implementation, the accumulator will be updated for every successfully finished task, even the task is from a resubmitted stage, which makes the accumulator counter-intuitive In this patch, I changed the way for the DAGScheduler to update the accumulator, DAGScheduler maintains a HashTable, mapping the stage id to the received <accumulator_id , value> pairs. Only when the stage becomes independent, (no job needs it any more), we accumulate the values of the <accumulator_id , value> pairs, when a task finished, we check if the HashTable has contained such stageId, it saves the accumulator_id, value only when the task is the first finished task of a new stage or the stage is running for the first attempt... Author: CodingCat <zhunansjtu@gmail.com> Closes #2524 from CodingCat/SPARK-732-1 and squashes the following commits: 701a1e8 [CodingCat] roll back change on Accumulator.scala 1433e6f [CodingCat] make MIMA happy b233737 [CodingCat] address Matei's comments 02261b8 [CodingCat] rollback some changes 6b0aff9 [CodingCat] update document 2b2e8cf [CodingCat] updateAccumulator 83b75f8 [CodingCat] style fix 84570d2 [CodingCat] re-enable the bad accumulator guard 1e9e14d [CodingCat] add NPE guard 21b6840 [CodingCat] simplify the patch 88d1f03 [CodingCat] fix rebase error f74266b [CodingCat] add test case for resubmitted result stage 5cf586f [CodingCat] de-duplicate on task level 138f9b3 [CodingCat] make MIMA happy 67593d2 [CodingCat] make if allowing duplicate update as an option of accumulator	2014-11-26 16:52:04 -08:00
Tathagata Das	e7f4d2534b	[SPARK-4612] Reduce task latency and increase scheduling throughput by making configuration initialization lazy https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L337 creates a configuration object for every task that is launched, even if there is no new dependent file/JAR to update. This is a heavy-weight creation that should be avoided if there is no new file/JAR to update. This PR makes that creation lazy. Quick local test in spark-perf scheduling throughput tests gives the following numbers in a local standalone scheduler mode. 1 job with 10000 tasks: before 7.8395 seconds, after 2.6415 seconds = 3x increase in task scheduling throughput pwendell JoshRosen Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #3463 from tdas/lazy-config and squashes the following commits: c791c1e [Tathagata Das] Reduce task latency by making configuration initialization lazy	2014-11-25 23:15:58 -08:00
Aaron Davidson	f5f2d27385	[SPARK-4516] Cap default number of Netty threads at 8 In practice, only 2-4 cores should be required to transfer roughly 10 Gb/s, and each core that we use will have an initial overhead of roughly 32 MB of off-heap memory, which comes at a premium. Thus, this value should still retain maximum throughput and reduce wasted off-heap memory allocation. It can be overridden by setting the number of serverThreads and clientThreads manually in Spark's configuration. Author: Aaron Davidson <aaron@databricks.com> Closes #3469 from aarondav/fewer-pools2 and squashes the following commits: 087c59f [Aaron Davidson] [SPARK-4516] Cap default number of Netty threads at 8	2014-11-25 23:57:04 -05:00
hushan[胡珊]	9bdf5da590	Fix SPARK-4471: blockManagerIdFromJson function throws exception while B... Fix [SPARK-4471](https://issues.apache.org/jira/browse/SPARK-4471): blockManagerIdFromJson function throws exception while BlockManagerId be null in MetadataFetchFailedException Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3340 from suyanNone/fix-blockmanagerId-jnothing-2 and squashes the following commits: 159f9a3 [hushan[胡珊]] Refine test code for blockmanager is null 4380d73 [hushan[胡珊]] remove useless blank line 3ccf651 [hushan[胡珊]] Fix SPARK-4471: blockManagerIdFromJson function throws exception while metadata fetch failed	2014-11-25 15:51:08 -08:00
Andrew Or	9afcbe494a	[SPARK-4546] Improve HistoryServer first time user experience The documentation points the user to run the following ``` sbin/start-history-server.sh ``` The first thing this does is throw an exception that complains a log directory is not specified. The exception message itself does not say anything about what to set. Instead we should have a default and a landing page with a better message. The new default log directory is `file:/tmp/spark-events`. This is what it looks like as of this PR: ![after](https://issues.apache.org/jira/secure/attachment/12682985/after.png) Author: Andrew Or <andrew@databricks.com> Closes #3411 from andrewor14/minor-history-improvements and squashes the following commits: f33d6b3 [Andrew Or] Point user to set config if default log dir does not exist fc4c17a [Andrew Or] Improve HistoryServer UX	2014-11-25 15:48:02 -08:00
Andrew Or	1b2ab1cd1b	[SPARK-4592] Avoid duplicate worker registrations in standalone mode Summary. On failover, the Master may receive duplicate registrations from the same worker, causing the worker to exit. This is caused by this commit `4afe9a4852`, which adds logic for the worker to re-register with the master in case of failures. However, the following race condition may occur: (1) Master A fails and Worker attempts to reconnect to all masters (2) Master B takes over and notifies Worker (3) Worker responds by registering with Master B (4) Meanwhile, Worker's previous reconnection attempt reaches Master B, causing the same Worker to register with Master B twice Fix. Instead of attempting to register with all known masters, the worker should re-register with only the one that it has been communicating with. This is safe because the fact that a failover has occurred means the old master must have died. Then, when the worker is finally notified of a new master, it gives up on the old one in favor of the new one. Caveat. Even this fix is subject to more obscure race conditions. For instance, if Master B fails and Master A recovers immediately, then Master A may still observe duplicate worker registrations. However, this and other potential race conditions summarized in [SPARK-4592](https://issues.apache.org/jira/browse/SPARK-4592), are much, much less likely than the one described above, which is deterministically reproducible. Author: Andrew Or <andrew@databricks.com> Closes #3447 from andrewor14/standalone-failover and squashes the following commits: 0d9716c [Andrew Or] Move re-registration logic to actor for thread-safety 79286dc [Andrew Or] Preserve old behavior for initial retries 83b321c [Andrew Or] Tweak wording 1fce6a9 [Andrew Or] Active master actor could be null in the beginning b6f269e [Andrew Or] Avoid duplicate worker registrations	2014-11-25 15:46:26 -08:00
Jongyoul Lee	f0afb623dc	[SPARK-4525] Mesos should decline unused offers Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly. I've also done some minor renaming/clean-up of variables in this class and tests. Author: Patrick Wendell <pwendell@gmail.com> Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3436 from pwendell/mesos-issue and squashes the following commits: 58c35b5 [Patrick Wendell] Adding unit test for this situation c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers	2014-11-24 19:19:16 -08:00
Patrick Wendell	a68d442270	Revert "[SPARK-4525] Mesos should decline unused offers" This reverts commit `b043c27424`. I accidentally committed this using my own authorship credential. However, I should have given authoriship to the original author: Jongyoul Lee.	2014-11-24 19:18:15 -08:00
Patrick Wendell	b043c27424	[SPARK-4525] Mesos should decline unused offers Functionally, this is just a small change on top of #3393 (by jongyoul). The issue being addressed is discussed in the comments there. I have not yet added a test for the bug there. I will add one shortly. I've also done some minor renaming/clean-up of variables in this class and tests. Author: Patrick Wendell <pwendell@gmail.com> Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3436 from pwendell/mesos-issue and squashes the following commits: 58c35b5 [Patrick Wendell] Adding unit test for this situation c4f0697 [Patrick Wendell] Additional clean-up and fixes on top of existing fix f20f1b3 [Jongyoul Lee] [SPARK-4525] MesosSchedulerBackend.resourceOffers cannot decline unused offers from acceptedOffers - Added code for declining unused offers among acceptedOffers - Edited testCase for checking declining unused offers	2014-11-24 19:14:14 -08:00
Kay Ousterhout	d24d5bf064	[SPARK-4266] [Web-UI] Reduce stage page load time. The commit changes the java script used to show/hide additional metrics in order to reduce page load time. SPARK-4016 significantly increased page load time for the stage page when stages had a lot (thousands or tens of thousands) of tasks, due to the additional Javascript to hide some metrics by default and stripe the tables. This commit reduces page load time in two ways: (1) Now, all of the metrics that are hidden by default are hidden by setting "display: none;" using CSS for the page, rather than hiding them using javascript after the page loads. Without this change, for stages with thousands of tasks, there was a few second delay after page load, where first the additional metrics were shown, and then after a delay were hidden once the relevant JS finished running. (2) CSS is used to stripe all of the tables except for the summary table. The summary table needs javascript to do the striping because some rows are hidden, but the javascript striping is slower, which again resulted in a delay when it was used for the task table (where for a few seconds after page load, all of the rows in the task table would be white, while the browser finished running the JS to stripe the table). cc pwendell This change is intended to be backported to 1.2 to avoid a regression in UI performance when users run large jobs. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3328 from kayousterhout/SPARK-4266 and squashes the following commits: f964091 [Kay Ousterhout] [SPARK-4266] [Web-UI] Reduce stage page load time.	2014-11-24 18:03:10 -08:00
Davies Liu	6cf507685e	[SPARK-4548] []SPARK-4517] improve performance of python broadcast Re-implement the Python broadcast using file: 1) serialize the python object using cPickle, write into disks. 2) Create a wrapper in JVM (for the dumped file), it read data from during serialization 3) Using TorrentBroadcast or HttpBroadcast to transfer the data (compressed) into executors 4) During deserialization, writing the data into disk. 5) Passing the path into Python worker, read data from disk and unpickle it into python object, until the first access. It fixes the performance regression introduced in #2659, has similar performance as 1.1, but support object larger than 2G, also improve the memory efficiency (only one compressed copy in driver and executor). Testing with a 500M broadcast and 4 tasks (excluding the benefit from reused worker in 1.2): name \| 1.1 \| 1.2 with this patch \| improvement ---------\|--------\|---------\|-------- python-broadcast-w-bytes \| 25.20 \| 9.33 \| 170.13% \| python-broadcast-w-set \| 4.13 \| 4.50 \| -8.35% \| Testing with 100 tasks (16 CPUs): name \| 1.1 \| 1.2 with this patch \| improvement ---------\|--------\|---------\|-------- python-broadcast-w-bytes \| 38.16 \| 8.40 \| 353.98% python-broadcast-w-set \| 23.29 \| 9.59 \| 142.80% Author: Davies Liu <davies@databricks.com> Closes #3417 from davies/pybroadcast and squashes the following commits: 50a58e0 [Davies Liu] address comments b98de1d [Davies Liu] disable gc while unpickle e5ee6b9 [Davies Liu] support large string 09303b8 [Davies Liu] read all data into memory dde02dd [Davies Liu] improve performance of python broadcast	2014-11-24 17:17:03 -08:00
Josh Rosen	4a90276ab2	[SPARK-4145] Web UI job pages This PR adds two new pages to the Spark Web UI: - A jobs overview page, which shows details on running / completed / failed jobs. - A job details page, which displays information on an individual job's stages. The jobs overview page is now the default UI homepage; the old homepage is still accessible at `/stages`. ### Screenshots #### New UI homepage ![image](https://cloud.githubusercontent.com/assets/50748/5119035/fd0a69e6-701f-11e4-89cb-db7e9705714f.png) #### Job details page (This is effectively a per-job version of the stages page that can be extended later with other things, such as DAG visualizations) ![image](https://cloud.githubusercontent.com/assets/50748/5134910/50b340d4-70c7-11e4-88e1-6b73237ea7c8.png) ### Key changes in this PR - Rename `JobProgressPage` to `AllStagesPage` - Expose `StageInfo` objects in the ``SparkListenerJobStart` event; add backwards-compatibility tests to JsonProtocol. - Add additional data structures to `JobProgressListener` to map from stages to jobs. - Add several fields to `JobUIData`. I also added ~150 lines of Selenium tests as I uncovered UI issues while developing this patch. ### Limitations If a job contains stages that aren't run, then its overall job progress bar may be an underestimate of the total job progress; in other words, a completed job may appear to have a progress bar that's not at 100%. If stages or tasks fail, then the progress bar will not go backwards to reflect the true amount of remaining work. Author: Josh Rosen <joshrosen@databricks.com> Closes #3009 from JoshRosen/job-page and squashes the following commits: eb05e90 [Josh Rosen] Disable kill button in completed stages tables. f00c851 [Josh Rosen] Fix JsonProtocol compatibility b89c258 [Josh Rosen] More JSON protocol backwards-compatibility fixes. ff804cd [Josh Rosen] Don't write "Stage Ids" field in JobStartEvent JSON. 6f17f3f [Josh Rosen] Only store StageInfos in SparkListenerJobStart event. 2bbf41a [Josh Rosen] Update job progress bar to reflect skipped tasks/stages. 61c265a [Josh Rosen] Add “skipped stages” table; only display non-empty tables. 1f45d44 [Josh Rosen] Incorporate a bunch of minor review feedback. 0b77e3e [Josh Rosen] More bug fixes for phantom stages. 034aa8d [Josh Rosen] Use `.max()` to find result stage for job. eebdc2c [Josh Rosen] Don’t display pending stages for completed jobs. 67080ba [Josh Rosen] Ensure that "phantom stages" don't cause memory leaks. 7d10b97 [Josh Rosen] Merge remote-tracking branch 'apache/master' into job-page d69c775 [Josh Rosen] Fix table sorting on all jobs page. 5eb39dc [Josh Rosen] Add pending stages table to job page. f2a15da [Josh Rosen] Add status field to job details page. 171b53c [Josh Rosen] Move `startTime` to the start of SparkContext. e2f2c43 [Josh Rosen] Fix sorting of stages in job details page. 8955f4c [Josh Rosen] Display information for pending stages on jobs page. 8ab6c28 [Josh Rosen] Compute numTasks from job start stage infos. 5884f91 [Josh Rosen] Add StageInfos to SparkListenerJobStart event. 79793cd [Josh Rosen] Track indices of completed stage to avoid overcounting when failures occur. d62ea7b [Josh Rosen] Add failing Selenium test for stage overcounting issue. 1145c60 [Josh Rosen] Display text instead of progress bar for stages. 3d0a007 [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page 8a2351b [Josh Rosen] Add help tooltip to Spark Jobs page. b7bf30e [Josh Rosen] Add stages progress bar; fix bug where active stages show as completed. 4846ce4 [Josh Rosen] Hide "(Job Group") if no jobs were submitted in job groups. 4d58e55 [Josh Rosen] Change label to "Tasks (for all stages)" 85e9c85 [Josh Rosen] Extract startTime into separate variable. 1cf4987 [Josh Rosen] Fix broken kill links; add Selenium test to avoid future regressions. 56701fa [Josh Rosen] Move last stage name / description logic out of markup. a475ea1 [Josh Rosen] Add progress bars to jobs page. `45343b8` [Josh Rosen] More comments 4b206fb [Josh Rosen] Merge remote-tracking branch 'origin/master' into job-page bfce2b9 [Josh Rosen] Address review comments, except for progress bar. 4487dcb [Josh Rosen] [SPARK-4145] Web UI job pages 2568a6c [Josh Rosen] Rename JobProgressPage to AllStagesPage:	2014-11-24 13:18:14 -08:00
Cheng Lian	a6d7b61f92	[SPARK-4479][SQL] Avoids unnecessary defensive copies when sort based shuffle is on This PR is a workaround for SPARK-4479. Two changes are introduced: when merge sort is bypassed in `ExternalSorter`, 1. also bypass RDD elements buffering as buffering is the reason that `MutableRow` backed row objects must be copied, and 2. avoids defensive copies in `Exchange` operator <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3422) <!-- Reviewable:end --> Author: Cheng Lian <lian@databricks.com> Closes #3422 from liancheng/avoids-defensive-copies and squashes the following commits: 591f2e9 [Cheng Lian] Passes all shuffle suites 0c3c91e [Cheng Lian] Fixes shuffle write metrics when merge sort is bypassed ed5df3c [Cheng Lian] Fixes styling changes f75089b [Cheng Lian] Avoids unnecessary defensive copies when sort based shuffle is on	2014-11-24 12:43:45 -08:00
Prashant Sharma	9b2a3c6126	[SPARK-4377] Fixed serialization issue by switching to akka provided serializer. ... - there is no way around this for deserializing actorRef(s). Author: Prashant Sharma <prashant.s@imaginea.com> Closes #3402 from ScrapCodes/SPARK-4377/troubleDeserializing and squashes the following commits: 77233fd [Prashant Sharma] Style fixes 9b35c6e [Prashant Sharma] Scalastyle fixes 29880da [Prashant Sharma] [SPARK-4377] Fixed serialization issue by switching to akka provided serializer - there is no way around this for deserializing actorRef(s).	2014-11-22 14:05:38 -08:00
zsxwing	65b987c3ed	[SPARK-4397][Core] Reorganize 'implicit's to improve the API convenience This PR moved `implicit`s to `package object` and `companion object` to enable the Scala compiler search them automatically without explicit importing. It should not break any API. A test project for backforward compatibility is [here](https://github.com/zsxwing/SPARK-4397-Backforward-Compatibility). It proves the codes compiled with Spark 1.1.0 can run with this PR. To summarize, the changes are: * Deprecated the old implicit conversion functions: this preserves binary compatibility for code compiled against earlier versions of Spark. * Removed "implicit" from them so they are just normal functions: this made sure the compiler doesn't get confused and warn about multiple implicits in scope. * Created new implicit functions in package rdd object, which is part of the scope that scalac will search when looking for implicit conversions on various RDD objects. The disadvantage is there are duplicated codes in SparkContext for backforward compatibility. Author: zsxwing <zsxwing@gmail.com> Closes #3262 from zsxwing/SPARK-4397 and squashes the following commits: fc30314 [zsxwing] Update the comments 9c27aff [zsxwing] Move implicit functions to object RDD and forward old functions to new implicit ones directly 2b5f5a4 [zsxwing] Comments for the deprecated functions 52353de [zsxwing] Remove private[spark] from object WritableConverter 34641d4 [zsxwing] Move ImplicitSuite to org.apache.sparktest 7266218 [zsxwing] Add comments to warn the duplicate codes in SparkContext 185c12f [zsxwing] Remove simpleWritableConverter from SparkContext 3bdcae2 [zsxwing] Move WritableConverter implicits to object WritableConverter 9b73188 [zsxwing] Fix the code style issue 3ac4f07 [zsxwing] Add license header 1eda9e4 [zsxwing] Reorganize 'implicit's to improve the API convenience	2014-11-21 10:06:30 -08:00
Leolh	e216ffaead	[SPARK-4446] [SPARK CORE] MetadataCleaner schedule task with a wrong param for delay time . Author: Leolh <leosandylh@gmail.com> Closes #3306 from Leolh/master and squashes the following commits: 4a21f4e [Leolh] Update MetadataCleaner.scala	2014-11-19 18:18:55 -08:00
Andrew Or	0eb4a7fb0f	[SPARK-4480] Avoid many small spills in external data structures Summary. Currently, we may spill many small files in `ExternalAppendOnlyMap` and `ExternalSorter`. The underlying root cause of this is summarized in [SPARK-4452](https://issues.apache.org/jira/browse/SPARK-4452). This PR does not address this root cause, but simply provides the guarantee that we never spill the in-memory data structure if its size is less than a configurable threshold of 5MB. This config is not documented because we don't want users to set it themselves, and it is not hard-coded because we need to change it in tests. Symptom. Each spill is orders of magnitude smaller than 1MB, and there are many spills. In environments where the ulimit is set, this frequently causes "too many open file" exceptions observed in [SPARK-3633](https://issues.apache.org/jira/browse/SPARK-3633). ``` 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292769 spills so far) 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4760 B to disk (292770 spills so far) 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4520 B to disk (292771 spills so far) 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4560 B to disk (292772 spills so far) 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4792 B to disk (292773 spills so far) 14/11/13 19:20:43 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 4784 B to disk (292774 spills so far) ``` Reproduction. I ran the following on a small 4-node cluster with 512MB executors. Note that the back-to-back shuffle here is necessary for reasons described in [SPARK-4522](https://issues.apache.org/jira/browse/SPARK-4452). The second shuffle is a `reduceByKey` because it performs a map-side combine. ``` sc.parallelize(1 to 100000000, 100) .map { i => (i, i) } .groupByKey() .reduceByKey(_ ++ _) .count() ``` Before the change, I notice that each thread may spill up to 1000 times, and the size of each spill is on the order of 10KB. After the change, each thread spills only up to 20 times in the worst case, and the size of each spill is on the order of 1MB. Author: Andrew Or <andrew@databricks.com> Closes #3353 from andrewor14/avoid-small-spills and squashes the following commits: 49f380f [Andrew Or] Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/spark into avoid-small-spills 27d6966 [Andrew Or] Merge branch 'master' of github.com:apache/spark into avoid-small-spills f4736e3 [Andrew Or] Fix tests a919776 [Andrew Or] Avoid many small spills	2014-11-19 18:07:27 -08:00
Nishkam Ravi	73fedf5a6e	[Spark-4484] Treat maxResultSize as unlimited when set to 0; improve error message The check for maxResultSize > 0 is missing, results in failures. Also, error message needs to be improved so the developers know that there is a new parameter to be configured Author: Nishkam Ravi <nravi@cloudera.com> Author: nravi <nravi@c1704.halxg.cloudera.com> Author: nishkamravi2 <nishkamravi@gmail.com> Closes #3360 from nishkamravi2/master_nravi and squashes the following commits: 5c9a4cb [nishkamravi2] Update TaskSetManagerSuite.scala 535295a [nishkamravi2] Update TaskSetManager.scala 3e1b616 [Nishkam Ravi] Modify test for maxResultSize 9f6583e [Nishkam Ravi] Changes to maxResultSize code (improve error message and add condition to check if maxResultSize > 0) 5f8f9ed [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi 636a9ff [nishkamravi2] Update YarnAllocator.scala 8f76c8b [Nishkam Ravi] Doc change for yarn memory overhead 35daa64 [Nishkam Ravi] Slight change in the doc for yarn memory overhead 5ac2ec1 [Nishkam Ravi] Remove out dac1047 [Nishkam Ravi] Additional documentation for yarn memory overhead issue 42c2c3d [Nishkam Ravi] Additional changes for yarn memory overhead issue 362da5e [Nishkam Ravi] Additional changes for yarn memory overhead c726bd9 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi f00fa31 [Nishkam Ravi] Improving logging for AM memoryOverhead 1cf2d1e [nishkamravi2] Update YarnAllocator.scala ebcde10 [Nishkam Ravi] Modify default YARN memory_overhead-- from an additive constant to a multiplier (redone to resolve merge conflicts) 2e69f11 [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark into master_nravi efd688a [Nishkam Ravi] Merge branch 'master' of https://github.com/apache/spark 2b630f9 [nravi] Accept memory input as "30g", "512M" instead of an int value, to be consistent with rest of Spark 3bf8fad [nravi] Merge branch 'master' of https://github.com/apache/spark 5423a03 [nravi] Merge branch 'master' of https://github.com/apache/spark eb663ca [nravi] Merge branch 'master' of https://github.com/apache/spark df2aeb1 [nravi] Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456) 6b840f0 [nravi] Undo the fix for SPARK-1758 (the problem is fixed) 5108700 [nravi] Fix in Spark for the Concurrent thread modification issue (SPARK-1097, HADOOP-10456) 681b36f [nravi] Fix for SPARK-1758: failing test org.apache.spark.JavaAPISuite.wholeTextFiles	2014-11-19 17:23:42 -08:00
Akshat Aranya	9ccc53c72c	[SPARK-4478] Keep totalRegisteredExecutors up-to-date This rebases PR 3368. This commit fixes totalRegisteredExecutors update [SPARK-4478], so that we can correctly keep track of number of registered executors. Author: Akshat Aranya <aaranya@quantcast.com> Closes #3373 from coolfrood/topic/SPARK-4478 and squashes the following commits: 8a4d1e4 [Akshat Aranya] Added comment 150ae93 [Akshat Aranya] [SPARK-4478] Keep totalRegisteredExecutors up-to-date	2014-11-19 17:20:20 -08:00
Josh Rosen	04d462f648	[SPARK-4495] Fix memory leak in JobProgressListener This commit fixes a memory leak in JobProgressListener that I introduced in SPARK-2321 and adds a testing framework to ensure that it’s very difficult to inadvertently introduce new memory leaks. This solution might be overkill, but the main idea is to partition JobProgressListener's state into three buckets: collections that should be empty once Spark is idle, collections that must obey some hard size limit, and collections that have a soft size limit (they can grow arbitrarily large when Spark is active but must shrink to fit within some bound after Spark becomes idle). Based on this, we can write fairly generic tests that run workloads that submit more than `spark.ui.retainedStages` stages and `spark.ui.retainedJobs` jobs then check that these various collections' sizes obey their contracts. Author: Josh Rosen <joshrosen@databricks.com> Closes #3372 from JoshRosen/SPARK-4495 and squashes the following commits: c73fab5 [Josh Rosen] "data structures" -> collections be72e81 [Josh Rosen] [SPARK-4495] Fix memory leaks in JobProgressListener	2014-11-19 16:50:21 -08:00
Kenichi Maehashi	eacc788346	[SPARK-4470] Validate number of threads in local mode When running Spark locally, if number of threads is specified as 0 (e.g., `spark-submit --master local[0] ...`), the job got stuck and does not run at all. I think it's better to validate the parameter. Fix for [SPARK-4470](https://issues.apache.org/jira/browse/SPARK-4470). Author: Kenichi Maehashi <webmaster@kenichimaehashi.com> Closes #3337 from kmaehashi/spark-4470 and squashes the following commits: 3ad76f3 [Kenichi Maehashi] fix code style 7716734 [Kenichi Maehashi] SPARK-4470: Validate number of threads in local mode	2014-11-19 12:11:09 -08:00
Tianshuo Deng	d75579d099	[SPARK-4467] fix elements read count for ExtrenalSorter the elementsRead variable should be reset to 0 after each spilling Author: Tianshuo Deng <tdeng@twitter.com> Closes #3302 from tsdeng/fix_external_sorter_record_count and squashes the following commits: 7b56ca0 [Tianshuo Deng] fix method signature 782c7de [Tianshuo Deng] make elementsRead private, fix comment bb7ff28 [Tianshuo Deng] update elemetsRead through addElementsRead method 74ca246 [Tianshuo Deng] fix elements read count	2014-11-19 10:02:16 -08:00
Mingfei	165cec9c46	[Spark-4432]close InStream after the block is accessed InStream is not closed after data is read from Tachyon. which makes the blocks in Tachyon locked after accessed. Author: Mingfei <mingfei.shi@intel.com> Closes #3290 from shimingfei/lockFix and squashes the following commits: fffe345 [Mingfei] close InStream after the block is accessed	2014-11-18 22:17:06 -08:00
Mingfei	67e9876b3e	[SPARK-4441] Close Tachyon client when TachyonBlockManager is shutdown Currently Tachyon client is not closed when TachyonBlockManager is shut down. which causes some resources in Tachyon not reclaimed Author: Mingfei <mingfei.shi@intel.com> Closes #3299 from shimingfei/closeClient and squashes the following commits: 0913fbd [Mingfei] close Tachyon client when TachyonBlockManager is shutdown	2014-11-18 22:16:36 -08:00
Marcelo Vanzin	397d3aae5b	Bumping version to 1.3.0-SNAPSHOT. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3277 from vanzin/version-1.3 and squashes the following commits: 7c3c396 [Marcelo Vanzin] Added temp repo to sbt build. 5f404ff [Marcelo Vanzin] Add another exclusion. 19457e7 [Marcelo Vanzin] Update old version to 1.2, add temporary 1.2 repo. 3c8d705 [Marcelo Vanzin] Workaround for MIMA checks. e940810 [Marcelo Vanzin] Bumping version to 1.3.0-SNAPSHOT.	2014-11-18 21:24:18 -08:00
Xiangrui Meng	bb46046154	[SPARK-4433] fix a racing condition in zipWithIndex Spark hangs with the following code: ~~~ sc.parallelize(1 to 10).zipWithIndex.repartition(10).count() ~~~ This is because ZippedWithIndexRDD triggers a job in getPartitions and it causes a deadlock in DAGScheduler.getPreferredLocs (synced). The fix is to compute `startIndices` during construction. This should be applied to branch-1.0, branch-1.1, and branch-1.2. pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #3291 from mengxr/SPARK-4433 and squashes the following commits: c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex	2014-11-18 16:25:44 -08:00
Davies Liu	4a377aff2d	[SPARK-3721] [PySpark] broadcast objects larger than 2G This patch will bring support for broadcasting objects larger than 2G. pickle, zlib, FrameSerializer and Array[Byte] all can not support objects larger than 2G, so this patch introduce LargeObjectSerializer to serialize broadcast objects, the object will be serialized and compressed into small chunks, it also change the type of Broadcast[Array[Byte]]] into Broadcast[Array[Array[Byte]]]]. Testing for support broadcast objects larger than 2G is slow and memory hungry, so this is tested manually, could be added into SparkPerf. Author: Davies Liu <davies@databricks.com> Author: Davies Liu <davies.liu@gmail.com> Closes #2659 from davies/huge and squashes the following commits: 7b57a14 [Davies Liu] add more tests for broadcast 28acff9 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge a2f6a02 [Davies Liu] bug fix 4820613 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 5875c73 [Davies Liu] address comments 10a349b [Davies Liu] address comments 0c33016 [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 6182c8f [Davies Liu] Merge branch 'master' into huge d94b68f [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 2514848 [Davies Liu] address comments fda395b [Davies Liu] Merge branch 'master' of github.com:apache/spark into huge 1c2d928 [Davies Liu] fix scala style 091b107 [Davies Liu] broadcast objects larger than 2G	2014-11-18 16:17:51 -08:00
Kay Ousterhout	010bc86e40	[SPARK-4463] Add (de)select all button for add'l metrics. This commit removes the behavior where when a user clicks "Show additional metrics" on the stage page, all of the additional metrics are automatically selected; now, collapsing and expanding the additional metrics has no effect on which options are selected. Instead, there's a "(De)select All" box at the top; checking this box checks all additional metrics (and similarly, unchecking it unchecks all additional metrics). This commit is intended to be backported to 1.2, so that the additional metrics behavior is not confusing to users. Now when a user clicks the "Show additional metrics" menu, this is what it looks like: ![image](https://cloud.githubusercontent.com/assets/1108612/5094347/1541ead6-6f15-11e4-8e8c-25a65ddbdfb2.png) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3331 from kayousterhout/SPARK-4463 and squashes the following commits: 9e17cea [Kay Ousterhout] Added italics b731230 [Kay Ousterhout] [SPARK-4463] Add (de)select all button for add'l metrics.	2014-11-18 15:01:06 -08:00
Davies Liu	e34f38ff1a	[SPARK-4017] show progress bar in console The progress bar will look like this: ![1___spark_job__85_250_finished__4_are_running___java_](https://cloud.githubusercontent.com/assets/40902/4854813/a02f44ac-6099-11e4-9060-7c73a73151d6.png) In the right corner, the numbers are: finished tasks, running tasks, total tasks. After the stage has finished, it will disappear. The progress bar is only showed if logging level is WARN or higher (but progress in title is still showed), it can be turned off by spark.driver.showConsoleProgress. Author: Davies Liu <davies@databricks.com> Closes #3029 from davies/progress and squashes the following commits: 95336d5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress fc49ac8 [Davies Liu] address commentse 2e90f75 [Davies Liu] show multiple stages in same time 0081bcc [Davies Liu] address comments 38c42f1 [Davies Liu] fix tests ab87958 [Davies Liu] disable progress bar during tests 30ac852 [Davies Liu] re-implement progress bar b3f34e5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress 6fd30ff [Davies Liu] show progress bar if no task finished in 500ms e4e7344 [Davies Liu] refactor e1f524d [Davies Liu] revert unnecessary change a60477c [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress 5cae3f2 [Davies Liu] fix style ea49fe0 [Davies Liu] address comments bc53d99 [Davies Liu] refactor e6bb189 [Davies Liu] fix logging in sparkshell 7e7d4e7 [Davies Liu] address commments 5df26bb [Davies Liu] fix style 9e42208 [Davies Liu] show progress bar in console and title	2014-11-18 13:37:21 -08:00
Davies Liu	80f3177882	[SPARK-4404] remove sys.exit() in shutdown hook If SparkSubmit die first, then bootstrapper will be blocked by shutdown hook. sys.exit() in a shutdown hook will cause some kind of dead lock. cc andrewor14 Author: Davies Liu <davies@databricks.com> Closes #3289 from davies/fix_bootstraper and squashes the following commits: ea5cdd1 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_bootstraper e04b690 [Davies Liu] remove sys.exit in hook 4d11366 [Davies Liu] remove shutdown hook if subprocess die fist	2014-11-18 13:11:38 -08:00
Kousuke Saruta	bfebfd8b28	[SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name. This PR adds a regression test for SPARK-4434. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3326 from sarutak/add-triple-slash-testcase and squashes the following commits: 82bc9cc [Kousuke Saruta] Fixed wrong grammar in comment 9149027 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase c1c80ca [Kousuke Saruta] Fixed style 4f30210 [Kousuke Saruta] Modified comments 9e09da2 [Kousuke Saruta] Fixed URI validation for jar file d4b99ef [Kousuke Saruta] [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file ac79906 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase 6d4f47e [Kousuke Saruta] Added a test case as a regression check for SPARK-4434	2014-11-18 12:17:33 -08:00
Patrick Wendell	c6e0c2ab1c	SPARK-4466: Provide support for publishing Scala 2.11 artifacts to Maven The maven release plug-in does not have support for publishing two separate sets of artifacts for a single release. Because of the way that Scala 2.11 support in Spark works, we have to write some customized code to do this. The good news is that the Maven release API is just a thin wrapper on doing git commits and pushing artifacts to the HTTP API of Apache's Sonatype server and this might overall make our deployment easier to understand. This was already used for the 1.2 snapshot, so I think it is working well. One other nice thing is this could be pretty easily extended to publish nightly snapshots. Author: Patrick Wendell <pwendell@gmail.com> Closes #3332 from pwendell/releases and squashes the following commits: 2fedaed [Patrick Wendell] Automate the opening and closing of Sonatype repos e2a24bb [Patrick Wendell] Fixing issue where we overrode non-spark version numbers 9df3a50 [Patrick Wendell] Adding TODO 1cc1749 [Patrick Wendell] Don't build the thriftserver for 2.11 933201a [Patrick Wendell] Make tagging of release commit eager d0388a6 [Patrick Wendell] Support Scala 2.11 build 4f4dc62 [Patrick Wendell] Change to 2.11 should not be included when committing new patch bf742e1 [Patrick Wendell] Minor fixes ffa1df2 [Patrick Wendell] Adding a Scala 2.11 package to test it 9ac4381 [Patrick Wendell] Addressing TODO b3105ff [Patrick Wendell] Removing commented out code d906803 [Patrick Wendell] Small fix 3f4d985 [Patrick Wendell] More work fcd54c2 [Patrick Wendell] Consolidating use of keys df2af30 [Patrick Wendell] Changes to release stuff	2014-11-17 21:07:50 -08:00
Josh Rosen	0f3ceb56c7	[SPARK-4180] [Core] Prevent creation of multiple active SparkContexts This patch adds error-detection logic to throw an exception when attempting to create multiple active SparkContexts in the same JVM, since this is currently unsupported and has been known to cause confusing behavior (see SPARK-2243 for more details). The solution implemented here is only a partial fix. A complete fix would have the following properties: 1. Only one SparkContext may ever be under construction at any given time. 2. Once a SparkContext has been successfully constructed, any subsequent construction attempts should fail until the active SparkContext is stopped. 3. If the SparkContext constructor throws an exception, then all resources created in the constructor should be cleaned up (SPARK-4194). 4. If a user attempts to create a SparkContext but the creation fails, then the user should be able to create new SparkContexts. This PR only provides 2) and 4); we should be able to provide all of these properties, but the correct fix will involve larger changes to SparkContext's construction / initialization, so we'll target it for a different Spark release. ### The correct solution: I think that the correct way to do this would be to move the construction of SparkContext's dependencies into a static method in the SparkContext companion object. Specifically, we could make the default SparkContext constructor `private` and change it to accept a `SparkContextDependencies` object that contains all of SparkContext's dependencies (e.g. DAGScheduler, ContextCleaner, etc.). Secondary constructors could call a method on the SparkContext companion object to create the `SparkContextDependencies` and pass the result to the primary SparkContext constructor. For example: ```scala class SparkContext private (deps: SparkContextDependencies) { def this(conf: SparkConf) { this(SparkContext.getDeps(conf)) } } object SparkContext( private[spark] def getDeps(conf: SparkConf): SparkContextDependencies = synchronized { if (anotherSparkContextIsActive) { throw Exception(...) } var dagScheduler: DAGScheduler = null try { dagScheduler = new DAGScheduler(...) [...] } catch { case e: Exception => Option(dagScheduler).foreach(_.stop()) [...] } SparkContextDependencies(dagScheduler, ....) } } ``` This gives us mutual exclusion and ensures that any resources created during the failed SparkContext initialization are properly cleaned up. This indirection is necessary to maintain binary compatibility. In retrospect, it would have been nice if SparkContext had no private constructors and could only be created through builder / factory methods on its companion object, since this buys us lots of flexibility and makes dependency injection easier. ### Alternative solutions: As an alternative solution, we could refactor SparkContext's primary constructor to perform all object creation in a giant `try-finally` block. Unfortunately, this will require us to turn a bunch of `vals` into `vars` so that they can be assigned from the `try` block. If we still want `vals`, we could wrap each `val` in its own `try` block (since the try block can return a value), but this will lead to extremely messy code and won't guard against the introduction of future code which doesn't properly handle failures. The more complex approach outlined above gives us some nice dependency injection benefits, so I think that might be preferable to a `var`-ification. ### This PR's solution: - At the start of the constructor, check whether some other SparkContext is active; if so, throw an exception. - If another SparkContext might be under construction (or has thrown an exception during construction), allow the new SparkContext to begin construction but log a warning (since resources might have been leaked from a failed creation attempt). - At the end of the SparkContext constructor, check whether some other SparkContext constructor has raced and successfully created an active context. If so, throw an exception. This guarantees that no two SparkContexts will ever be active and exposed to users (since we check at the very end of the constructor). If two threads race to construct SparkContexts, then one of them will win and another will throw an exception. This exception can be turned into a warning by setting `spark.driver.allowMultipleContexts = true`. The exception is disabled in unit tests, since there are some suites (such as Hive) that may require more significant refactoring to clean up their SparkContexts. I've made a few changes to other suites' test fixtures to properly clean up SparkContexts so that the unit test logs contain fewer warnings. Author: Josh Rosen <joshrosen@databricks.com> Closes #3121 from JoshRosen/SPARK-4180 and squashes the following commits: 23c7123 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4180 d38251b [Josh Rosen] Address latest round of feedback. c0987d3 [Josh Rosen] Accept boolean instead of SparkConf in methods. 85a424a [Josh Rosen] Incorporate more review feedback. 372d0d3 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4180 f5bb78c [Josh Rosen] Update mvn build, too. d809cb4 [Josh Rosen] Improve handling of failed SparkContext creation attempts. 79a7e6f [Josh Rosen] Fix commented out test a1cba65 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4180 7ba6db8 [Josh Rosen] Add utility to set system properties in tests. 4629d5c [Josh Rosen] Set spark.driver.allowMultipleContexts=true in tests. ed17e14 [Josh Rosen] Address review feedback; expose hack workaround for existing unit tests. 1c66070 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4180 06c5c54 [Josh Rosen] Add / improve SparkContext cleanup in streaming BasicOperationsSuite d0437eb [Josh Rosen] StreamingContext.stop() should stop SparkContext even if StreamingContext has not been started yet. c4d35a2 [Josh Rosen] Log long form of creation site to aid debugging. 918e878 [Josh Rosen] Document "one SparkContext per JVM" limitation. afaa7e3 [Josh Rosen] [SPARK-4180] Prevent creations of multiple active SparkContexts.	2014-11-17 12:48:18 -08:00
Andrew Or	dbb9da5c3d	Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file" This reverts commit `098f83c7cc`.	2014-11-17 11:24:28 -08:00
Prashant Sharma	5c92d47ad2	SPARK-4445, Don't display storage level in toDebugString unless RDD is persisted. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #3310 from ScrapCodes/SPARK-4445/rddDebugStringFix and squashes the following commits: 4e57c52 [Prashant Sharma] SPARK-4445, Don't display storage level in toDebugString unless RDD is persisted	2014-11-17 10:40:33 -08:00
Josh Rosen	7850e0c707	[SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run, so this caused huge buildups of messages that weren't garbage collected until their timeouts expired, leading to OOMs. This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case. Thanks to cristianopris for narrowing down this issue! Author: Josh Rosen <joshrosen@databricks.com> Closes #3259 from JoshRosen/connection-manager-timeout-bugfix and squashes the following commits: afcc8d6 [Josh Rosen] Address rxin's review feedback. 2a2e92d [Josh Rosen] Keep only WeakReference to promise in TimerTask; 0f0913b [Josh Rosen] Spelling fix: timout => timeout 3200c33 [Josh Rosen] Use Netty HashedWheelTimer f847dd4 [Josh Rosen] Don't capture entire message in ACK timeout task.	2014-11-16 00:44:15 -08:00
Josh Rosen	40eb8b6ef3	[SPARK-2321] Several progress API improvements / refactorings This PR refactors / extends the status API introduced in #2696. - Change StatusAPI from a mixin trait to a class. Before, the new status API methods were directly accessible through SparkContext, whereas now they're accessed through a `sc.statusAPI` field. As long as we were going to add these methods directly to SparkContext, the mixin trait seemed like a good idea, but this might be simpler to reason about and may avoid pitfalls that I've run into while attempting to refactor other parts of SparkContext to use mixins (see #3071, for example). - Change the name from SparkStatusAPI to SparkStatusTracker. - Make `getJobIdsForGroup(null)` return ids for jobs that aren't associated with any job group. - Add `getActiveStageIds()` and `getActiveJobIds()` methods that return the ids of whatever's currently active in this SparkContext. This should simplify davies's progress bar code. Author: Josh Rosen <joshrosen@databricks.com> Closes #3197 from JoshRosen/progress-api-improvements and squashes the following commits: 30b0afa [Josh Rosen] Rename SparkStatusAPI to SparkStatusTracker. d1b08d8 [Josh Rosen] Add missing newlines 2cc7353 [Josh Rosen] Add missing file. d5eab1f [Josh Rosen] Add getActive[Stage\|Job]Ids() methods. a227984 [Josh Rosen] getJobIdsForGroup(null) should return jobs for default group c47e294 [Josh Rosen] Remove StatusAPI mixin trait.	2014-11-14 23:46:25 -08:00
Kousuke Saruta	60969b0336	[SPARK-4260] Httpbroadcast should set connection timeout. Httpbroadcast sets read timeout but doesn't set connection timeout. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3122 from sarutak/httpbroadcast-timeout and squashes the following commits: c7f3a56 [Kousuke Saruta] Added Connection timeout for Http Connection to HttpBroadcast.scala	2014-11-14 22:36:56 -08:00
zsxwing	861223ee5b	[SPARK-4363][Doc] Update the Broadcast example Author: zsxwing <zsxwing@gmail.com> Closes #3226 from zsxwing/SPARK-4363 and squashes the following commits: 8109914 [zsxwing] Update the Broadcast example	2014-11-14 22:28:48 -08:00
zsxwing	dba1405823	[SPARK-4379][Core] Change Exception to SparkException in checkpoint It's better to change to SparkException. However, it's a breaking change since it will change the exception type. Author: zsxwing <zsxwing@gmail.com> Closes #3241 from zsxwing/SPARK-4379 and squashes the following commits: 409f3af [zsxwing] Change Exception to SparkException in checkpoint	2014-11-14 22:25:41 -08:00
Davies Liu	7fe08b43c7	[SPARK-4415] [PySpark] JVM should exit after Python exit When JVM is started in a Python process, it should exit once the stdin is closed. test: add spark.driver.memory in conf/spark-defaults.conf ``` daviesdm:~/work/spark$ cat conf/spark-defaults.conf spark.driver.memory 8g daviesdm:~/work/spark$ bin/pyspark >>> quit daviesdm:~/work/spark$ jps 4931 Jps 286 daviesdm:~/work/spark$ python wc.py 943738 0.719928026199 daviesdm:~/work/spark$ jps 286 4990 Jps ``` Author: Davies Liu <davies@databricks.com> Closes #3274 from davies/exit and squashes the following commits: df0e524 [Davies Liu] address comments ce8599c [Davies Liu] address comments 050651f [Davies Liu] JVM should exit after Python exit	2014-11-14 20:14:33 -08:00
WangTao	303a4e4d23	[SPARK-4404]SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-proc... ...ess ends https://issues.apache.org/jira/browse/SPARK-4404 When we have spark.driver.extra* or spark.driver.memory in SPARK_SUBMIT_PROPERTIES_FILE, spark-class will use SparkSubmitDriverBootstrapper to launch driver. If we get process id of SparkSubmitDriverBootstrapper and wanna kill it during its running, we expect its SparkSubmit sub-process stop also. Author: WangTao <barneystinson@aliyun.com> Author: WangTaoTheTonic <barneystinson@aliyun.com> Closes #3266 from WangTaoTheTonic/killsubmit and squashes the following commits: e03eba5 [WangTaoTheTonic] add comments 57b5ca1 [WangTao] SparkSubmitDriverBootstrapper should stop after its SparkSubmit sub-process ends	2014-11-14 20:11:51 -08:00
Sandy Ryza	ad42b28324	SPARK-4214. With dynamic allocation, avoid outstanding requests for more... ... executors than pending tasks need. WIP. Still need to add and fix tests. Author: Sandy Ryza <sandy@cloudera.com> Closes #3204 from sryza/sandy-spark-4214 and squashes the following commits: 35cf0e0 [Sandy Ryza] Add comment 13b53df [Sandy Ryza] Review feedback 067465f [Sandy Ryza] Whitespace fix 6ae080c [Sandy Ryza] Add tests and get num pending tasks from ExecutorAllocationListener 531e2b6 [Sandy Ryza] SPARK-4214. With dynamic allocation, avoid outstanding requests for more executors than pending tasks need.	2014-11-14 15:51:40 -08:00
Jeff Hammerbacher	c258db9ed4	Update failed assert text to match code in SizeEstimatorSuite Author: Jeff Hammerbacher <jeff.hammerbacher@gmail.com> Closes #3242 from hammer/patch-1 and squashes the following commits: f88d635 [Jeff Hammerbacher] Update failed assert text to match code in SizeEstimatorSuite	2014-11-14 13:37:48 -08:00
zsxwing	156cf3333d	[SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode In yarn-cluster mode, the Web UI is running behind a yarn proxy server. Some features(or bugs?) of yarn proxy server will break the links for thread dump. 1. Yarn proxy server will do http redirect internally, so if opening `http://example.com:8088/cluster/app/application_1415344371838_0012/executors`, it will fetch `http://example.com:8088/cluster/app/application_1415344371838_0012/executors/` and return the content but won't change the link in the browser. Then when a user clicks `Thread Dump`, it will jump to `http://example.com:8088/proxy/application_1415344371838_0012/threadDump/?executorId=2`. This is a wrong link. The correct link should be `http://example.com:8088/proxy/application_1415344371838_0012/executors/threadDump/?executorId=2`. Adding "/" to the tab links will fix it. 2. Yarn proxy server has a bug about the URL encode/decode. When a user accesses `http://example.com:8088/proxy/application_1415344371838_0006/executors/threadDump/?executorId=%3Cdriver%3E`, the yarn proxy server will require `http://example.com:36429/executors/threadDump/?executorId=%25253Cdriver%25253E`. But Spark web server expects `http://example.com:36429/executors/threadDump/?executorId=%3Cdriver%3E`. Related to [YARN-2844](https://issues.apache.org/jira/browse/YARN-2844). For now, it's a tricky approach to bypass the yarn bug. ![threaddump](https://cloud.githubusercontent.com/assets/1000778/4972567/d1ccba64-68ad-11e4-983e-257530cef35a.png) Author: zsxwing <zsxwing@gmail.com> Closes #3183 from zsxwing/SPARK-4313 and squashes the following commits: 3379ca8 [zsxwing] Encode the executor id in the thread dump link and update the comment abfa063 [zsxwing] Fix link issue of the executor thread dump page in yarn-cluster mode	2014-11-14 13:36:13 -08:00
Hong Shen	0c56a039a9	[Spark Core] SPARK-4380 Edit spilling log from MB to B https://issues.apache.org/jira/browse/SPARK-4380 Author: Hong Shen <hongshen@tencent.com> Closes #3243 from shenh062326/spark_change and squashes the following commits: 4653378 [Hong Shen] Edit spilling log from MB to B 21ee960 [Hong Shen] Edit spilling log from MB to B e9145e8 [Hong Shen] Edit spilling log from MB to B da761c2 [Hong Shen] Edit spilling log from MB to B 946351c [Hong Shen] Edit spilling log from MB to B	2014-11-14 13:29:41 -08:00
zsxwing	825709a0b8	[SPARK-4310][WebUI] Sort 'Submitted' column in Stage page by time Author: zsxwing <zsxwing@gmail.com> Closes #3179 from zsxwing/SPARK-4310 and squashes the following commits: b0d29f5 [zsxwing] Sort 'Submitted' column in Stage page by time	2014-11-13 14:37:04 -08:00
Aaron Davidson	b9e1c2eb9b	[SPARK-4370] [Core] Limit number of Netty cores based on executor size Author: Aaron Davidson <aaron@databricks.com> Closes #3155 from aarondav/conf and squashes the following commits: 7045e77 [Aaron Davidson] Add mesos comment 4770f6e [Aaron Davidson] [SPARK-4370] [Core] Limit number of Netty cores based on executor size	2014-11-12 18:46:37 -08:00
Davies Liu	d7d54a44e3	[SPARK-2672] support compressed file in wholeTextFile The wholeFile() can not read compressed files, it should be, just like textFile(). Author: Davies Liu <davies@databricks.com> Closes #3005 from davies/whole and squashes the following commits: a43fcfb [Davies Liu] remove semicolon c83571a [Davies Liu] remove = if return type is Unit 83c844f [Davies Liu] Merge branch 'master' of github.com:apache/spark into whole 22e8b3e [Davies Liu] support compressed file in wholeTextFile	2014-11-12 15:58:12 -08:00
Andrew Or	6e3c5a296c	[Test] Better exception message from SparkSubmitSuite Before: ``` Exception in thread "main" java.lang.Exception: Could not load user defined classes inside of executors at org.apache.spark.deploy.JarCreationTest$.main(SparkSubmitSuite.scala:471) at org.apache.spark.deploy.JarCreationTest.main(SparkSubmitSuite.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ``` After: ``` Exception in thread "main" java.lang.Exception: Could not load user class from jar: java.lang.UnsupportedClassVersionError: SparkSubmitClassA : Unsupported major.minor version 51.0 java.lang.ClassLoader.defineClass1(Native Method) java.lang.ClassLoader.defineClass(ClassLoader.java:643) ... at org.apache.spark.deploy.JarCreationTest$.main(SparkSubmitSuite.scala:472) at org.apache.spark.deploy.JarCreationTest.main(SparkSubmitSuite.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ``` Author: Andrew Or <andrew@databricks.com> Closes #3212 from andrewor14/submit-suite-message and squashes the following commits: 7779248 [Andrew Or] Format exception 8fe6719 [Andrew Or] Better exception message from failed test	2014-11-12 13:35:48 -08:00
Prashant Sharma	daaca14c16	Support cross building for Scala 2.11 Let's give this another go using a version of Hive that shades its JLine dependency. Author: Prashant Sharma <prashant.s@imaginea.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #3159 from pwendell/scala-2.11-prashant and squashes the following commits: e93aa3e [Patrick Wendell] Restoring -Phive-thriftserver profile and cleaning up build script. f65d17d [Patrick Wendell] Fixing build issue due to merge conflict a8c41eb [Patrick Wendell] Reverting dev/run-tests back to master state. 7a6eb18 [Patrick Wendell] Merge remote-tracking branch 'apache/master' into scala-2.11-prashant 583aa07 [Prashant Sharma] REVERT ME: removed hive thirftserver 3680e58 [Prashant Sharma] Revert "REVERT ME: Temporarily removing some Cli tests." 935fb47 [Prashant Sharma] Revert "Fixed by disabling a few tests temporarily." 925e90f [Prashant Sharma] Fixed by disabling a few tests temporarily. 2fffed3 [Prashant Sharma] Exclude groovy from sbt build, and also provide a way for such instances in future. 8bd4e40 [Prashant Sharma] Switched to gmaven plus, it fixes random failures observer with its predecessor gmaven. 5272ce5 [Prashant Sharma] SPARK_SCALA_VERSION related bugs. 2121071 [Patrick Wendell] Migrating version detection to PySpark b1ed44d [Patrick Wendell] REVERT ME: Temporarily removing some Cli tests. 1743a73 [Patrick Wendell] Removing decimal test that doesn't work with Scala 2.11 f5cad4e [Patrick Wendell] Add Scala 2.11 docs 210d7e1 [Patrick Wendell] Revert "Testing new Hive version with shaded jline" 48518ce [Patrick Wendell] Remove association of Hive and Thriftserver profiles. e9d0a06 [Patrick Wendell] Revert "Enable thritfserver for Scala 2.10 only" 67ec364 [Patrick Wendell] Guard building of thriftserver around Scala 2.10 check 8502c23 [Patrick Wendell] Enable thritfserver for Scala 2.10 only e22b104 [Patrick Wendell] Small fix in pom file ec402ab [Patrick Wendell] Various fixes 0be5a9d [Patrick Wendell] Testing new Hive version with shaded jline 4eaec65 [Prashant Sharma] Changed scripts to ignore target. 5167bea [Prashant Sharma] small correction a4fcac6 [Prashant Sharma] Run against scala 2.11 on jenkins. 80285f4 [Prashant Sharma] MAven equivalent of setting spark.executor.extraClasspath during tests. 034b369 [Prashant Sharma] Setting test jars on executor classpath during tests from sbt. d4874cb [Prashant Sharma] Fixed Python Runner suite. null check should be first case in scala 2.11. 6f50f13 [Prashant Sharma] Fixed build after rebasing with master. We should use ${scala.binary.version} instead of just 2.10 e56ca9d [Prashant Sharma] Print an error if build for 2.10 and 2.11 is spotted. 937c0b8 [Prashant Sharma] SCALA_VERSION -> SPARK_SCALA_VERSION cb059b0 [Prashant Sharma] Code review 0476e5e [Prashant Sharma] Scala 2.11 support with repl and all build changes.	2014-11-11 21:36:48 -08:00
Timothy Chen	a878660d2d	SPARK-2269 Refactor mesos scheduler resourceOffers and add unit test Author: Timothy Chen <tnachen@gmail.com> Closes #1487 from tnachen/resource_offer_refactor and squashes the following commits: 4ea5dec [Timothy Chen] Rebase from master and address comments 9ccab09 [Timothy Chen] Address review comments e6494dc [Timothy Chen] Refactor class loading 8207428 [Timothy Chen] Refactor mesos scheduler resourceOffers and add unit test	2014-11-11 14:29:18 -08:00
Prashant Sharma	deefd9d737	SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable Author: Prashant Sharma <prashant.s@imaginea.com> Closes #771 from ScrapCodes/deploy-failover-pluggable and squashes the following commits: 29ba440 [Prashant Sharma] fixed a compilation error fef35ec [Prashant Sharma] Code review 57ee6f0 [Prashant Sharma] SPARK-1830 Deploy failover, Make Persistence engine and LeaderAgent Pluggable.	2014-11-11 09:29:48 -08:00
Reynold Xin	ef29a9a9aa	[SPARK-4307] Initialize FileDescriptor lazily in FileRegion. Netty's DefaultFileRegion requires a FileDescriptor in its constructor, which means we need to have a opened file handle. In super large workloads, this could lead to too many open files due to the way these file descriptors are cleaned. This pull request creates a new LazyFileRegion that initializes the FileDescriptor when we are sending data for the first time. Author: Reynold Xin <rxin@databricks.com> Author: Reynold Xin <rxin@apache.org> Closes #3172 from rxin/lazyFD and squashes the following commits: 0bdcdc6 [Reynold Xin] Added reference to Netty's DefaultFileRegion d4564ae [Reynold Xin] Added SparkConf to the ctor argument of IndexShuffleBlockManager. 6ed369e [Reynold Xin] Code review feedback. 04cddc8 [Reynold Xin] [SPARK-4307] Initialize FileDescriptor lazily in FileRegion.	2014-11-11 00:25:31 -08:00
Patrick Wendell	6e7a309b81	Revert "[SPARK-2703][Core]Make Tachyon related unit tests execute without deploying a Tachyon system locally." This reverts commit `bd86cb1738`.	2014-11-10 14:56:06 -08:00
Niklas Wilcke	ed8bf1eac5	[SPARK-4169] [Core] Accommodate non-English Locales in unit tests For me the core tests failed because there are two locale dependent parts in the code. Look at the Jira ticket for details. Why is it necessary to check the exception message in isBindCollision in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1686 ? Author: Niklas Wilcke <1wilcke@informatik.uni-hamburg.de> Closes #3036 from numbnut/core-test-fix and squashes the following commits: 1fb0d04 [Niklas Wilcke] Fixing locale dependend code and tests	2014-11-10 11:37:38 -08:00
RongGu	bd86cb1738	[SPARK-2703][Core]Make Tachyon related unit tests execute without deploying a Tachyon system locally. Make Tachyon related unit tests execute without deploying a Tachyon system locally. Author: RongGu <gurongwalker@gmail.com> Closes #3030 from RongGu/SPARK-2703 and squashes the following commits: ad08827 [RongGu] Make Tachyon related unit tests execute without deploying a Tachyon system locally	2014-11-09 23:48:15 -08:00
Sandy Ryza	3c2cff4b94	SPARK-3179. Add task OutputMetrics. Author: Sandy Ryza <sandy@cloudera.com> This patch had conflicts when merged, resolved by Committer: Kay Ousterhout <kayousterhout@gmail.com> Closes #2968 from sryza/sandy-spark-3179 and squashes the following commits: dce4784 [Sandy Ryza] More review feedback 8d350d1 [Sandy Ryza] Fix test against Hadoop 2.5+ e7c74d0 [Sandy Ryza] More review feedback 6cff9c4 [Sandy Ryza] Review feedback fb2dde0 [Sandy Ryza] SPARK-3179	2014-11-09 22:29:03 -08:00
Sean Owen	f8e5732307	SPARK-1209 [CORE] (Take 2) SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop andrewor14 Another try at SPARK-1209, to address https://github.com/apache/spark/pull/2814#issuecomment-61197619 I successfully tested with `mvn -Dhadoop.version=1.0.4 -DskipTests clean package; mvn -Dhadoop.version=1.0.4 test` I assume that is what failed Jenkins last time. I also tried `-Dhadoop.version1.2.1` and `-Phadoop-2.4 -Pyarn -Phive` for more coverage. So this is why the class was put in `org.apache.hadoop` to begin with, I assume. One option is to leave this as-is for now and move it only when Hadoop 1.0.x support goes away. This is the other option, which adds a call to force the constructor to be public at run-time. It's probably less surprising than putting Spark code in `org.apache.hadoop`, but, does involve reflection. A `SecurityManager` might forbid this, but it would forbid a lot of stuff Spark does. This would also only affect Hadoop 1.0.x it seems. Author: Sean Owen <sowen@cloudera.com> Closes #3048 from srowen/SPARK-1209 and squashes the following commits: 0d48f4b [Sean Owen] For Hadoop 1.0.x, make certain constructors public, which were public in later versions 466e179 [Sean Owen] Disable MIMA warnings resulting from moving the class -- this was also part of the PairRDDFunctions type hierarchy though? eb61820 [Sean Owen] Move SparkHadoopMapRedUtil / SparkHadoopMapReduceUtil from org.apache.hadoop to org.apache.spark	2014-11-09 22:11:20 -08:00
Sean Owen	d1362659ef	SPARK-1344 [DOCS] Scala API docs for top methods Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs. I think this resolves the discussion in SPARK-1344. Author: Sean Owen <sowen@cloudera.com> Closes #3168 from srowen/SPARK-1344 and squashes the following commits: 6963fcc [Sean Owen] Use "k" in javadoc of top and takeOrdered to avoid confusion with type K in pair RDDs	2014-11-09 17:42:08 -08:00
Cheng Lian	86e9eaa3f0	[SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark version This PR resorts to `SparkContext.version` rather than META-INF/MANIFEST.MF in the assembly jar to inspect Spark version. Currently, when built with Maven, the MANIFEST.MF file in the assembly jar is incorrectly replaced by Guava 15.0 MANIFEST.MF, probably because of the assembly/shading tricks. Another related PR is #3103, which tries to fix the MANIFEST issue. Author: Cheng Lian <lian@databricks.com> Closes #3105 from liancheng/spark-4225 and squashes the following commits: d9585e1 [Cheng Lian] Resorts to SparkContext.version to inspect Spark version	2014-11-07 11:45:25 -08:00
Aaron Davidson	d4fa04e50d	[SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages This PR elimiantes the network package's usage of the Java serializer and replaces it with Encodable, which is a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1). This protocol has the advantage over Java that we can guarantee that messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavy-weight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time. Additionally this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient. Author: Aaron Davidson <aaron@databricks.com> Closes #3146 from aarondav/free and squashes the following commits: ed1102a [Aaron Davidson] Remove some unused imports b8e2a49 [Aaron Davidson] Add appId to test 538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages	2014-11-07 09:42:21 -08:00
zsxwing	3abdb1b24a	[SPARK-4204][Core][WebUI] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly This PR fixed `Utils.exceptionString` to output the full exception information. However, the stack trace may become very huge, so I also updated the Web UI to collapse the error information by default (display the first line and clicking `+detail` will display the full info). Here are the screenshots: Stages: ![stages](https://cloud.githubusercontent.com/assets/1000778/4882441/66d8cc68-6356-11e4-8346-6318677d9470.png) Details for one stage: ![stage](https://cloud.githubusercontent.com/assets/1000778/4882513/1311043c-6357-11e4-8804-ca14240a9145.png) The full information in the gray text field is: ```Java org.apache.spark.shuffle.FetchFailedException: Connection reset by peer at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67) at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83) at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160) at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263) at org.apache.spark.rdd.RDD.iterator(RDD.scala:230) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198) at sun.nio.ch.IOUtil.read(IOUtil.java:166) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245) at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881) at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ... 1 more ``` /cc aarondav Author: zsxwing <zsxwing@gmail.com> Closes #3073 from zsxwing/SPARK-4204 and squashes the following commits: 176d1e3 [zsxwing] Add comments to explain the stack trace difference ca509d3 [zsxwing] Add fullStackTrace to the constructor of ExceptionFailure a07057b [zsxwing] Core style fix dfb0032 [zsxwing] Backward compatibility for old history server 1e50f71 [zsxwing] Update as per review and increase the max height of the stack trace details 94f2566 [zsxwing] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly	2014-11-06 21:52:38 -08:00
Aaron Davidson	48a19a6dba	[SPARK-4236] Cleanup removed applications' files in shuffle service This relies on a hook from whoever is hosting the shuffle service to invoke removeApplication() when the application is completed. Once invoked, we will clean up all the executors' shuffle directories we know about. Author: Aaron Davidson <aaron@databricks.com> Closes #3126 from aarondav/cleanup and squashes the following commits: 33a64a9 [Aaron Davidson] Missing brace e6e428f [Aaron Davidson] Address comments 16a0d27 [Aaron Davidson] Cleanup e4df3e7 [Aaron Davidson] [SPARK-4236] Cleanup removed applications' files in shuffle service	2014-11-06 19:54:32 -08:00
Aaron Davidson	f165b2bbf5	[SPARK-4188] [Core] Perform network-level retry of shuffle file fetches This adds a RetryingBlockFetcher to the NettyBlockTransferService which is wrapped around our typical OneForOneBlockFetcher, adding retry logic in the event of an IOException. This sort of retry allows us to avoid marking an entire executor as failed due to garbage collection or high network load. TODO: - [x] unit tests - [x] put in ExternalShuffleClient too Author: Aaron Davidson <aaron@databricks.com> Closes #3101 from aarondav/retry and squashes the following commits: 72a2a32 [Aaron Davidson] Add that we should remove the condition around the retry thingy c7fd107 [Aaron Davidson] Fix unit tests e80e4c2 [Aaron Davidson] Address initial comments 6f594cd [Aaron Davidson] Fix unit test 05ff43c [Aaron Davidson] Add to external shuffle client and add unit test 66e5a24 [Aaron Davidson] [SPARK-4238] [Core] Perform network-level retry of shuffle file fetches	2014-11-06 18:39:14 -08:00
Aaron Davidson	6e9ef10fd7	[SPARK-4277] Support external shuffle service on Standalone Worker Author: Aaron Davidson <aaron@databricks.com> Closes #3142 from aarondav/worker and squashes the following commits: 3780bd7 [Aaron Davidson] Address comments 2dcdfc1 [Aaron Davidson] Add private[worker] 47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors) 258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor	2014-11-06 17:20:46 -08:00
Aaron Davidson	23eaf0e12f	[SPARK-4264] Completion iterator should only invoke callback once Author: Aaron Davidson <aaron@databricks.com> Closes #3128 from aarondav/compiter and squashes the following commits: 698e4be [Aaron Davidson] [SPARK-4264] Completion iterator should only invoke callback once	2014-11-06 10:45:46 -08:00
Davies Liu	b41a39e240	[SPARK-4186] add binaryFiles and binaryRecords in Python add binaryFiles() and binaryRecords() in Python ``` binaryFiles(self, path, minPartitions=None): :: Developer API :: Read a directory of binary files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI as a byte array. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file, the value is the content of each file. Note: Small files are preferred, large file is also allowable, but may cause bad performance. binaryRecords(self, path, recordLength): Load data from a flat binary file, assuming each record is a set of numbers with the specified numerical format (see ByteBuffer), and the number of bytes per record is constant. :param path: Directory to the input data files :param recordLength: The length at which to split the records ``` Author: Davies Liu <davies@databricks.com> Closes #3078 from davies/binary and squashes the following commits: cd0bdbd [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary 3aa349b [Davies Liu] add experimental notes 24e84b6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary 5ceaa8a [Davies Liu] Merge branch 'master' of github.com:apache/spark into binary 1900085 [Davies Liu] bugfix bb22442 [Davies Liu] add binaryFiles and binaryRecords in Python	2014-11-06 00:22:19 -08:00
Kay Ousterhout	5f27ae16d5	[SPARK-4255] Fix incorrect table striping This commit stripes table rows after hiding some rows, to ensure that rows are correct striped to alternate white and grey even when rows are hidden by default. Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3117 from kayousterhout/striping and squashes the following commits: be6e10a [Kay Ousterhout] [SPARK-4255] Fix incorrect table striping	2014-11-06 00:03:03 -08:00
Brenden Matthews	cb0eae3b78	[SPARK-4158] Fix for missing resources. Mesos offers may not contain all resources, and Spark needs to check to ensure they are present and sufficient. Spark may throw an erroneous exception when resources aren't present. Author: Brenden Matthews <brenden@diddyinc.com> Closes #3024 from brndnmtthws/fix-mesos-resource-misuse and squashes the following commits: e5f9580 [Brenden Matthews] [SPARK-4158] Fix for missing resources.	2014-11-05 16:02:44 -08:00
Jongyoul Lee	f7ac8c2b1d	SPARK-3223 runAsSparkUser cannot change HDFS write permission properly i... ...n mesos cluster mode - change master newer Author: Jongyoul Lee <jongyoul@gmail.com> Closes #3034 from jongyoul/SPARK-3223 and squashes the following commits: 42b2ed3 [Jongyoul Lee] SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode - change master newer	2014-11-05 15:49:42 -08:00
Andrew Or	61a5cced04	[SPARK-3797] Run external shuffle service in Yarn NM This creates a new module `network/yarn` that depends on `network/shuffle` recently created in #3001. This PR introduces a custom Yarn auxiliary service that runs the external shuffle service. As of the changes here this shuffle service is required for using dynamic allocation with Spark. This is still WIP mainly because it doesn't handle security yet. I have tested this on a stable Yarn cluster. Author: Andrew Or <andrew@databricks.com> Closes #3082 from andrewor14/yarn-shuffle-service and squashes the following commits: ef3ddae [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service 0ee67a2 [Andrew Or] Minor wording suggestions 1c66046 [Andrew Or] Remove unused provided dependencies 0eb6233 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service 6489db5 [Andrew Or] Try catch at the right places 7b71d8f [Andrew Or] Add detailed java docs + reword a few comments d1124e4 [Andrew Or] Add security to shuffle service (INCOMPLETE) 5f8a96f [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service 9b6e058 [Andrew Or] Address various feedback f48b20c [Andrew Or] Fix tests again f39daa6 [Andrew Or] Do not make network-yarn an assembly module 761f58a [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-shuffle-service 15a5b37 [Andrew Or] Fix build for Hadoop 1.x baff916 [Andrew Or] Fix tests 5bf9b7e [Andrew Or] Address a few minor comments 5b419b8 [Andrew Or] Add missing license header 804e7ff [Andrew Or] Include the Yarn shuffle service jar in the distribution cd076a4 [Andrew Or] Require external shuffle service for dynamic allocation ea764e0 [Andrew Or] Connect to Yarn shuffle service only if it's enabled 1bf5109 [Andrew Or] Use the shuffle service port specified through hadoop config b4b1f0c [Andrew Or] 4 tabs -> 2 tabs 43dcb96 [Andrew Or] First cut integration of shuffle service with Yarn aux service b54a0c4 [Andrew Or] Initial skeleton for Yarn shuffle service	2014-11-05 15:42:05 -08:00
industrial-sloth	f37817b18a	SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader replaces the existing read() call with readFully(). Author: industrial-sloth <industrial-sloth@users.noreply.github.com> Closes #3093 from industrial-sloth/branch-1.2-fixedLenRecRdr and squashes the following commits: a245c8a [industrial-sloth] use readFully in FixedLengthBinaryRecordReader (cherry picked from commit `6844e7a821`) Signed-off-by: Matei Zaharia <matei@databricks.com>	2014-11-05 15:39:16 -08:00
Kay Ousterhout	a46497eecc	[SPARK-3984] [SPARK-3983] Fix incorrect scheduler delay and display task deserialization time in UI This commit fixes the scheduler delay in the UI (which previously included things that are not scheduler delay, like time to deserialize the task and serialize the result), and also adds information about time to deserialize tasks to the optional additional metrics. Time to deserialize the task can be large relative to task time for short jobs, and understanding when it is high can help developers realize that they should try to reduce closure size (e.g, by including less data in the task description). cc shivaram etrain Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #2832 from kayousterhout/SPARK-3983 and squashes the following commits: 0c1398e [Kay Ousterhout] Fixed ordering 531575d [Kay Ousterhout] Removed executor launch time 1f13afe [Kay Ousterhout] Minor spacing fixes 335be4b [Kay Ousterhout] Made metrics hideable 5bc3cba [Kay Ousterhout] [SPARK-3984] [SPARK-3983] Improve UI task metrics.	2014-11-05 15:30:31 -08:00
Aaron Davidson	4c42986cc0	[SPARK-4242] [Core] Add SASL to external shuffle service Does three things: (1) Adds SASL to ExternalShuffleClient, (2) puts SecurityManager in BlockManager's constructor, and (3) adds unit test. Author: Aaron Davidson <aaron@databricks.com> Closes #3108 from aarondav/sasl-client and squashes the following commits: 48b622d [Aaron Davidson] Screw it, let's just get LimitedInputStream 3543b70 [Aaron Davidson] Back out of pom change due to unknown test issue? b58518a [Aaron Davidson] ByteStreams.limit() not available :( cbe451a [Aaron Davidson] Address comments 2bf2908 [Aaron Davidson] [SPARK-4242] [Core] Add SASL to external shuffle service	2014-11-05 14:38:43 -08:00
Aaron Davidson	5e73138a01	[SPARK-2938] Support SASL authentication in NettyBlockTransferService Also lays the groundwork for supporting it inside the external shuffle service. Author: Aaron Davidson <aaron@databricks.com> Closes #3087 from aarondav/sasl and squashes the following commits: 3481718 [Aaron Davidson] Delete rogue println 44f8410 [Aaron Davidson] Delete documentation - muahaha! eb9f065 [Aaron Davidson] Improve documentation and add end-to-end test at Spark-level a6b95f1 [Aaron Davidson] Address comments 785bbde [Aaron Davidson] Cleanup 79973cb [Aaron Davidson] Remove unused file 151b3c5 [Aaron Davidson] Add docs, timeout config, better failure handling f6177d7 [Aaron Davidson] Cleanup SASL state upon connection termination 7b42adb [Aaron Davidson] Add unit tests 8191bcb [Aaron Davidson] [SPARK-2938] Support SASL authentication in NettyBlockTransferService	2014-11-04 16:15:38 -08:00
Davies Liu	e4f42631a6	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default. This PR simplify serializer, always use batched serializer (AutoBatchedSerializer as default), even batch size is 1. Author: Davies Liu <davies@databricks.com> This patch had conflicts when merged, resolved by Committer: Josh Rosen <joshrosen@databricks.com> Closes #2920 from davies/fix_autobatch and squashes the following commits: e544ef9 [Davies Liu] revert unrelated change 6880b14 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_autobatch 1d557fc [Davies Liu] fix tests 8180907 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_autobatch 76abdce [Davies Liu] clean up 53fa60b [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_autobatch d7ac751 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_autobatch 2cc2497 [Davies Liu] Merge branch 'master' of github.com:apache/spark into fix_autobatch b4292ce [Davies Liu] fix bug in master d79744c [Davies Liu] recover hive tests be37ece [Davies Liu] refactor eb3938d [Davies Liu] refactor serializer in scala 8d77ef2 [Davies Liu] simplify serializer, use AutoBatchedSerializer by default.	2014-11-03 23:56:14 -08:00
zsxwing	b671ce047d	[SPARK-4166][Core] Add a backward compatibility test for ExecutorLostFailure Author: zsxwing <zsxwing@gmail.com> Closes #3085 from zsxwing/SPARK-4166-back-comp and squashes the following commits: 89329f4 [zsxwing] Add a backward compatibility test for ExecutorLostFailure	2014-11-03 22:47:45 -08:00
zsxwing	9bdc8412a0	[SPARK-4163][Core] Add a backward compatibility test for FetchFailed /cc aarondav Author: zsxwing <zsxwing@gmail.com> Closes #3086 from zsxwing/SPARK-4163-back-comp and squashes the following commits: 21cb2a8 [zsxwing] Add a backward compatibility test for FetchFailed	2014-11-03 22:40:43 -08:00
Josh Rosen	4f035dd2cd	[SPARK-611] Display executor thread dumps in web UI This patch allows executor thread dumps to be collected on-demand and viewed in the Spark web UI. The thread dumps are collected using Thread.getAllStackTraces(). To allow remote thread dumps to be triggered from the web UI, I added a new `ExecutorActor` that runs inside of the Executor actor system and responds to RPCs from the driver. The driver's mechanism for obtaining a reference to this actor is a little bit hacky: it uses the block manager master actor to determine the host/port of the executor actor systems in order to construct ActorRefs to ExecutorActor. Unfortunately, I couldn't find a much cleaner way to do this without a big refactoring of the executor -> driver communication. Screenshots: ![image](https://cloud.githubusercontent.com/assets/50748/4781793/7e7a0776-5cbf-11e4-874d-a91cd04620bd.png) ![image](https://cloud.githubusercontent.com/assets/50748/4781794/8bce76aa-5cbf-11e4-8d13-8477748c9f7e.png) ![image](https://cloud.githubusercontent.com/assets/50748/4781797/bd11a8b8-5cbf-11e4-9ad7-a7459467ec8e.png) Author: Josh Rosen <joshrosen@databricks.com> Closes #2944 from JoshRosen/jstack-in-web-ui and squashes the following commits: 3c21a5d [Josh Rosen] Address review comments: 880f7f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into jstack-in-web-ui f719266 [Josh Rosen] Merge remote-tracking branch 'origin/master' into jstack-in-web-ui 19707b0 [Josh Rosen] Add one comment. 127a130 [Josh Rosen] Update to use SparkContext.DRIVER_IDENTIFIER b8e69aa [Josh Rosen] Merge remote-tracking branch 'origin/master' into jstack-in-web-ui 3dfc2d4 [Josh Rosen] Add missing file. bc1e675 [Josh Rosen] Undo some leftover changes from the earlier approach. f4ac1c1 [Josh Rosen] Switch to on-demand collection of thread dumps dfec08b [Josh Rosen] Add option to disable thread dumps in UI. 4c87d7f [Josh Rosen] Use separate RPC for sending thread dumps. 2b8bdf3 [Josh Rosen] Enable thread dumps from the driver when running in non-local mode. cc3e6b3 [Josh Rosen] Fix test code in DAGSchedulerSuite. 87b8b65 [Josh Rosen] Add new listener event for thread dumps. 8c10216 [Josh Rosen] Add missing file. 0f198ac [Josh Rosen] [SPARK-611] Display executor thread dumps in web UI	2014-11-03 18:18:47 -08:00
Zhang, Liye	97a466eca0	[SPARK-4168][WebUI] web statges number should show correctly when stages are more than 1000 The number of completed stages and failed stages showed on webUI will always be less than 1000. This is really misleading when there are already thousands of stages completed or failed. The number should be correct even when only partial stages listed on the webUI (stage info will be removed if the number is too large). Author: Zhang, Liye <liye.zhang@intel.com> Closes #3035 from liyezhang556520/webStageNum and squashes the following commits: d9e29fb [Zhang, Liye] add detailed comments for variables 4ea8fd1 [Zhang, Liye] change variable name accroding to comments f4c404d [Zhang, Liye] [SPARK-4168][WebUI] web statges number should show correctly when stages are more than 1000	2014-11-03 18:17:32 -08:00
Sandy Ryza	28128150e7	SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader insta... ...ntiation Author: Sandy Ryza <sandy@cloudera.com> Closes #3045 from sryza/sandy-spark-4178 and squashes the following commits: 8d2e70e [Sandy Ryza] Kostas's review feedback e5b27c0 [Sandy Ryza] SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader instantiation	2014-11-03 15:19:01 -08:00
zsxwing	76386e1a23	[SPARK-4163][Core][WebUI] Send the fetch failure message back to Web UI This is a PR to send the fetch failure message back to Web UI. Before: ![f1](https://cloud.githubusercontent.com/assets/1000778/4856595/1f036c80-60be-11e4-956f-335147fbccb7.png) ![f2](https://cloud.githubusercontent.com/assets/1000778/4856596/1f11cbea-60be-11e4-8fe9-9f9b2b35c884.png) After (Please ignore the meaning of exception, I threw it in the code directly because it's hard to simulate a fetch failure): ![e1](https://cloud.githubusercontent.com/assets/1000778/4856600/2657ea38-60be-11e4-9f2d-d56c5f900f10.png) ![e2](https://cloud.githubusercontent.com/assets/1000778/4856601/26595008-60be-11e4-912b-2744af786991.png) Author: zsxwing <zsxwing@gmail.com> Closes #3032 from zsxwing/SPARK-4163 and squashes the following commits: f7e1faf [zsxwing] Discard changes for FetchFailedException and minor modification 4e946f7 [zsxwing] Add e as the cause of SparkException 316767d [zsxwing] Add private[storage] to FetchResult d51b0b6 [zsxwing] Set e as the cause of FetchFailedException b88c919 [zsxwing] Use 'private[storage]' for case classes instead of 'sealed' 62103fd [zsxwing] Update as per review 0c07d1f [zsxwing] Backward-compatible support a3bca65 [zsxwing] Send the fetch failure message back to Web UI	2014-11-02 23:20:22 -08:00
Aaron Davidson	2ebd1df3f1	[SPARK-4183] Close transport-related resources between SparkContexts A leak of event loops may be causing test failures. Author: Aaron Davidson <aaron@databricks.com> Closes #3053 from aarondav/leak and squashes the following commits: e676d18 [Aaron Davidson] Typo! 8f96475 [Aaron Davidson] Keep original ssc semantics 7e49f10 [Aaron Davidson] A leak of event loops may be causing test failures.	2014-11-02 16:26:24 -08:00
Cheng Lian	c9f840046f	[SPARK-3791][SQL] Provides Spark version and Hive version in HiveThriftServer2 This PR overrides the `GetInfo` Hive Thrift API to provide correct version information. Another property `spark.sql.hive.version` is added to reveal the underlying Hive version. These are generally useful for Spark SQL ODBC driver providers. The Spark version information is extracted from the jar manifest. Also took the chance to remove the `SET -v` hack, which was a workaround for Simba ODBC driver connectivity. TODO - [x] Find a general way to figure out Hive (or even any dependency) version. This [blog post](http://blog.soebes.de/blog/2014/01/02/version-information-into-your-appas-with-maven/) suggests several methods to inspect application version. In the case of Spark, this can be tricky because the chosen method: 1. must applies to both Maven build and SBT build For Maven builds, we can retrieve the version information from the META-INF/maven directory within the assembly jar. But this doesn't work for SBT builds. 2. must not rely on the original jars of dependencies to extract specific dependency version, because Spark uses assembly jar. This implies we can't read Hive version from Hive jar files since standard Spark distribution doesn't include them. 3. should play well with `SPARK_PREPEND_CLASSES` to ease local testing during development. `SPARK_PREPEND_CLASSES` prevents classes to be loaded from the assembly jar, thus we can't locate the jar file and read its manifest. Given these, maybe the only reliable method is to generate a source file containing version information at build time. pwendell Do you have any suggestions from the perspective of the build process? Update Hive version is now retrieved from the newly introduced `HiveShim` object. Author: Cheng Lian <lian.cs.zju@gmail.com> Author: Cheng Lian <lian@databricks.com> Closes #2843 from liancheng/get-info and squashes the following commits: a873d0f [Cheng Lian] Updates test case 53f43cd [Cheng Lian] Retrieves underlying Hive verson via HiveShim 1d282b8 [Cheng Lian] Removes the Simba ODBC "SET -v" hack f857fce [Cheng Lian] Overrides Hive GetInfo Thrift API and adds Hive version property	2014-11-02 15:18:29 -08:00
zsxwing	4e6a7a0b3e	[SPARK-4166][Core][WebUI] Display the executor ID in the Web UI when ExecutorLostFailure happens Now when ExecutorLostFailure happens, it only displays `ExecutorLostFailure (executor lost)`. Adding the executor id will help locate the faulted executor. Author: zsxwing <zsxwing@gmail.com> Closes #3033 from zsxwing/SPARK-4166 and squashes the following commits: ff4664c [zsxwing] Backward-compatible support c5c4cf2 [zsxwing] Display the executor ID in the Web UI when ExecutorLostFailure happens	2014-11-02 10:44:52 -08:00
Davies Liu	6181577e99	[SPARK-3466] Limit size of results that a driver collects for each action Right now, operations like collect() and take() can crash the driver with an OOM if they bring back too many data. This PR will introduce spark.driver.maxResultSize, after setting it, the driver will abort a job if its result is bigger than it. By default, it's 1g (for backward compatibility for most the cases). In local mode, the driver and executor share the same JVM, the default setting can not protect JVM from OOM. cc mateiz Author: Davies Liu <davies@databricks.com> Closes #3003 from davies/collect and squashes the following commits: 248ed5e [Davies Liu] fix compile 272522e [Davies Liu] address comments 2c35773 [Davies Liu] add sizes in message of abort() 5d62303 [Davies Liu] address comments bc3c077 [Davies Liu] Merge branch 'master' of github.com:apache/spark into collect 11f97c5 [Davies Liu] address comments 47b144f [Davies Liu] check the size of result before send and fetch 3d81af2 [Davies Liu] address comments ca8267d [Davies Liu] limit the size of data by collect	2014-11-02 00:03:51 -07:00
Patrick Wendell	7894de276b	Revert "[SPARK-4183] Enable NettyBlockTransferService by default" This reverts commit `59e626c701`.	2014-11-01 15:18:58 -07:00
Aaron Davidson	f55218aeb1	[SPARK-3796] Create external service which can serve shuffle files This patch introduces the tooling necessary to construct an external shuffle service which is independent of Spark executors, and then use this service inside Spark. An example (just for the sake of this PR) of the service creation can be found in Worker, and the service itself is used by plugging in the StandaloneShuffleClient as Spark's ShuffleClient (setup in BlockManager). This PR continues the work from #2753, which extracted out the transport layer of Spark's block transfer into an independent package within Spark. A new package was created which contains the Spark business logic necessary to retrieve the actual shuffle data, which is completely independent of the transport layer introduced in the previous patch. Similar to the transport layer, this package must not depend on Spark as we anticipate plugging this service as a lightweight process within, say, the YARN NodeManager, and do not wish to include Spark's dependencies (including Scala itself). There are several outstanding tasks which must be complete before this PR can be merged: - [x] Complete unit testing of network/shuffle package. - [x] Performance and correctness testing on a real cluster. - [x] Remove example service instantiation from Worker.scala. There are even more shortcomings of this PR which should be addressed in followup patches: - Don't use Java serializer for RPC layer! It is not cross-version compatible. - Handle shuffle file cleanup for dead executors once the application terminates or the ContextCleaner triggers. - Documentation of the feature in the Spark docs. - Improve behavior if the shuffle service itself goes down (right now we don't blacklist it, and new executors cannot spawn on that machine). - SSL and SASL integration - Nice to have: Handle shuffle file consolidation (this would requires changes to Spark's implementation). Author: Aaron Davidson <aaron@databricks.com> Closes #3001 from aarondav/shuffle-service and squashes the following commits: 4d1f8c1 [Aaron Davidson] Remove changes to Worker 705748f [Aaron Davidson] Rename Standalone* to External* fd3928b [Aaron Davidson] Do not unregister executor outputs unduly 9883918 [Aaron Davidson] Make suggested build changes 3d62679 [Aaron Davidson] Add Spark integration test 7fe51d5 [Aaron Davidson] Fix SBT integration 56caa50 [Aaron Davidson] Address comments c8d1ac3 [Aaron Davidson] Add unit tests 2f70c0c [Aaron Davidson] Fix unit tests 5483e96 [Aaron Davidson] Fix unit tests 46a70bf [Aaron Davidson] Whoops, bracket 5ea4df6 [Aaron Davidson] [SPARK-3796] Create external service which can serve shuffle files	2014-11-01 14:37:45 -07:00
Aaron Davidson	59e626c701	[SPARK-4183] Enable NettyBlockTransferService by default Note that we're turning this on for at least the first part of the QA period as a trial. We want to enable this (and deprecate the NioBlockTransferService) as soon as possible in the hopes that NettyBlockTransferService will be more stable and easier to maintain. We will turn it off if we run into major issues. Author: Aaron Davidson <aaron@databricks.com> Closes #3049 from aarondav/enable-netty and squashes the following commits: bb981cc [Aaron Davidson] [SPARK-4183] Enable NettyBlockTransferService by default	2014-11-01 13:15:24 -07:00
Kevin Mader	7136719b7d	[SPARK-2759][CORE] Generic Binary File Support in Spark The additions add the abstract BinaryFileInputFormat and BinaryRecordReader classes for reading in data as a byte stream and converting it to another format using the ```def parseByteArray(inArray: Array[Byte]): T``` function. As a trivial example ```ByteInputFormat``` and ```ByteRecordReader``` are included which just return the Array[Byte] from a given file. Finally a RDD for ```BinaryFileInputFormat``` (to allow for easier partitioning changes as was done for WholeFileInput) was added and the appropriate byteFiles to the ```SparkContext``` so the functions can be easily used by others. A common use case might be to read in a folder ``` sc.byteFiles("s3://mydrive/tif/*.tif").map(rawData => ReadTiffFromByteArray(rawData)) ``` Author: Kevin Mader <kevinmader@gmail.com> Author: Kevin Mader <kmader@users.noreply.github.com> Closes #1658 from kmader/master and squashes the following commits: 3c49a30 [Kevin Mader] fixing wholetextfileinput to it has the same setMinPartitions function as in BinaryData files 359a096 [Kevin Mader] making the final corrections suggested by @mateiz and renaming a few functions to make their usage clearer 6379be4 [Kevin Mader] reorganizing code 7b9d181 [Kevin Mader] removing developer API, cleaning up imports 8ac288b [Kevin Mader] fixed a single slightly over 100 character line 92bda0d [Kevin Mader] added new tests, renamed files, fixed several of the javaapi functions, formatted code more nicely a32fef7 [Kevin Mader] removed unneeded classes added DeveloperApi note to portabledatastreams since the implementation might change 49174d9 [Kevin Mader] removed unneeded classes added DeveloperApi note to portabledatastreams since the implementation might change c27a8f1 [Kevin Mader] jenkins crashed before running anything last time, so making minor change b348ce1 [Kevin Mader] fixed order in check (prefix only appears on jenkins not when I run unit tests locally) 0588737 [Kevin Mader] filename check in "binary file input as byte array" test now ignores prefixes and suffixes which might get added by Hadoop 4163e38 [Kevin Mader] fixing line length and output from FSDataInputStream to DataInputStream to minimize sensitivity to Hadoop API changes 19812a8 [Kevin Mader] Fixed the serialization issue with PortableDataStream since neither CombineFileSplit nor TaskAttemptContext implement the Serializable interface, by using ByteArrays for storing both and then recreating the objects from these bytearrays as needed. 238c83c [Kevin Mader] fixed several scala-style issues, changed structure of binaryFiles, removed excessive classes added new tests. The caching tests still have a serialization issue, but that should be easily fixed as well. 932a206 [Kevin Mader] Update RawFileInput.scala a01c9cf [Kevin Mader] Update RawFileInput.scala 441f79a [Kevin Mader] fixed a few small comments and dependency 12e7be1 [Kevin Mader] removing imglib from maven (definitely not ready yet) 5deb79e [Kevin Mader] added new portabledatastream to code so that it can be serialized correctly f032bc0 [Kevin Mader] fixed bug in path name, renamed tests bc5c0b9 [Kevin Mader] made minor stylistic adjustments from mateiz df8e528 [Kevin Mader] fixed line lengths and changed java test 9a313d5 [Kevin Mader] making classes that needn't be public private, adding automatic file closure, adding new tests edf5829 [Kevin Mader] fixing line lengths, adding new lines f4841dc [Kevin Mader] un-optimizing imports, silly intellij eacfaa6 [Kevin Mader] Added FixedLengthBinaryInputFormat and RecordReader from freeman-lab and added them to both the JavaSparkContext and the SparkContext as fixedLengthBinaryFile 1622935 [Kevin Mader] changing the line lengths to make jenkins happy 1cfa38a [Kevin Mader] added apache headers, added datainputstream directly as an output option for more complicated readers (HDF5 perhaps), and renamed several of the functions and files to be more consistent. Also added parallel functions to the java api 84035f1 [Kevin Mader] adding binary and byte file support spark 81c5f12 [Kevin Mader] Merge pull request #1 from apache/master	2014-11-01 11:59:39 -07:00
Sandy Ryza	23f73f525c	SPARK-4175. Exception on stage page Author: Sandy Ryza <sandy@cloudera.com> Closes #3043 from sryza/sandy-spark-4175 and squashes the following commits: e327340 [Sandy Ryza] SPARK-4175. Exception on stage page	2014-10-31 17:22:52 -07:00
Mark Mims	a68ecf3281	[SPARK-4141] Hide Accumulators column on stage page when no accumulators exist WebUI Author: Mark Mims <mark.mims@canonical.com> This patch had conflicts when merged, resolved by Committer: Josh Rosen <joshrosen@databricks.com> Closes #3031 from mmm/remove-accumulators-col and squashes the following commits: 6141cb3 [Mark Mims] reformat to satisfy scalastyle linelength. build failed from jenkins https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22604/ 390893b [Mark Mims] cleanup c28c449 [Mark Mims] looking much better now... minimal explicit formatting. Now, see if any sort keys make sense fb72156 [Mark Mims] mimic hasInput. The basics work here, but wanna clean this up with maybeAccumulators for column content	2014-10-31 11:41:03 -07:00
Kay Ousterhout	adb6415c1d	[SPARK-4016] Allow user to show/hide UI metrics. This commit adds a set of checkboxes to the stage detail page that the user can use to show additional task metrics, including the GC time, result serialization time, result fetch time, and scheduler delay. All of these metrics are now hidden by default. This allows advanced users to look at more detailed metrics, without distracting the average user. This change also cleans up the stage detail page so that metrics are shown in the same order in the summary table as in the task table, and updates the metrics in both tables such that they contain the same set of metrics. The ability to remember a user's preferences for which metrics should be shown has been filed as SPARK-4024. Here's what the stage detail page looks like by default: ![image](https://cloud.githubusercontent.com/assets/1108612/4744322/3ebe319e-5a2f-11e4-891f-c792be79caa2.png) and once a user clicks "Show additional metrics" (note that all the metrics get checked by default): ![image](https://cloud.githubusercontent.com/assets/1108612/4744332/51e5abda-5a2f-11e4-8994-d0d3705ee05d.png) cc shivaram andrewor14 Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #2867 from kayousterhout/SPARK-checkboxes and squashes the following commits: 6015913 [Kay Ousterhout] Added comment 08dee73 [Kay Ousterhout] Josh's usability comments 0940d61 [Kay Ousterhout] Style updates based on Andrew's review ef05ccd [Kay Ousterhout] Added tooltips d7cfaaf [Kay Ousterhout] Made list of add'l metrics collapsible. 70c1fb5 [Kay Ousterhout] [SPARK-4016] Allow user to show/hide UI metrics.	2014-10-31 10:28:19 -07:00
Erik Erlandson	ad3bd0dff8	[SPARK-3250] Implement Gap Sampling optimization for random sampling More efficient sampling, based on Gap Sampling optimization: http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/ Author: Erik Erlandson <eerlands@redhat.com> Closes #2455 from erikerlandson/spark-3250-pr and squashes the following commits: 72496bc [Erik Erlandson] [SPARK-3250] Implement Gap Sampling optimization for random sampling	2014-10-30 22:30:52 -07:00
Patrick Wendell	0734d09320	HOTFIX: Clean up build in network module. This is currently breaking the package build for some people (including me). This patch does some general clean-up which also fixes the current issue. - Uses consistent artifact naming - Adds sbt support for this module - Changes tests to use scalatest (fixes the original issue[1]) One thing to note, it turns out that scalatest when invoked in the Maven build doesn't succesfully detect JUnit Java tests. This is a long standing issue, I noticed it applies to all of our current test suites as well. I've created SPARK-4159 to fix this. [1] The original issue is that we need to allocate extra memory for the tests, happens by default in our scalatest configuration. Author: Patrick Wendell <pwendell@gmail.com> Closes #3025 from pwendell/hotfix and squashes the following commits: faa9053 [Patrick Wendell] HOTFIX: Clean up build in network module.	2014-10-30 20:15:36 -07:00
Andrew Or	26d31d15fd	Revert "SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop" This reverts commit `68cb69daf3`.	2014-10-30 17:56:10 -07:00
Sean Owen	68cb69daf3	SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce}Util should not use package org.apache.hadoop (This is just a look at what completely moving the classes would look like. I know Patrick flagged that as maybe not OK, although, it's private?) Author: Sean Owen <sowen@cloudera.com> Closes #2814 from srowen/SPARK-1209 and squashes the following commits: ead1115 [Sean Owen] Disable MIMA warnings resulting from moving the class -- this was also part of the PairRDDFunctions type hierarchy though? 2d42c1d [Sean Owen] Move SparkHadoopMapRedUtil / SparkHadoopMapReduceUtil from org.apache.hadoop to org.apache.spark	2014-10-30 15:54:53 -07:00
Andrew Or	2f54543815	[SPARK-3661] Respect spark..memory in cluster mode This also includes minor re-organization of the code. Tested locally in both client and deploy modes. Author: Andrew Or <andrew@databricks.com> Author: Andrew Or <andrewor14@gmail.com> Closes #2697 from andrewor14/memory-cluster-mode and squashes the following commits: 01d78bc [Andrew Or] Merge branch 'master' of github.com:apache/spark into memory-cluster-mode ccd468b [Andrew Or] Add some comments per Patrick c956577 [Andrew Or] Tweak wording 2b4afa0 [Andrew Or] Unused import 47a5a88 [Andrew Or] Correct Spark properties precedence order bf64717 [Andrew Or] Merge branch 'master' of github.com:apache/spark into memory-cluster-mode dd452d0 [Andrew Or] Respect spark..memory in cluster mode	2014-10-30 15:44:29 -07:00
zsxwing	d345057835	[SPARK-4153][WebUI] Update the sort keys for HistoryPage Sort "Started", "Completed", "Duration" and "Last Updated" by time. Author: zsxwing <zsxwing@gmail.com> Closes #3014 from zsxwing/SPARK-4153 and squashes the following commits: ec8b9ad [zsxwing] Sort "Started", "Completed", "Duration" and "Last Updated" by time	2014-10-30 15:33:56 -07:00
Andrew Or	849b43ec0f	Minor style hot fix after #2711 I had planned to fix this when I merged it but I forgot to. witgo Author: Andrew Or <andrew@databricks.com> Closes #3018 from andrewor14/command-utils-style and squashes the following commits: c2959fb [Andrew Or] Style hot fix	2014-10-30 15:33:34 -07:00
Andrew Or	9334d69967	[SPARK-4155] Consolidate usages of <driver> We use "\<driver\>" everywhere. Let's not do that. Author: Andrew Or <andrew@databricks.com> Closes #3020 from andrewor14/consolidate-driver and squashes the following commits: c1c2204 [Andrew Or] Just use "<driver>" for local executor ID 3d751e9 [Andrew Or] Consolidate usages of <driver>	2014-10-30 15:32:46 -07:00
Andrew Or	5231a3f228	[Minor] A few typos in comments and log messages Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Or <andrew@databricks.com> Closes #3021 from andrewor14/typos and squashes the following commits: daaf417 [Andrew Or] Merge branch 'master' of github.com:apache/spark into typos 4838ae4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into typos 026d426 [Andrew Or] Merge branch 'master' of github.com:andrewor14/spark into typos a81ae8f [Andrew Or] Some typos	2014-10-30 15:32:11 -07:00
Andrew Or	24c5129257	[SPARK-3319] [SPARK-3338] Resolve Spark submit config paths The bulk of this PR is comprised of tests. All changes in functionality are made in `SparkSubmit.scala` (~20 lines). SPARK-3319. There is currently a divergence in behavior when the user passes in additional jars through `--jars` and through setting `spark.jars` in the default properties file. The former will happily resolve the paths (e.g. convert `my.jar` to `file:/absolute/path/to/my.jar`), while the latter does not. We should resolve paths consistently in both cases. This also applies to the following pairs of command line arguments and Spark configs: - `--jars` ~ `spark.jars` - `--files` ~ `spark.files` / `spark.yarn.dist.files` - `--archives` ~ `spark.yarn.dist.archives` - `--py-files` ~ `spark.submit.pyFiles` SPARK-3338. This PR also fixes the following bug: if the user sets `spark.submit.pyFiles` in his/her properties file, it does not actually get picked up even if `--py-files` is not set. This is simply because the config is overridden by an empty string. Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Or <andrew@databricks.com> Closes #2232 from andrewor14/resolve-config-paths and squashes the following commits: fff2869 [Andrew Or] Add spark.yarn.jar da3a1c1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into resolve-config-paths f0fae64 [Andrew Or] Merge branch 'master' of github.com:apache/spark into resolve-config-paths 05e03d6 [Andrew Or] Add tests for resolving both command line and config paths 460117e [Andrew Or] Resolve config paths properly fe039d3 [Andrew Or] Beef up tests to test fixed-pointed-ness of Utils.resolveURI(s)	2014-10-30 15:29:07 -07:00
Grace	9142c9b80b	[SPARK-4078] New FsPermission instance w/o FsPermission.createImmutable in eventlog By default, Spark builds its package against Hadoop 1.0.4 version. In that version, it has some FsPermission bug (see [HADOOP-7629] (https://issues.apache.org/jira/browse/HADOOP-7629) by Todd Lipcon). This bug got fixed since 1.1 version. By using that FsPermission.createImmutable() API, end-user may see some RPC exception like below (if turn on eventlog over HDFS). Here proposes a quick fix to avoid certain exception for all hadoop versions. ``` Exception in thread "main" java.io.IOException: Call to sr484/10.1.2.84:54310 failed on local exception: java.io.EOFException at org.apache.hadoop.ipc.Client.wrapException(Client.java:1150) at org.apache.hadoop.ipc.Client.call(Client.java:1118) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy6.setPermission(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62) at $Proxy6.setPermission(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1285) at org.apache.hadoop.hdfs.DistributedFileSystem.setPermission(DistributedFileSystem.java:572) at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:138) at org.apache.spark.util.FileLogger.start(FileLogger.scala:115) at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74) at org.apache.spark.SparkContext.<init>(SparkContext.scala:324) ``` Author: Grace <jie.huang@intel.com> Closes #2892 from GraceH/eventlog-rpc and squashes the following commits: 58ea038 [Grace] new FsPermission Instance w/o FsPermission.createImmutable	2014-10-30 15:27:32 -07:00
Tathagata Das	fb1fbca204	[SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received either from BlockManager or WAL in HDFS As part of the initiative of preventing data loss on streaming driver failure, this sub-task implements a BlockRDD that is backed by HDFS. This BlockRDD can either read data from the Spark's BlockManager, or read the data from file-segments in write ahead log in HDFS. Most of this code has been written by @harishreedharan Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Hari Shreedharan <hshreedharan@apache.org> Closes #2931 from tdas/driver-ha-rdd and squashes the following commits: 209e49c [Tathagata Das] Better fix to style issue. 4a5866f [Tathagata Das] Addressed one more comment. ed5fbf0 [Tathagata Das] Minor updates. b0a18b1 [Tathagata Das] Fixed import order. 20aa7c6 [Tathagata Das] Fixed more line length issues. 29aa099 [Tathagata Das] Fixed line length issues. 9e47b5b [Tathagata Das] Renamed class, simplified+added unit tests. 6e1bfb8 [Tathagata Das] Tweaks testuite to create spark contxt lazily to prevent contxt leaks. 9c86a61 [Tathagata Das] Merge pull request #22 from harishreedharan/driver-ha-rdd 2878c38 [Hari Shreedharan] Shutdown spark context after tests. Formatting/minor fixes c709f2f [Tathagata Das] Merge pull request #21 from harishreedharan/driver-ha-rdd 5cce16f [Hari Shreedharan] Make sure getBlockLocations uses offset and length to find the blocks on HDFS eadde56 [Tathagata Das] Transferred HDFSBackedBlockRDD for the driver-ha-working branch	2014-10-30 15:17:02 -07:00
Kay Ousterhout	6db3157464	[SPARK-4102] Remove unused ShuffleReader.stop() method. This method is not implemented by the only subclass (HashShuffleReader), nor is it ever called. While the use of Scala's fancy "???" was pretty exciting, the method's existence can only lead to confusion and it therefore should be deleted. mateiz was there a reason for adding this that I'm missing? Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #2966 from kayousterhout/SPARK-4102 and squashes the following commits: 532c564 [Kay Ousterhout] Added back commented-out method, as per Matei's request 904655e [Kay Ousterhout] [SPARK-4102] Remove unused ShuffleReader.stop() method.	2014-10-29 23:52:46 -07:00
GuoQiang Li	cd739bd756	[SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH instead of -Djava.library.path - [X] Standalone - [X] YARN - [X] Mesos - [X] Mac OS X - [X] Linux - [ ] Windows This is another implementation about #1031 Author: GuoQiang Li <witgo@qq.com> Closes #2711 from witgo/SPARK-1719 and squashes the following commits: c7b26f6 [GuoQiang Li] review commits 4488e41 [GuoQiang Li] Refactoring CommandUtils a444094 [GuoQiang Li] review commits 40c0b4a [GuoQiang Li] Add buildLocalCommand method c1a0ddd [GuoQiang Li] fix comments 156ce88 [GuoQiang Li] review commit 38aa377 [GuoQiang Li] Refactor CommandUtils.scala 4269e00 [GuoQiang Li] Refactor SparkSubmitDriverBootstrapper.scala 7a1d634 [GuoQiang Li] use LD_LIBRARY_PATH instead of -Djava.library.path	2014-10-29 23:02:58 -07:00
Andrew Or	8d59b37b02	[SPARK-3795] Heuristics for dynamically scaling executors This is part of a bigger effort to provide elastic scaling of executors within a Spark application ([SPARK-3174](https://issues.apache.org/jira/browse/SPARK-3174)). This PR does not provide any functionality by itself; it is a skeleton that is missing a mechanism to be added later in [SPARK-3822](https://issues.apache.org/jira/browse/SPARK-3822). Comments and feedback are most welcome. For those of you reviewing this in detail, I highly recommend doing it through your favorite IDE instead of through the diff here. Author: Andrew Or <andrewor14@gmail.com> Author: Andrew Or <andrew@databricks.com> Closes #2746 from andrewor14/scaling-heuristics and squashes the following commits: 8a4fdaa [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics e045df8 [Andrew Or] Add warning message (minor) dfa31ec [Andrew Or] Fix tests c0becc4 [Andrew Or] Merging with SPARK-3822 4784f93 [Andrew Or] Reword an awkward log message 181f27f [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics c79e907 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 4672b90 [Andrew Or] It's nano time. a6a30f2 [Andrew Or] Do not allow min/max executors of 0 c60ec33 [Andrew Or] Rewrite test logic with clocks b00b680 [Andrew Or] Fix style c3caa65 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 7f9da14 [Andrew Or] Factor out logic to verify bounds on # executors (minor) f279019 [Andrew Or] Add time mocking tests for polling loop 685e347 [Andrew Or] Factor out clock in polling loop to facilitate testing 3cea7f7 [Andrew Or] Use PrivateMethodTester to keep original class private 3156d81 [Andrew Or] Update comments and exception messages 92f36f9 [Andrew Or] Address minor review comments abdea61 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 2aefd09 [Andrew Or] Correct listener behavior 9fe6e44 [Andrew Or] Rename variables and configs + update comments and log messages 149cc32 [Andrew Or] Fix style 254c958 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 5ff829b [Andrew Or] Add tests for ExecutorAllocationManager 19c6c4b [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 5896515 [Andrew Or] Move ExecutorAllocationManager out of scheduler package 9ca8945 [Andrew Or] Rewrite callbacks through the listener interface 5e336b9 [Andrew Or] Remove code from backend to avoid conflict with SPARK-3822 092d1fd [Andrew Or] Remove timeout logic for pending requests 1309fab [Andrew Or] Request executors by specifying the number pending 8bc0e9d [Andrew Or] Add logic to expire pending requests after timeouts b750ee1 [Andrew Or] Express timers in terms of expiration times + remove retry logic 7f8dd47 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 9d516cc [Andrew Or] Bug fix: Actually trigger the add timer / add retry timer 44f1832 [Andrew Or] Rename configs to include time units eaae7ef [Andrew Or] Address various review comments 6f8be6c [Andrew Or] Beef up comments on what each of the timers mean baaa403 [Andrew Or] Simplify variable names (minor) 42beec8 [Andrew Or] Reset whether the add threshold is crossed on cancellation 9bcc0bc [Andrew Or] ExecutorScalingManager -> ExecutorAllocationManager 2784398 [Andrew Or] Merge branch 'master' of github.com:apache/spark into scaling-heuristics 5a97d9e [Andrew Or] Log retry attempts in INFO + clean up logging 2f55c9f [Andrew Or] Do not keep requesting executors even after max attempts 0acd1cb [Andrew Or] Rewrite timer logic with polling b3c7d44 [Andrew Or] Start the retry timer for adding executors at the right time 9b5f2ea [Andrew Or] Wording changes in comments and log messages c2203a5 [Andrew Or] Simplify code to access the scheduler backend e519d08 [Andrew Or] Simplify initialization code 2cc87a7 [Andrew Or] Add retry logic for removing executors d0b34a6 [Andrew Or] Add retry logic for adding executors 9cc4649 [Andrew Or] Simplifying synchronization logic 67c03c7 [Andrew Or] Correct semantics of adding executors + update comments 6c48ab0 [Andrew Or] Update synchronization comment 8901900 [Andrew Or] Simplify remove policy + change the semantics of add policy 1cc8444 [Andrew Or] Minor wording change ae5b64a [Andrew Or] Add synchronization 20ec6b9 [Andrew Or] First cut implementation of removing executors dynamically 4077ae2 [Andrew Or] Minor code re-organization 6f1fa66 [Andrew Or] First cut implementation of adding executors dynamically b2e6dcc [Andrew Or] Add skeleton interface for requesting / killing executors	2014-10-29 17:48:59 -07:00
zsxwing	e7fd80413d	[SPARK-4097] Fix the race condition of 'thread' There is a chance that `thread` is null when calling `thread.interrupt()`. ```Scala override def cancel(): Unit = this.synchronized { _cancelled = true if (thread != null) { thread.interrupt() } } ``` Should put `thread = null` into a `synchronized` block to fix the race condition. Author: zsxwing <zsxwing@gmail.com> Closes #2957 from zsxwing/SPARK-4097 and squashes the following commits: edf0aee [zsxwing] Add comments to explain the lock c5cfeca [zsxwing] Fix the race condition of 'thread'	2014-10-29 14:42:50 -07:00
Andrew Or	1df05a40eb	[SPARK-3822] Executor scaling mechanism for Yarn This is part of a broader effort to enable dynamic scaling of executors ([SPARK-3174](https://issues.apache.org/jira/browse/SPARK-3174)). This is intended to work alongside SPARK-3795 (#2746), SPARK-3796 and SPARK-3797, but is functionally independently of these other issues. The logic is built on top of PraveenSeluka's changes at #2798. This is different from the changes there in a few major ways: (1) the mechanism is implemented within the existing scheduler backend framework rather than in new `Actor` classes. This also introduces a parent abstract class `YarnSchedulerBackend` to encapsulate common logic to communicate with the Yarn `ApplicationMaster`. (2) The interface of requesting executors exposed to the `SparkContext` is the same, but the communication between the scheduler backend and the AM uses total number executors desired instead of an incremental number. This is discussed in #2746 and explained in the comments in the code. I have tested this significantly on a stable Yarn cluster. ------------ A remaining task for this issue is to tone down the error messages emitted when an executor is removed. Currently, `SparkContext` and its components react as if the executor has failed, resulting in many scary error messages and eventual timeouts. While it's not strictly necessary to fix this as of the first-cut implementation of this mechanism, it would be good to add logic to distinguish this case. I prefer to address this in a separate PR. I have filed a separate JIRA for this task at SPARK-4134. Author: Andrew Or <andrew@databricks.com> Author: Andrew Or <andrewor14@gmail.com> Closes #2840 from andrewor14/yarn-scaling-mechanism and squashes the following commits: 485863e [Andrew Or] Minor log message changes 4920be8 [Andrew Or] Clarify that public API is only for Yarn mode for now 1c57804 [Andrew Or] Reword a few comments + other review comments 6321140 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-scaling-mechanism 02836c0 [Andrew Or] Limit scope of synchronization 4e2ed7f [Andrew Or] Fix bug: keep track of removed executors properly 73ade46 [Andrew Or] Wording changes (minor) 2a7a6da [Andrew Or] Add `sc.killExecutor` as a shorthand (minor) 665f229 [Andrew Or] Mima excludes 79aa2df [Andrew Or] Simplify the request interface by asking for a total 04f625b [Andrew Or] Fix race condition that causes over-allocation of executors f4783f8 [Andrew Or] Change the semantics of requesting executors 005a124 [Andrew Or] Fix tests 4628b16 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-scaling-mechanism db4a679 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-scaling-mechanism 572f5c5 [Andrew Or] Unused import (minor) f30261c [Andrew Or] Kill multiple executors rather than one at a time de260d9 [Andrew Or] Simplify by skipping useless null check 9c52542 [Andrew Or] Simplify by skipping the TaskSchedulerImpl 97dd1a8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-scaling-mechanism d987b3e [Andrew Or] Move addWebUIFilters to Yarn scheduler backend 7b76d0a [Andrew Or] Expose mechanism in SparkContext as developer API 47466cd [Andrew Or] Refactor common Yarn scheduler backend logic c4dfaac [Andrew Or] Avoid thrashing when removing executors 53e8145 [Andrew Or] Start yarn actor early to listen for AM registration message bbee669 [Andrew Or] Add mechanism in yarn client mode	2014-10-29 14:01:00 -07:00
Reynold Xin	dff015533d	[SPARK-3453] Netty-based BlockTransferService, extracted from Spark core This PR encapsulates #2330, which is itself a continuation of #2240. The first goal of this PR is to provide an alternate, simpler implementation of the ConnectionManager which is based on Netty. In addition to this goal, however, we want to resolve [SPARK-3796](https://issues.apache.org/jira/browse/SPARK-3796), which calls for a standalone shuffle service which can be integrated into the YARN NodeManager, Standalone Worker, or on its own. This PR makes the first step in this direction by ensuring that the actual Netty service is as small as possible and extracted from Spark core. Given this, we should be able to construct this standalone jar which can be included in other JVMs without incurring significant dependency or runtime issues. The actual work to ensure that such a standalone shuffle service would work in Spark will be left for a future PR, however. In order to minimize dependencies and allow for the service to be long-running (possibly much longer-running than Spark, and possibly having to support multiple version of Spark simultaneously), the entire service has been ported to Java, where we have full control over the binary compatibility of the components and do not depend on the Scala runtime or version. These issues: have been addressed by folding in #2330: SPARK-3453: Refactor Netty module to use BlockTransferService interface SPARK-3018: Release all buffers upon task completion/failure SPARK-3002: Create a connection pool and reuse clients across different threads SPARK-3017: Integration tests and unit tests for connection failures SPARK-3049: Make sure client doesn't block when server/connection has error(s) SPARK-3502: SO_RCVBUF and SO_SNDBUF should be bootstrap childOption, not option SPARK-3503: Disable thread local cache in PooledByteBufAllocator TODO before mergeable: - [x] Implement uploadBlock() - [x] Unit tests for RPC side of code - [x] Performance testing (see comments [here](https://github.com/apache/spark/pull/2753#issuecomment-59475022)) - [x] Turn OFF by default (currently on for unit testing) Author: Reynold Xin <rxin@apache.org> Author: Aaron Davidson <aaron@databricks.com> Author: cocoatomo <cocoatomo77@gmail.com> Author: Patrick Wendell <pwendell@gmail.com> Author: Prashant Sharma <prashant.s@imaginea.com> Author: Davies Liu <davies.liu@gmail.com> Author: Anand Avati <avati@redhat.com> Closes #2753 from aarondav/netty and squashes the following commits: cadfd28 [Aaron Davidson] Turn netty off by default d7be11b [Aaron Davidson] Turn netty on by default 4a204b8 [Aaron Davidson] Fail block fetches if client connection fails 2b0d1c0 [Aaron Davidson] 100ch 0c5bca2 [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty 14e37f7 [Aaron Davidson] Address Reynold's comments 8dfcceb [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty 322dfc1 [Aaron Davidson] Address Reynold's comments, including major rename e5675a4 [Aaron Davidson] Fail outstanding RPCs as well ccd4959 [Aaron Davidson] Don't throw exception if client immediately fails 9da0bc1 [Aaron Davidson] Add RPC unit tests d236dfd [Aaron Davidson] Remove no-op serializer :) 7b7a26c [Aaron Davidson] Fix Nio compile issue dd420fd [Aaron Davidson] Merge branch 'master' of https://github.com/apache/spark into netty-test 939f276 [Aaron Davidson] Attempt to make comm. bidirectional aa58f67 [cocoatomo] [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings 8dc1ded [cocoatomo] [SPARK-3867][PySpark] ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed 5b5dbe6 [Prashant Sharma] [SPARK-2924] Required by scala 2.11, only one fun/ctor amongst overriden alternatives, can have default argument(s). 2c5d9dc [Patrick Wendell] HOTFIX: Fix build issue with Akka 2.3.4 upgrade. 020691e [Davies Liu] [SPARK-3886] [PySpark] use AutoBatchedSerializer by default ae4083a [Anand Avati] [SPARK-2805] Upgrade Akka to 2.3.4 29c6dcf [Aaron Davidson] [SPARK-3453] Netty-based BlockTransferService, extracted from Spark core f7e7568 [Reynold Xin] Fixed spark.shuffle.io.receiveBuffer setting. 5d98ce3 [Reynold Xin] Flip buffer. f6c220d [Reynold Xin] Merge with latest master. 407e59a [Reynold Xin] Fix style violation. a0518c7 [Reynold Xin] Implemented block uploads. 4b18db2 [Reynold Xin] Copy the buffer in fetchBlockSync. bec4ea2 [Reynold Xin] Removed OIO and added num threads settings. 1bdd7ee [Reynold Xin] Fixed tests. d68f328 [Reynold Xin] Logging close() in case close() fails. f63fb4c [Reynold Xin] Add more debug message. 6afc435 [Reynold Xin] Added logging. c066309 [Reynold Xin] Implement java.io.Closeable interface. 519d64d [Reynold Xin] Mark private package visibility and MimaExcludes. f0a16e9 [Reynold Xin] Fixed test hanging. 14323a5 [Reynold Xin] Removed BlockManager.getLocalShuffleFromDisk. b2f3281 [Reynold Xin] Added connection pooling. d23ed7b [Reynold Xin] Incorporated feedback from Norman: - use same pool for boss and worker - remove ioratio - disable caching of byte buf allocator - childoption sendbuf/receivebuf - fire exception through pipeline 9e0cb87 [Reynold Xin] Fixed BlockClientHandlerSuite 5cd33d7 [Reynold Xin] Fixed style violation. cb589ec [Reynold Xin] Added more test cases covering cleanup when fault happens in ShuffleBlockFetcherIteratorSuite 1be4e8e [Reynold Xin] Shorten NioManagedBuffer and NettyManagedBuffer class names. 108c9ed [Reynold Xin] Forgot to add TestSerializer to the commit list. b5c8d1f [Reynold Xin] Fixed ShuffleBlockFetcherIteratorSuite. 064747b [Reynold Xin] Reference count buffers and clean them up properly. 2b44cf1 [Reynold Xin] Added more documentation. 1760d32 [Reynold Xin] Use Epoll.isAvailable in BlockServer as well. 165eab1 [Reynold Xin] [SPARK-3453] Refactor Netty module to use BlockTransferService.	2014-10-29 11:27:07 -07:00
Xiangrui Meng	1559495dd9	[FIX] disable benchmark code I forgot to disable the benchmark code in #2937, which increased the Jenkins build time by couple minutes. aarondav Author: Xiangrui Meng <meng@databricks.com> Closes #2990 from mengxr/disable-benchmark and squashes the following commits: c58f070 [Xiangrui Meng] disable benchmark code	2014-10-28 22:00:10 -07:00
zsxwing	1536d70331	[SPARK-4008] Fix "kryo with fold" in KryoSerializerSuite `zeroValue` will be serialized by `spark.closure.serializer` but `spark.closure.serializer` only supports the default Java serializer. So it must not be `ClassWithoutNoArgConstructor`, which can not be serialized by the Java serializer. This PR changed `zeroValue` to null and updated the test to make it work correctly. Author: zsxwing <zsxwing@gmail.com> Closes #2856 from zsxwing/SPARK-4008 and squashes the following commits: 51da655 [zsxwing] [SPARK-4008] Fix "kryo with fold" in KryoSerializerSuite	2014-10-28 17:59:10 -07:00
Xiangrui Meng	84e5da87e3	[SPARK-4084] Reuse sort key in Sorter Sorter uses generic-typed key for sorting. When data is large, it creates lots of key objects, which is not efficient. We should reuse the key in Sorter for memory efficiency. This change is part of the petabyte sort implementation from rxin . The `Sorter` class was written in Java and marked package private. So it is only available to `org.apache.spark.util.collection`. I renamed it to `TimSort` and add a simple wrapper of it, still called `Sorter`, in Scala, which is `private[spark]`. The benchmark code is updated, which now resets the array before each run. Here is the result on sorting primitive Int arrays of size 25 million using Sorter: ~~~ [info] - Sorter benchmark for key-value pairs !!! IGNORED !!! Java Arrays.sort() on non-primitive int array: Took 13237 ms Java Arrays.sort() on non-primitive int array: Took 13320 ms Java Arrays.sort() on non-primitive int array: Took 15718 ms Java Arrays.sort() on non-primitive int array: Took 13283 ms Java Arrays.sort() on non-primitive int array: Took 13267 ms Java Arrays.sort() on non-primitive int array: Took 15122 ms Java Arrays.sort() on non-primitive int array: Took 15495 ms Java Arrays.sort() on non-primitive int array: Took 14877 ms Java Arrays.sort() on non-primitive int array: Took 16429 ms Java Arrays.sort() on non-primitive int array: Took 14250 ms Java Arrays.sort() on non-primitive int array: (13878 ms first try, 14499 ms average) Java Arrays.sort() on primitive int array: Took 2683 ms Java Arrays.sort() on primitive int array: Took 2683 ms Java Arrays.sort() on primitive int array: Took 2701 ms Java Arrays.sort() on primitive int array: Took 2746 ms Java Arrays.sort() on primitive int array: Took 2685 ms Java Arrays.sort() on primitive int array: Took 2735 ms Java Arrays.sort() on primitive int array: Took 2669 ms Java Arrays.sort() on primitive int array: Took 2693 ms Java Arrays.sort() on primitive int array: Took 2680 ms Java Arrays.sort() on primitive int array: Took 2642 ms Java Arrays.sort() on primitive int array: (2948 ms first try, 2691 ms average) Sorter without key reuse on primitive int array: Took 10732 ms Sorter without key reuse on primitive int array: Took 12482 ms Sorter without key reuse on primitive int array: Took 10718 ms Sorter without key reuse on primitive int array: Took 12650 ms Sorter without key reuse on primitive int array: Took 10747 ms Sorter without key reuse on primitive int array: Took 10783 ms Sorter without key reuse on primitive int array: Took 12721 ms Sorter without key reuse on primitive int array: Took 10604 ms Sorter without key reuse on primitive int array: Took 10622 ms Sorter without key reuse on primitive int array: Took 11843 ms Sorter without key reuse on primitive int array: (11089 ms first try, 11390 ms average) Sorter with key reuse on primitive int array: Took 5141 ms Sorter with key reuse on primitive int array: Took 5298 ms Sorter with key reuse on primitive int array: Took 5066 ms Sorter with key reuse on primitive int array: Took 5164 ms Sorter with key reuse on primitive int array: Took 5203 ms Sorter with key reuse on primitive int array: Took 5274 ms Sorter with key reuse on primitive int array: Took 5186 ms Sorter with key reuse on primitive int array: Took 5159 ms Sorter with key reuse on primitive int array: Took 5164 ms Sorter with key reuse on primitive int array: Took 5078 ms Sorter with key reuse on primitive int array: (5311 ms first try, 5173 ms average) ~~~ So with key reuse, it is faster and less likely to trigger GC. Author: Xiangrui Meng <meng@databricks.com> Author: Reynold Xin <rxin@apache.org> Closes #2937 from mengxr/SPARK-4084 and squashes the following commits: d73c3d0 [Xiangrui Meng] address comments 0b7b682 [Xiangrui Meng] fix mima a72f53c [Xiangrui Meng] update timeIt 38ba50c [Xiangrui Meng] update timeIt 720f731 [Xiangrui Meng] add doc about JIT specialization 78f2879 [Xiangrui Meng] update tests 7de2efd [Xiangrui Meng] update the Sorter benchmark code to be correct 8626356 [Xiangrui Meng] add prepare to timeIt and update testsin SorterSuite 5f0d530 [Xiangrui Meng] update method modifiers of SortDataFormat 6ffbe66 [Xiangrui Meng] rename Sorter to TimSort and add a Scala wrapper that is private[spark] b00db4d [Xiangrui Meng] doc and tests cf94e8a [Xiangrui Meng] renaming 464ddce [Reynold Xin] cherry-pick rxin's commit	2014-10-28 15:14:41 -07:00
zsxwing	abcafcfba3	[Spark 3922] Refactor spark-core to use Utils.UTF_8 A global UTF8 constant is very helpful to handle encoding problems when converting between String and bytes. There are several solutions here: 1. Add `val UTF_8 = Charset.forName("UTF-8")` to Utils.scala 2. java.nio.charset.StandardCharsets.UTF_8 (require JDK7) 3. io.netty.util.CharsetUtil.UTF_8 4. com.google.common.base.Charsets.UTF_8 5. org.apache.commons.lang.CharEncoding.UTF_8 6. org.apache.commons.lang3.CharEncoding.UTF_8 IMO, I prefer option 1) because people can find it easily. This is a PR for option 1) and only fixes Spark Core. Author: zsxwing <zsxwing@gmail.com> Closes #2781 from zsxwing/SPARK-3922 and squashes the following commits: f974edd [zsxwing] Merge branch 'master' into SPARK-3922 2d27423 [zsxwing] Refactor spark-core to use Refactor spark-core to use Utils.UTF_8	2014-10-28 14:26:57 -07:00
Shivaram Venkataraman	7768a800d4	[SPARK-4031] Make torrent broadcast read blocks on use. This avoids reading torrent broadcast variables when they are referenced in the closure but not used in the closure. This is done by using a `lazy val` to read broadcast blocks cc rxin JoshRosen for review Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #2871 from shivaram/broadcast-read-value and squashes the following commits: 1456d65 [Shivaram Venkataraman] Use getUsedTimeMs and remove readObject d6c5ee9 [Shivaram Venkataraman] Use laxy val to implement readBroadcastBlock 0b34df7 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into broadcast-read-value 9cec507 [Shivaram Venkataraman] Test if broadcast variables are read lazily 768b40b [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into broadcast-read-value 8792ed8 [Shivaram Venkataraman] Make torrent broadcast read blocks on use. This avoids reading broadcast variables when they are referenced in the closure but not used by the code.	2014-10-28 10:14:16 -07:00
Josh Rosen	46c63417c1	[SPARK-4107] Fix incorrect handling of read() and skip() return values `read()` may return fewer bytes than requested; when this occurred, the old code would silently return less data than requested, which might cause stream corruption errors. `skip()` faces similar issues, too. This patch fixes several cases where we mis-handle these methods' return values. Author: Josh Rosen <joshrosen@databricks.com> Closes #2969 from JoshRosen/file-channel-read-fix and squashes the following commits: e724a9f [Josh Rosen] Fix similar issue of not checking skip() return value. cbc03ce [Josh Rosen] Update the other log message, too. 01e6015 [Josh Rosen] file.getName -> file.getAbsolutePath d961d95 [Josh Rosen] Fix another issue in FileServerSuite. b9265d2 [Josh Rosen] Fix a similar (minor) issue in TestUtils. cd9d76f [Josh Rosen] Fix a similar error in Tachyon: 3db0008 [Josh Rosen] Fix a similar read() error in Utils.offsetBytes(). db985ed [Josh Rosen] Fix unsafe usage of FileChannel.read():	2014-10-28 00:04:16 -07:00
GuoQiang Li	7c0c26cd12	[SPARK-4064]NioBlockTransferService.fetchBlocks may cause spark to hang. cc @rxin Author: GuoQiang Li <witgo@qq.com> Closes #2929 from witgo/SPARK-4064 and squashes the following commits: 20110f2 [GuoQiang Li] Modify the exception msg 3425225 [GuoQiang Li] review commits 2b07e49 [GuoQiang Li] If we create a lot of big broadcast variables, Spark may hang	2014-10-27 23:31:46 -07:00
Sean Owen	bfa614b127	SPARK-4022 [CORE] [MLLIB] Replace colt dependency (LGPL) with commons-math This change replaces usages of colt with commons-math3 equivalents, and makes some minor necessary adjustments to related code and tests to match. Author: Sean Owen <sowen@cloudera.com> Closes #2928 from srowen/SPARK-4022 and squashes the following commits: 61a232f [Sean Owen] Fix failure due to different sampling in JavaAPISuite.sample() 16d66b8 [Sean Owen] Simplify seeding with call to reseedRandomGenerator a1a78e0 [Sean Owen] Use Well19937c 31c7641 [Sean Owen] Fix Python Poisson test by choosing a different seed; about 88% of seeds should work but 1 didn't, it seems 5c9c67f [Sean Owen] Additional test fixes from review d8f88e0 [Sean Owen] Replace colt with commons-math3. Some tests do not pass yet.	2014-10-27 10:53:15 -07:00
Sandy Ryza	dea302ddbd	SPARK-2621. Update task InputMetrics incrementally The patch takes advantage an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old method, which naively accepts the split size as the input bytes. An impact of this change will be that input metrics go away when using against Hadoop versions earlier thatn 2.5. I can add this back in, but my opinion is that no metrics are better than inaccurate metrics. This is difficult to write a test for because we don't usually build against a version of Hadoop that contains the function we need. I've tested it manually on a pseudo-distributed cluster. Author: Sandy Ryza <sandy@cloudera.com> Closes #2087 from sryza/sandy-spark-2621 and squashes the following commits: 23010b8 [Sandy Ryza] Missing style fixes 74fc9bb [Sandy Ryza] Make getFSBytesReadOnThreadCallback private 1ab662d [Sandy Ryza] Clear things up a bit 984631f [Sandy Ryza] Switch from pull to push model and add test 7ef7b22 [Sandy Ryza] Add missing curly braces 219abc9 [Sandy Ryza] Fall back to split size 90dbc14 [Sandy Ryza] SPARK-2621. Update task InputMetrics incrementally	2014-10-27 10:04:24 -07:00
Shivaram Venkataraman	9aa340a23f	[SPARK-4030] Make destroy public for broadcast variables This change makes the destroy function public for broadcast variables. Motivation for the change is described in https://issues.apache.org/jira/browse/SPARK-4030. This patch also logs where destroy was called from if a broadcast variable is used after destruction. Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #2922 from shivaram/broadcast-destroy and squashes the following commits: a11abab [Shivaram Venkataraman] Fix scala style in Utils.scala bed9c9d [Shivaram Venkataraman] Make destroy blocking by default e80c1ab [Shivaram Venkataraman] Make destroy public for broadcast variables Also log where destroy was called from if a broadcast variable is used after destruction.	2014-10-27 08:45:36 -07:00
Liang-Chi Hsieh	6377adaf32	[SPARK-3970] Remove duplicate removal of local dirs The shutdown hook of `DiskBlockManager` would remove localDirs. So do not need to register them with `Utils.registerShutdownDeleteDir`. It causes duplicate removal of these local dirs and corresponding exceptions. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #2826 from viirya/fix_duplicate_localdir_remove and squashes the following commits: 051d4b5 [Liang-Chi Hsieh] check dir existing and return empty List as default. 2b91a9c [Liang-Chi Hsieh] remove duplicate removal of local dirs.	2014-10-26 18:02:06 -07:00
Josh Rosen	bf589fc717	[SPARK-3616] Add basic Selenium tests to WebUISuite This patch adds Selenium tests for Spark's web UI. To avoid adding extra dependencies to the test environment, the tests use Selenium's HtmlUnitDriver, which is pure-Java, instead of, say, ChromeDriver. I added new tests to try to reproduce a few UI bugs reported on JIRA, namely SPARK-3021, SPARK-2105, and SPARK-2527. I wasn't able to reproduce these bugs; I suspect that the older ones might have been fixed by other patches. In order to use HtmlUnitDriver, I added an explicit dependency on the org.apache.httpcomponents version of httpclient in order to prevent jets3t's older version from taking precedence on the classpath. I also upgraded ScalaTest to 2.2.1. Author: Josh Rosen <joshrosen@apache.org> Author: Josh Rosen <joshrosen@databricks.com> Closes #2474 from JoshRosen/webui-selenium-tests and squashes the following commits: fcc9e83 [Josh Rosen] scalautils -> scalactic package rename 510e54a [Josh Rosen] [SPARK-3616] Add basic Selenium tests to WebUISuite.	2014-10-26 11:29:27 -07:00
Sean Owen	df7974b8e5	SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work with Java 8 This follows https://github.com/apache/spark/pull/2893 , but does not completely fix SPARK-3359 either. This fixes minor scaladoc/javadoc issues that Javadoc 8 will treat as errors. Author: Sean Owen <sowen@cloudera.com> Closes #2909 from srowen/SPARK-3359 and squashes the following commits: f62c347 [Sean Owen] Fix some javadoc issues that javadoc 8 considers errors. This is not all of the errors turned up when javadoc 8 runs on output of genjavadoc.	2014-10-25 23:18:02 -07:00
Andrew Or	c683444008	[SPARK-4071] Unroll fails silently if BlockManager is small In tests, we may want to have BlockManagers of size < 1MB (spark.storage.unrollMemoryThreshold). However, these BlockManagers are useless because we can't unroll anything in them ever. At the very least we need to log a warning. tdas Author: Andrew Or <andrew@databricks.com> Closes #2917 from andrewor14/unroll-safely-logging and squashes the following commits: 38947e3 [Andrew Or] Warn against starting a block manager that's too small fd621b4 [Andrew Or] Warn against failure to reserve initial memory threshold	2014-10-25 20:07:44 -07:00
Josh Rosen	9530316887	[SPARK-2321] Stable pull-based progress / status API This pull request is a first step towards the implementation of a stable, pull-based progress / status API for Spark (see [SPARK-2321](https://issues.apache.org/jira/browse/SPARK-2321)). For now, I'd like to discuss the basic implementation, API names, and overall interface design. Once we arrive at a good design, I'll go back and add additional methods to expose more information via these API. #### Design goals: - Pull-based API - Usable from Java / Scala / Python (eventually, likely with a wrapper) - Can be extended to expose more information without introducing binary incompatibilities. - Returns immutable objects. - Don't leak any implementation details, preserving our freedom to change the implementation. #### Implementation: - Add public methods (`getJobInfo`, `getStageInfo`) to SparkContext to allow status / progress information to be retrieved. - Add public interfaces (`SparkJobInfo`, `SparkStageInfo`) for our API return values. These interfaces consist entirely of Java-style getter methods. The interfaces are currently implemented in Java. I decided to explicitly separate the interface from its implementation (`SparkJobInfoImpl`, `SparkStageInfoImpl`) in order to prevent users from constructing these responses themselves. -Allow an existing JobProgressListener to be used when constructing a live SparkUI. This allows us to re-use this listeners in the implementation of this status API. There are a few reasons why this listener re-use makes sense: - The status API and web UI are guaranteed to show consistent information. - These listeners are already well-tested. - The same garbage-collection / information retention configurations can apply to both this API and the web UI. - Extend JobProgressListener to maintain `jobId -> Job` and `stageId -> Stage` mappings. The progress API methods are implemented in a separate trait that's mixed into SparkContext. This helps to avoid SparkContext.scala from becoming larger and more difficult to read. Author: Josh Rosen <joshrosen@databricks.com> Author: Josh Rosen <joshrosen@apache.org> Closes #2696 from JoshRosen/progress-reporting-api and squashes the following commits: e6aa78d [Josh Rosen] Add tests. b585c16 [Josh Rosen] Accept SparkListenerBus instead of more specific subclasses. c96402d [Josh Rosen] Address review comments. 2707f98 [Josh Rosen] Expose current stage attempt id c28ba76 [Josh Rosen] Update demo code: 646ff1d [Josh Rosen] Document spark.ui.retainedJobs. 7f47d6d [Josh Rosen] Clean up SparkUI constructors, per Andrew's feedback. b77b3d8 [Josh Rosen] Merge remote-tracking branch 'origin/master' into progress-reporting-api 787444c [Josh Rosen] Move status API methods into trait that can be mixed into SparkContext. f9a9a00 [Josh Rosen] More review comments: 3dc79af [Josh Rosen] Remove creation of unused listeners in SparkContext. 249ca16 [Josh Rosen] Address several review comments: da5648e [Josh Rosen] Add example of basic progress reporting in Java. 7319ffd [Josh Rosen] Add getJobIdsForGroup() and num*Tasks() methods. cc568e5 [Josh Rosen] Add note explaining that interfaces should not be implemented outside of Spark. 6e840d4 [Josh Rosen] Remove getter-style names and "consistent snapshot" semantics: 08cbec9 [Josh Rosen] Begin to sketch the interfaces for a stable, public status API. ac2d13a [Josh Rosen] Add jobId->stage, stageId->stage mappings in JobProgressListener 24de263 [Josh Rosen] Create UI listeners in SparkContext instead of in Tabs:	2014-10-25 00:06:57 -07:00
Josh Rosen	6c98c29ae0	[SPARK-4080] Only throw IOException from [write\|read][Object\|External] If classes implementing Serializable or Externalizable interfaces throw exceptions other than IOException or ClassNotFoundException from their (de)serialization methods, then this results in an unhelpful "IOException: unexpected exception type" rather than the actual exception that produced the (de)serialization error. This patch fixes this by adding a utility method that re-wraps any uncaught exceptions in IOException (unless they are already instances of IOException). Author: Josh Rosen <joshrosen@databricks.com> Closes #2932 from JoshRosen/SPARK-4080 and squashes the following commits: cd3a9be [Josh Rosen] [SPARK-4080] Only throw IOException from [write\|read][Object\|External].	2014-10-24 15:06:15 -07:00
Nan Zhu	f80dcf2aee	[SPARK-4067] refactor ExecutorUncaughtExceptionHandler https://issues.apache.org/jira/browse/SPARK-4067 currently , we call Utils.tryOrExit everywhere AppClient Executor TaskSchedulerImpl It makes the name of ExecutorUncaughtExceptionHandler unfit to the real case.... Author: Nan Zhu <nanzhu@Nans-MacBook-Pro.local> Author: Nan Zhu <nanzhu@nans-mbp.home> Closes #2913 from CodingCat/SPARK-4067 and squashes the following commits: 035ee3d [Nan Zhu] make RAT happy e62e416 [Nan Zhu] add some general Exit code a10b63f [Nan Zhu] refactor	2014-10-24 13:47:06 -07:00
Andrew Or	b563987e8d	[SPARK-4013] Do not create multiple actor systems on each executor In the existing code, each coarse-grained executor has two concurrently running actor systems. This causes many more error messages to be logged than necessary when the executor is lost or killed because we receive a disassociation event for each of these actor systems. This is blocking #2840. Author: Andrew Or <andrewor14@gmail.com> Closes #2863 from andrewor14/executor-actor-system and squashes the following commits: 44ce2e0 [Andrew Or] Avoid starting two actor systems on each executor	2014-10-24 13:32:23 -07:00
Kousuke Saruta	098f83c7cc	[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file In deploy.ClientArguments.isValidJarUrl, the url is checked as follows. def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar") So, it allows like 'hdfs:file.jar' (no authority). Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #2925 from sarutak/uri-syntax-check-improvement and squashes the following commits: cf06173 [Kousuke Saruta] Improved URI syntax checking	2014-10-24 13:08:59 -07:00
Li Zhihui	7aacb7bfad	[SPARK-2713] Executors of same application in same host should only download files & jars once If Spark lunched multiple executors in one host for one application, every executor would download it dependent files and jars (if not using local: url) independently. It maybe result in huge latency. In my case, it result in 20 seconds latency to download dependent jars(size about 17M) when I lunched 32 executors in every host(total 4 hosts). This patch will cache downloaded files and jars for executors to reduce network throughput and download latency. In my case, the latency was reduced from 20 seconds to less than 1 second. Author: Li Zhihui <zhihui.li@intel.com> Author: li-zhihui <zhihui.li@intel.com> Closes #1616 from li-zhihui/cachefiles and squashes the following commits: 36940df [Li Zhihui] Close cache for local mode 935fed6 [Li Zhihui] Clean code. f9330d4 [Li Zhihui] Clean code again 7050d46 [Li Zhihui] Clean code 074a422 [Li Zhihui] Fix: deal with spark.files.overwrite 03ed3a8 [li-zhihui] rename cache file name as XXXXXXXXX_cache 2766055 [li-zhihui] Use url.hashCode + timestamp as cachedFileName 76a7b66 [Li Zhihui] Clean code & use applcation work directory as cache directory 3510eb0 [Li Zhihui] Keep fetchFile private 2ffd742 [Li Zhihui] add comment for FileLock e0ebd48 [Li Zhihui] Try and finally lock.release 7fb7c0b [Li Zhihui] Release lock before copy files 6b997bf [Li Zhihui] Executors of same application in same host should only download files & jars once	2014-10-24 13:01:36 -07:00
Davies Liu	e595c8d08a	[SPARK-3993] [PySpark] fix bug while reuse worker after take() After take(), maybe there are some garbage left in the socket, then next task assigned to this worker will hang because of corrupted data. We should make sure the socket is clean before reuse it, write END_OF_STREAM at the end, and check it after read out all result from python. Author: Davies Liu <davies.liu@gmail.com> Author: Davies Liu <davies@databricks.com> Closes #2838 from davies/fix_reuse and squashes the following commits: 8872914 [Davies Liu] fix tests 660875b [Davies Liu] fix bug while reuse worker after take()	2014-10-23 17:20:00 -07:00
Josh Rosen	83b7a1c650	[SPARK-4019] [SPARK-3740] Fix MapStatus compression bug that could lead to empty results or Snappy errors This commit fixes a bug in MapStatus that could cause jobs to wrongly return empty results if those jobs contained stages with more than 2000 partitions where most of those partitions were empty. For jobs with > 2000 partitions, MapStatus uses HighlyCompressedMapStatus, which only stores the average size of blocks. If the average block size is zero, then this will cause all blocks to be reported as empty, causing BlockFetcherIterator to mistakenly skip them. For example, this would return an empty result: sc.makeRDD(0 until 10, 1000).repartition(2001).collect() This can also lead to deserialization errors (e.g. Snappy decoding errors) for jobs with > 2000 partitions where the average block size is non-zero but there is at least one empty block. In this case, the BlockFetcher attempts to fetch empty blocks and fails when trying to deserialize them. The root problem here is that MapStatus has a (previously undocumented) correctness property that was violated by HighlyCompressedMapStatus: If a block is non-empty, then getSizeForBlock must be non-zero. I fixed this by modifying HighlyCompressedMapStatus to store the average size of _non-empty_ blocks and to use a compressed bitmap to track which blocks are empty. I also removed a test which was broken as originally written: it attempted to check that HighlyCompressedMapStatus's size estimation error was < 10%, but this was broken because HighlyCompressedMapStatus is only used for map statuses with > 2000 partitions, but the test only created 50. Author: Josh Rosen <joshrosen@databricks.com> Closes #2866 from JoshRosen/spark-4019 and squashes the following commits: fc8b490 [Josh Rosen] Roll back hashset change, which didn't improve performance. 5faa0a4 [Josh Rosen] Incorporate review feedback c8b8cae [Josh Rosen] Two performance fixes: 3b892dd [Josh Rosen] Address Reynold's review comments ba2e71c [Josh Rosen] Add missing newline 609407d [Josh Rosen] Use Roaring Bitmap to track non-empty blocks. c23897a [Josh Rosen] Use sets when comparing collect() results 91276a3 [Josh Rosen] [SPARK-4019] Fix MapStatus compression bug that could lead to empty results.	2014-10-23 16:39:32 -07:00

... 6 7 8 9 10 ...

4585 commits