ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Timothy Chen	73431d8afb	[SPARK-10124] [MESOS] Fix removing queued driver in mesos cluster mode. Currently the spark applications can be queued to the Mesos cluster dispatcher, but when multiple jobs are in queue we don't handle removing jobs from the buffer correctly while iterating and causes null pointer exception. This patch copies the buffer before iterating them, so exceptions aren't thrown when the jobs are removed. Author: Timothy Chen <tnachen@gmail.com> Closes #8322 from tnachen/fix_cluster_mode.	2015-08-19 19:43:26 -07:00
Marcelo Vanzin	e0dd1309ac	[SPARK-10119] [CORE] Fix isDynamicAllocationEnabled when config is expliticly disabled. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8316 from vanzin/SPARK-10119.	2015-08-19 14:33:32 -07:00
Joshi	f3391ff2b8	[SPARK-8889] [CORE] Fix for OOM for graph creation Fix for OOM for graph creation Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes #7602 from rekhajoshm/SPARK-8889.	2015-08-19 21:23:02 +01:00
Yu ISHIKAWA	2fcb9cb955	[SPARK-9856] [SPARKR] Add expression functions into SparkR whose params are complicated I added lots of Column functinos into SparkR. And I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala. Since we need such APIs for R integer type. ### JIRA [[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8264 from yu-iskw/SPARK-9856-3.	2015-08-19 10:41:14 -07:00
Han JU	3d16a54500	[SPARK-8949] Print warnings when using preferred locations feature Add warnings according to SPARK-8949 in `SparkContext` - warnings in scaladoc - log warnings when preferred locations feature is used through `SparkContext`'s constructor However I didn't found any documentation reference of this feature. Please direct me if you know any reference to this feature. Author: Han JU <ju.han.felix@gmail.com> Closes #7874 from darkjh/SPARK-8949.	2015-08-19 13:04:16 +01:00
Tathagata Das	bc9a0e0323	[SPARK-9967] [SPARK-10099] [STREAMING] Renamed conf spark.streaming.backpressure.{enable-->enabled} and fixed deprecated annotations Small changes - Renamed conf spark.streaming.backpressure.{enable --> enabled} - Change Java Deprecated annotations to Scala deprecated annotation with more information. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8299 from tdas/SPARK-9967.	2015-08-18 23:37:57 -07:00
Josh Rosen	010b03ed52	[SPARK-9952] Fix N^2 loop when DAGScheduler.getPreferredLocsInternal accesses cacheLocs In Scala, `Seq.fill` always seems to return a List. Accessing a list by index is an O(N) operation. Thus, the following code will be really slow (~10 seconds on my machine): ```scala val numItems = 100000 val s = Seq.fill(numItems)(1) for (i <- 0 until numItems) s(i) ``` It turns out that we had a loop like this in DAGScheduler code, although it's a little tricky to spot. In `getPreferredLocsInternal`, there's a call to `getCacheLocs(rdd)(partition)`. The `getCacheLocs` call returns a Seq. If this Seq is a List and the RDD contains many partitions, then indexing into this list will cost O(partitions). Thus, when we loop over our tasks to compute their individual preferred locations we implicitly perform an N^2 loop, reducing scheduling throughput. This patch fixes this by replacing `Seq` with `Array`. Author: Josh Rosen <joshrosen@databricks.com> Closes #8178 from JoshRosen/dagscheduler-perf.	2015-08-18 22:30:13 -07:00
Marcelo Vanzin	c1840a862e	[SPARK-7736] [CORE] Fix a race introduced in PythonRunner. The fix for SPARK-7736 introduced a race where a port value of "-1" could be passed down to the pyspark process, causing it to fail to connect back to the JVM. This change adds code to fix that race. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8258 from vanzin/SPARK-7736.	2015-08-18 11:36:36 -07:00
CodingCat	c34e9ff0ea	[MINOR] fix the comments in IndexShuffleBlockResolver it might be a typo introduced at the first moment or some leftover after some renaming...... the name of the method accessing the index file is called `getBlockData` now (not `getBlockLocation` as indicated in the comments) Author: CodingCat <zhunansjtu@gmail.com> Closes #8238 from CodingCat/minor_1.	2015-08-18 10:31:11 +01:00
Marcelo Vanzin	f68d024096	[SPARK-7736] [CORE] [YARN] Make pyspark fail YARN app on failure. The YARN backend doesn't like when user code calls `System.exit`, since it cannot know the exit status and thus cannot set an appropriate final status for the application. So, for pyspark, avoid that call and instead throw an exception with the exit code. SparkSubmit handles that exception and exits with the given exit code, while YARN uses the exit code as the failure code for the Spark app. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7751 from vanzin/SPARK-9416.	2015-08-17 10:34:22 -07:00
Rohit Agarwal	ed092a06c2	[SPARK-9924] [WEB UI] Don't schedule checkForLogs while some of them are already running. Author: Rohit Agarwal <rohita@qubole.com> Closes #8153 from mindprince/SPARK-9924.	2015-08-17 10:31:57 -07:00
Matei Zaharia	cf016075a0	[SPARK-10008] Ensure shuffle locality doesn't take precedence over narrow deps The shuffle locality patch made the DAGScheduler aware of shuffle data, but for RDDs that have both narrow and shuffle dependencies, it can cause them to place tasks based on the shuffle dependency instead of the narrow one. This case is common in iterative join-based algorithms like PageRank and ALS, where one RDD is hash-partitioned and one isn't. Author: Matei Zaharia <matei@databricks.com> Closes #8220 from mateiz/shuffle-loc-fix.	2015-08-16 00:34:58 -07:00
Herman van Hovell	a85fb6c07f	[SPARK-9980] [BUILD] Fix SBT publishLocal error due to invalid characters in doc Tiny modification to a few comments ```sbt publishLocal``` work again. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #8209 from hvanhovell/SPARK-9980.	2015-08-15 10:46:04 +01:00
Davies Liu	37586e5449	[HOTFIX] fix duplicated braces Author: Davies Liu <davies@databricks.com> Closes #8219 from davies/fix_typo.	2015-08-14 20:56:55 -07:00
Reynold Xin	e5fd60415f	[SPARK-9934] Deprecate NIO ConnectionManager. Deprecate NIO ConnectionManager in Spark 1.5.0, before removing it in Spark 1.6.0. Author: Reynold Xin <rxin@databricks.com> Closes #8162 from rxin/SPARK-9934.	2015-08-14 20:55:32 -07:00
jerryshao	9407baa2a7	[SPARK-9877] [CORE] Fix StandaloneRestServer NPE when submitting application Detailed exception log can be seen in [SPARK-9877](https://issues.apache.org/jira/browse/SPARK-9877), the problem is when creating `StandaloneRestServer`, `self` (`masterEndpoint`) is null. So this fix is creating `StandaloneRestServer` when `self` is available. Author: jerryshao <sshao@hortonworks.com> Closes #8127 from jerryshao/SPARK-9877.	2015-08-14 13:44:38 -07:00
Andrew Or	6518ef6303	[SPARK-9948] Fix flaky AccumulatorSuite - internal accumulators In these tests, we use a custom listener and we assert on fields in the stage / task completion events. However, these events are posted in a separate thread so they're not guaranteed to be posted in time. This commit fixes this flakiness through a job end registration callback. Author: Andrew Or <andrew@databricks.com> Closes #8176 from andrewor14/fix-accumulator-suite.	2015-08-14 13:42:53 -07:00
Carson Wang	33bae585d4	[SPARK-9809] Task crashes because the internal accumulators are not properly initialized When a stage failed and another stage was resubmitted with only part of partitions to compute, all the tasks failed with error message: java.util.NoSuchElementException: key not found: peakExecutionMemory. This is because the internal accumulators are not properly initialized for this stage while other codes assume the internal accumulators always exist. Author: Carson Wang <carson.wang@intel.com> Closes #8090 from carsonwang/SPARK-9809.	2015-08-14 13:38:25 -07:00
Neelesh Srinivas Salian	57c2d08800	[SPARK-9923] [CORE] ShuffleMapStage.numAvailableOutputs should be an Int instead of Long Modified type of ShuffleMapStage.numAvailableOutputs from Long to Int Author: Neelesh Srinivas Salian <nsalian@cloudera.com> Closes #8183 from nssalian/SPARK-9923.	2015-08-14 20:03:50 +01:00
Davies Liu	bd35385d53	[SPARK-9945] [SQL] pageSize should be calculated from executor.memory Currently, pageSize of TungstenSort is calculated from driver.memory, it should use executor.memory instead. Also, in the worst case, the safeFactor could be 4 (because of rounding), increase it to 16. cc rxin Author: Davies Liu <davies@databricks.com> Closes #8175 from davies/page_size.	2015-08-13 21:12:59 -07:00
Andrew Or	8815ba2f67	[SPARK-9649] Fix MasterSuite, third time's a charm This particular test did not load the default configurations so it continued to start the REST server, which causes port bind exceptions.	2015-08-13 11:31:10 -07:00
Davies Liu	a8ab2634c1	[SPARK-9832] [SQL] add a thread-safe lookup for BytesToBytseMap This patch add a thread-safe lookup for BytesToBytseMap, and use that in broadcasted HashedRelation. Author: Davies Liu <davies@databricks.com> Closes #8151 from davies/safeLookup.	2015-08-12 21:26:00 -07:00
Josh Rosen	7b13ed27c1	[SPARK-9870] Disable driver UI and Master REST server in SparkSubmitSuite I think that we should pass additional configuration flags to disable the driver UI and Master REST server in SparkSubmitSuite and HiveSparkSubmitSuite. This might cut down on port-contention-related flakiness in Jenkins. Author: Josh Rosen <joshrosen@databricks.com> Closes #8124 from JoshRosen/disable-ui-in-sparksubmitsuite.	2015-08-12 18:52:11 -07:00
Rohit Agarwal	0d1d146c22	[SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI. Author: Rohit Agarwal <rohita@qubole.com> Closes #8014 from mindprince/SPARK-9724 and squashes the following commits: a7af5ff [Rohit Agarwal] [SPARK-9724] [WEB UI] Inline attachPrefix and attachPrefixForRedirect. Fix logic of attachPrefix 8a977cd [Rohit Agarwal] [SPARK-9724] [WEB UI] Address review comments: Remove unneeded code, update scaladoc. b257844 [Rohit Agarwal] [SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.	2015-08-12 17:48:43 -07:00
Michel Lemay	ab7e721cfe	[SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties Refactor Utils class and create ShutdownHookManager. NOTE: Wasn't able to run /dev/run-tests on windows machine. Manual tests were conducted locally using custom log4j.properties file with Redis appender and logstash formatter (bundled in the fat-jar submitted to spark) ex: log4j.rootCategory=WARN,console,redis log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n log4j.logger.org.eclipse.jetty=WARN log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO log4j.logger.org.apache.spark.graphx.Pregel=INFO log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender log4j.appender.redis.endpoints=hostname:port log4j.appender.redis.key=mykey log4j.appender.redis.alwaysBatch=false log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1 Author: michellemay <mlemay@gmail.com> Closes #8109 from michellemay/SPARK-9826.	2015-08-12 16:41:35 -07:00
Niranjan Padmanabhan	738f353988	[SPARK-9092] Fixed incompatibility when both num-executors and dynamic... … allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager in not initialized in the SparkContext. Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com> Closes #7657 from neurons/SPARK-9092.	2015-08-12 16:10:21 -07:00
Xiangrui Meng	6f60298b1d	[SPARK-8967] [DOC] add Since annotation Add `Since` as a Scala annotation. The benefit is that we can use it without having explicit JavaDoc. This is useful for inherited methods. The limitation is that is doesn't show up in the generated Java API documentation. This might be fixed by modifying genjavadoc. I think we could leave it as a TODO. This is how the generated Scala doc looks: `since` JavaDoc tag: ![screen shot 2015-08-11 at 10 00 37 pm](https://cloud.githubusercontent.com/assets/829644/9230761/fa72865c-40d8-11e5-807e-0f3c815c5acd.png) `Since` annotation: ![screen shot 2015-08-11 at 10 00 28 pm](https://cloud.githubusercontent.com/assets/829644/9230764/0041d7f4-40d9-11e5-8124-c3f3e5d5b31f.png) rxin Author: Xiangrui Meng <meng@databricks.com> Closes #8131 from mengxr/SPARK-8967.	2015-08-12 14:28:23 -07:00
Andrew Or	e0110792ef	[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation This is the sister patch to #8011, but for aggregation. In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance. rxin yhuai Author: Andrew Or <andrew@databricks.com> Closes #8038 from andrewor14/unsafe-starve-memory-agg.	2015-08-12 10:08:35 -07:00
Andrew Or	be5d191207	[SPARK-9795] Dynamic allocation: avoid double counting when killing same executor twice This is based on KaiXinXiaoLei's changes in #7716. The issue is that when someone calls `sc.killExecutor("1")` on the same executor twice quickly, then the executor target will be adjusted downwards by 2 instead of 1 even though we're only actually killing one executor. In certain cases where we don't adjust the target back upwards quickly, we'll end up with jobs hanging. This is a common danger because there are many places where this is called: - `HeartbeatReceiver` kills an executor that has not been sending heartbeats - `ExecutorAllocationManager` kills an executor that has been idle - The user code might call this, which may interfere with the previous callers While it's not clear whether this fixes SPARK-9745, fixing this potential race condition seems like a strict improvement. I've added a regression test to illustrate the issue. Author: Andrew Or <andrew@databricks.com> Closes #8078 from andrewor14/da-double-kill.	2015-08-12 09:24:50 -07:00
Tom White	2e680668f7	[SPARK-8625] [CORE] Propagate user exceptions in tasks back to driver This allows clients to retrieve the original exception from the cause field of the SparkException that is thrown by the driver. If the original exception is not in fact Serializable then it will not be returned, but the message and stacktrace will be. (All Java Throwables implement the Serializable interface, but this is no guarantee that a particular implementation can actually be serialized.) Author: Tom White <tom@cloudera.com> Closes #7014 from tomwhite/propagate-user-exceptions.	2015-08-12 10:07:11 -05:00
Timothy Chen	5c99d8bf98	[SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory. Author: Timothy Chen <tnachen@gmail.com> Closes #7195 from tnachen/mesos_files.	2015-08-11 23:26:33 -07:00
Carson Wang	bab8923285	[SPARK-9426] [WEBUI] Job page DAG visualization is not shown To reproduce the issue, go to the stage page and click DAG Visualization once, then go to the job page to show the job DAG visualization. You will only see the first stage of the job. Root cause: the java script use local storage to remember your selection. Once you click the stage DAG visualization, the local storage set `expand-dag-viz-arrow-stage` to true. When you go to the job page, the js checks `expand-dag-viz-arrow-stage` in the local storage first and will try to show stage DAG visualization on the job page. To fix this, I set an id to the DAG span to differ job page and stage page. In the js code, we check the id and local storage together to make sure we show the correct DAG visualization. Author: Carson Wang <carson.wang@intel.com> Closes #8104 from carsonwang/SPARK-9426.	2015-08-11 23:25:02 -07:00
zsxwing	4e3f4b934f	[SPARK-9829] [WEBUI] Display the update value for peak execution memory The peak execution memory is not correct because it shows the sum of finished tasks' values when a task finishes. This PR fixes it by using the update value rather than the accumulator value. Author: zsxwing <zsxwing@gmail.com> Closes #8121 from zsxwing/SPARK-9829.	2015-08-11 23:23:17 -07:00
Rohit Agarwal	a807fcbe50	[SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple applications Author: Rohit Agarwal <rohita@qubole.com> Closes #8088 from mindprince/SPARK-9806.	2015-08-11 23:20:39 -07:00
xutingjun	b85f9a242a	[SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks Author: xutingjun <xutingjun@huawei.com> Author: meiyoula <1039320815@qq.com> Closes #6817 from XuTingjun/SPARK-8366.	2015-08-11 23:19:35 -07:00
zsxwing	f16bc68dfb	[SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference `InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed. This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak. Author: zsxwing <zsxwing@gmail.com> Closes #8108 from zsxwing/internal-accumulators-leak.	2015-08-11 14:06:23 -07:00
Jeff Zhang	bce72797f3	Fix comment error API is updated but its doc comment is not updated. Author: Jeff Zhang <zjffdu@apache.org> Closes #8097 from zjffdu/dev.	2015-08-11 10:42:17 -07:00
Reynold Xin	d378396f86	[SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform. PlatformDependent.UNSAFE is way too verbose. Author: Reynold Xin <rxin@databricks.com> Closes #8094 from rxin/SPARK-9815 and squashes the following commits: 229b603 [Reynold Xin] [SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.	2015-08-11 08:41:06 -07:00
Marcelo Vanzin	0f3366a4c7	[SPARK-9710] [TEST] Fix RPackageUtilsSuite when R is not available. RUtils.isRInstalled throws an exception if R is not installed, instead of returning false. Fix that. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8008 from vanzin/SPARK-9710 and squashes the following commits: df72d8c [Marcelo Vanzin] [SPARK-9710] [test] Fix RPackageUtilsSuite when R is not available.	2015-08-10 10:10:40 -07:00
Shivaram Venkataraman	46025616b4	[CORE] [SPARK-9760] Use Option instead of Some for Ivy repos This was introduced in #7599 cc rxin brkyvz Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8055 from shivaram/spark-packages-repo-fix and squashes the following commits: 890f306 [Shivaram Venkataraman] Remove test case 51d69ee [Shivaram Venkataraman] Add test case for --packages without --repository c02e0b4 [Shivaram Venkataraman] Use Option instead of Some for Ivy repos	2015-08-09 14:30:30 -07:00
Reynold Xin	e9c36938ba	[SPARK-9752][SQL] Support UnsafeRow in Sample operator. In order for this to work, I had to disable gap sampling. Author: Reynold Xin <rxin@databricks.com> Closes #8040 from rxin/SPARK-9752 and squashes the following commits: f9e248c [Reynold Xin] Fix the test case for real this time. adbccb3 [Reynold Xin] Fixed test case. 589fb23 [Reynold Xin] Merge branch 'SPARK-9752' of github.com:rxin/spark into SPARK-9752 55ccddc [Reynold Xin] Fixed core test. 78fa895 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator. c9e7112 [Reynold Xin] [SPARK-9752][SQL] Support UnsafeRow in Sample operator.	2015-08-09 10:58:36 -07:00
Carson Wang	ef062c1599	[SPARK-9731] Standalone scheduling incorrect cores if spark.executor.cores is not set The issue only happens if `spark.executor.cores` is not set and executor memory is set to a high value. For example, if we have a worker with 4G and 10 cores and we set `spark.executor.memory` to 3G, then only 1 core is assigned to the executor. The correct number should be 10 cores. I've added a unit test to illustrate the issue. Author: Carson Wang <carson.wang@intel.com> Closes #8017 from carsonwang/SPARK-9731 and squashes the following commits: d09ec48 [Carson Wang] Fix code style 86b651f [Carson Wang] Simplify the code 943cc4c [Carson Wang] fix scheduling correct cores to executors	2015-08-07 23:36:26 -07:00
Andrew Or	881548ab20	[SPARK-9674] Re-enable ignored test in SQLQuerySuite The original code that this test tests is removed in `9270bd06fd`. It was ignored shortly before that so we never caught it. This patch re-enables the test and adds the code necessary to make it pass. JoshRosen yhuai Author: Andrew Or <andrew@databricks.com> Closes #8015 from andrewor14/SPARK-9674 and squashes the following commits: 225eac2 [Andrew Or] Merge branch 'master' of github.com:apache/spark into SPARK-9674 8c24209 [Andrew Or] Fix NPE e541d64 [Andrew Or] Track aggregation memory for both sort and hash 0be3a42 [Andrew Or] Fix test	2015-08-07 14:20:13 -07:00
zsxwing	ebfd91c542	[SPARK-9467][SQL]Add SQLMetric to specialize accumulators to avoid boxing This PR adds SQLMetric/SQLMetricParam/SQLMetricValue to specialize accumulators to avoid boxing. All SQL metrics should use these classes rather than `Accumulator`. Author: zsxwing <zsxwing@gmail.com> Closes #7996 from zsxwing/sql-accu and squashes the following commits: 14a5f0a [zsxwing] Address comments 367ca23 [zsxwing] Use localValue directly to avoid changing Accumulable 42f50c3 [zsxwing] Add SQLMetric to specialize accumulators to avoid boxing	2015-08-07 00:09:58 -07:00
Davies Liu	15bd6f338d	[SPARK-9453] [SQL] support records larger than page size in UnsafeShuffleExternalSorter This patch follows exactly #7891 (except testing) Author: Davies Liu <davies@databricks.com> Closes #8005 from davies/larger_record and squashes the following commits: f9c4aff [Davies Liu] address comments 9de5c72 [Davies Liu] support records larger than page size in UnsafeShuffleExternalSorter	2015-08-06 23:40:38 -07:00
Reynold Xin	4309262ec9	[SPARK-9700] Pick default page size more intelligently. Previously, we use 64MB as the default page size, which was way too big for a lot of Spark applications (especially for single node). This patch changes it so that the default page size, if unset by the user, is determined by the number of cores available and the total execution memory available. Author: Reynold Xin <rxin@databricks.com> Closes #8012 from rxin/pagesize and squashes the following commits: 16f4756 [Reynold Xin] Fixed failing test. 5afd570 [Reynold Xin] private... 0d5fb98 [Reynold Xin] Update default value. 674a6cd [Reynold Xin] Address review feedback. dc00e05 [Reynold Xin] Merge with master. 73ebdb6 [Reynold Xin] [SPARK-9700] Pick default page size more intelligently.	2015-08-06 23:18:29 -07:00
zsxwing	672f467668	[SPARK-8057][Core]Call TaskAttemptContext.getTaskAttemptID using Reflection Someone may use the Spark core jar in the maven repo with hadoop 1. SPARK-2075 has already resolved the compatibility issue to support it. But `SparkHadoopMapRedUtil.commitTask` broke it recently. This PR uses Reflection to call `TaskAttemptContext.getTaskAttemptID` to fix the compatibility issue. Author: zsxwing <zsxwing@gmail.com> Closes #6599 from zsxwing/SPARK-8057 and squashes the following commits: f7a343c [zsxwing] Remove the redundant import 6b7f1af [zsxwing] Call TaskAttemptContext.getTaskAttemptID using Reflection	2015-08-06 21:42:42 -07:00
Andrew Or	014a9f9d8c	[SPARK-9709] [SQL] Avoid starving unsafe operators that use sort The issue is that a task may run multiple sorts, and the sorts run by the child operator (i.e. parent RDD) may acquire all available memory such that other sorts in the same task do not have enough to proceed. This manifests itself in an `IOException("Unable to acquire X bytes of memory")` thrown by `UnsafeExternalSorter`. The solution is to reserve a page in each sorter in the chain before computing the child operator's (parent RDD's) partitions. This requires us to use a new special RDD that does some preparation before computing the parent's partitions. Author: Andrew Or <andrew@databricks.com> Closes #8011 from andrewor14/unsafe-starve-memory and squashes the following commits: 35b69a4 [Andrew Or] Simplify test 0b07782 [Andrew Or] Minor: update comments 5d5afdf [Andrew Or] Merge branch 'master' of github.com:apache/spark into unsafe-starve-memory 254032e [Andrew Or] Add tests 234acbd [Andrew Or] Reserve a page in sorter when preparing each partition b889e08 [Andrew Or] MapPartitionsWithPreparationRDD	2015-08-06 19:04:57 -07:00
Reynold Xin	b87825310a	[SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator. A small performance optimization – we don't need to generate a Tuple2 and then immediately discard the key. We also don't need an extra wrapper from InterruptibleIterator. Author: Reynold Xin <rxin@databricks.com> Closes #8000 from rxin/SPARK-9692 and squashes the following commits: 1d4d0b3 [Reynold Xin] [SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and InterruptibleIterator.	2015-08-06 18:25:38 -07:00
Marcelo Vanzin	e234ea1b49	[SPARK-9645] [YARN] [CORE] Allow shuffle service to read shuffle files. Spark should not mess with the permissions of directories created by the cluster manager. Here, by setting the block manager dir permissions to 700, the shuffle service (running as the YARN user) wouldn't be able to serve shuffle files created by applications. Also, the code to protect the local app dir was missing in standalone's Worker; that has been now added. Since all processes run as the same user in standalone, `chmod 700` should not cause problems. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7966 from vanzin/SPARK-9645 and squashes the following commits: 6e07b31 [Marcelo Vanzin] Protect the app dir in standalone mode. 384ba6a [Marcelo Vanzin] [SPARK-9645] [yarn] [core] Allow shuffle service to read shuffle files.	2015-08-06 15:30:27 -07:00
Tathagata Das	0a078303d0	[SPARK-9556] [SPARK-9619] [SPARK-9624] [STREAMING] Make BlockGenerator more robust and make all BlockGenerators subscribe to rate limit updates In some receivers, instead of using the default `BlockGenerator` in `ReceiverSupervisorImpl`, custom generator with their custom listeners are used for reliability (see [`ReliableKafkaReceiver`](https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala#L99) and [updated `KinesisReceiver`](https://github.com/apache/spark/pull/7825/files)). These custom generators do not receive rate updates. This PR modifies the code to allow custom `BlockGenerator`s to be created through the `ReceiverSupervisorImpl` so that they can be kept track and rate updates can be applied. In the process, I did some simplification, and de-flaki-fication of some rate controller related tests. In particular. - Renamed `Receiver.executor` to `Receiver.supervisor` (to match `ReceiverSupervisor`) - Made `RateControllerSuite` faster (by increasing batch interval) and less flaky - Changed a few internal API to return the current rate of block generators as Long instead of Option\[Long\] (was inconsistent at places). - Updated existing `ReceiverTrackerSuite` to test that custom block generators get rate updates as well. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #7913 from tdas/SPARK-9556 and squashes the following commits: 41d4461 [Tathagata Das] fix scala style eb9fd59 [Tathagata Das] Updated kinesis receiver d24994d [Tathagata Das] Updated BlockGeneratorSuite to use manual clock in BlockGenerator d70608b [Tathagata Das] Updated BlockGenerator with states and proper synchronization f6bd47e [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-9556 31da173 [Tathagata Das] Fix bug 12116df [Tathagata Das] Add BlockGeneratorSuite 74bd069 [Tathagata Das] Fix style 989bb5c [Tathagata Das] Made BlockGenerator fail is used after stop, and added better unit tests for it 3ff618c [Tathagata Das] Fix test b40eff8 [Tathagata Das] slight refactoring f0df0f1 [Tathagata Das] Scala style fixes 51759cb [Tathagata Das] Refactored rate controller tests and added the ability to update rate of any custom block generator	2015-08-06 14:35:30 -07:00
Liang-Chi Hsieh	21fdfd7d6f	[SPARK-9548][SQL] Add a destructive iterator for BytesToBytesMap This pull request adds a destructive iterator to BytesToBytesMap. When used, the iterator frees pages as it traverses them. This is part of the effort to avoid starving when we have more than one operators that can exhaust memory. This is based on #7924, but fixes a bug there (Don't use destructive iterator in UnsafeKVExternalSorter). Closes #7924. Author: Liang-Chi Hsieh <viirya@appier.com> Author: Reynold Xin <rxin@databricks.com> Closes #8003 from rxin/map-destructive-iterator and squashes the following commits: 6b618c3 [Reynold Xin] Don't use destructive iterator in UnsafeKVExternalSorter. a7bd8ec [Reynold Xin] Merge remote-tracking branch 'viirya/destructive_iter' into map-destructive-iterator 7652083 [Liang-Chi Hsieh] For comments: add destructiveIterator(), modify unit test, remove code block. 4a3e9de [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into destructive_iter 581e9e3 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into destructive_iter f0ff783 [Liang-Chi Hsieh] No need to free last page. 9e9d2a3 [Liang-Chi Hsieh] Add a destructive iterator for BytesToBytesMap.	2015-08-06 14:33:29 -07:00
Yin Huai	4581badbc8	[SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFixedWidthAggregationMap This PR has the following three small fixes. 1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. We will not not spill a InMemorySorter if it is empty. 3. We will not add a SpillReader to a SpillMerger if this SpillReader is empty. JIRA: https://issues.apache.org/jira/browse/SPARK-9611 Author: Yin Huai <yhuai@databricks.com> Closes #7948 from yhuai/unsafeEmptyMap and squashes the following commits: 9727abe [Yin Huai] Address Josh's comments. 34b6f76 [Yin Huai] 1. UnsafeKVExternalSorter does not use 0 as the initialSize to create an UnsafeInMemorySorter if its BytesToBytesMap is empty. 2. Do not spill a InMemorySorter if it is empty. 3. Do not add spill to SpillMerger if this spill is empty.	2015-08-05 19:19:09 -07:00
Marcelo Vanzin	4399b7b090	[SPARK-9651] Fix UnsafeExternalSorterSuite. First, it's probably a bad idea to call generated Scala methods from Java. In this case, the method being called wasn't actually "Utils.createTempDir()", but actually the method that returns the first default argument to the actual createTempDir method, which is just the location of java.io.tmpdir; meaning that all tests in the class were using the same temp dir, and thus affecting each other. Second, spillingOccursInResponseToMemoryPressure was not writing enough records to actually cause a spill. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7970 from vanzin/SPARK-9651 and squashes the following commits: 74d357f [Marcelo Vanzin] Clean up temp dir on test tear down. a64f36a [Marcelo Vanzin] [SPARK-9651] Fix UnsafeExternalSorterSuite.	2015-08-05 17:58:36 -07:00
Andrew Or	5f0fb6466f	[SPARK-9649] Fix flaky test MasterSuite - randomize ports ``` Error Message Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! Stacktrace java.net.BindException: Failed to bind to: /127.0.0.1:7093: Service 'sparkMaster' failed after 16 retries! at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393) at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389) at scala.util.Success$$anonfun$map$1.apply(Try.scala:206) at scala.util.Try$.apply(Try.scala:161) ``` Author: Andrew Or <andrew@databricks.com> Closes #7968 from andrewor14/fix-master-flaky-test and squashes the following commits: fcc42ef [Andrew Or] Randomize port	2015-08-05 14:12:22 -07:00
Josh Rosen	26b06f1c46	[HOTFIX] Add static import to fix build break from #7676 .	2015-08-05 02:40:50 -07:00
zsxwing	1b0317f64c	[SPARK-8861][SPARK-8862][SQL] Add basic instrumentation to each SparkPlan operator and add a new SQL tab This PR includes the following changes: ### SPARK-8862: Add basic instrumentation to each SparkPlan operator A SparkPlan can override `def accumulators: Map[String, Accumulator[_]]` to expose its metrics that can be displayed in UI. The UI will use them to track the updates and show them in the web page in real-time. ### SparkSQLExecution and SQLSparkListener `SparkSQLExecution.withNewExecutionId` will set `spark.sql.execution.id` to the local properties so that we can use it to track all jobs that belong to the same query. SQLSparkListener is a listener to track all accumulator updates of all tasks for a query. It receives them from heartbeats can the UI can query them in real-time. When running a query, `SQLSparkListener.onExecutionStart` will be called. When a query is finished, `SQLSparkListener.onExecutionEnd` will be called. And the Spark jobs with the same execution id will be tracked and stored with this query. `SQLSparkListener` has to store all accumulator updates for tasks separately. When a task fails and starts to retry, we need to drop the old accumulator updates. Because we can not revert our changes to an accumulator, we have to maintain these accumulator updates by ourselves so as to drop accumulator updates for a failed task. ### SPARK-8862: A new SQL tab Includes two pages: #### A page for all DataFrame/SQL queries It will show the running, completed and failed queries in 3 tables. It also displays the jobs and their links for a query in each row. #### A detail page for a DataFrame/SQL query In this page, it also shows the SparkPlan metrics in real-time. Run a long-running query, such as ``` val testData = sc.parallelize((1 to 1000000).map(i => (i, i.toString))).toDF() testData.select($"_1").filter($"_1" < 1000).foreach(_ => Thread.sleep(60)) ``` and you will see the metrics keep updating in real-time. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7774) <!-- Reviewable:end --> Author: zsxwing <zsxwing@gmail.com> Closes #7774 from zsxwing/sql-ui and squashes the following commits: 5a2bc99 [zsxwing] Remove UISeleniumSuite and its dependency 57d4cd2 [zsxwing] Use VisibleForTesting annotation cc1c736 [zsxwing] Add SparkPlan.trackNumOfRowsEnabled to make subclasses easy to track the number of rows; fix the issue that the "save" action cannot collect metrics 3771ab0 [zsxwing] Register SQL metrics accmulators 3a101c0 [zsxwing] Change prepareCalled's type to AtomicBoolean for thread-safety b8d5605 [zsxwing] Make prepare idempotent; call children's prepare in SparkPlan.prepare; change doPrepare to def 4ed11a1 [zsxwing] var -> val 332639c [zsxwing] Ignore UISeleniumSuite and SQLListenerSuite."no memory leak" because of SPARK-9580 bb52359 [zsxwing] Address other commens in SQLListener c4d0f5d [zsxwing] Move newPredicate out of the iterator loop 957473c [zsxwing] Move STATIC_RESOURCE_DIR to object SQLTab 7ab4816 [zsxwing] Make SparkPlan accumulator API private[sql] dae195e [zsxwing] Fix the code style and comments 3a66207 [zsxwing] Ignore irrelevant accumulators b8484a1 [zsxwing] Merge branch 'master' into sql-ui 9406592 [zsxwing] Implement the SparkPlan viz 4ebce68 [zsxwing] Add SparkPlan.prepare to support BroadcastHashJoin to run background work in parallel ca1811f [zsxwing] Merge branch 'master' into sql-ui fef6fc6 [zsxwing] Fix a corner case 25f335c [zsxwing] Fix the code style 6eae828 [zsxwing] SQLSparkListener -> SQLListener; SparkSQLExecutionUIData -> SQLExecutionUIData; SparkSQLExecution -> SQLExecution 822af75 [zsxwing] Add SQLSparkListenerSuite and fix the issue about onExecutionEnd and onJobEnd 6be626f [zsxwing] Add UISeleniumSuite to test UI d02a24d [zsxwing] Make ExecutionPage private 23abf73 [zsxwing] [SPARK-8862][SPARK-8862][SQL] Add basic instrumentation to each SparkPlan operator and add a new SQL tab	2015-08-05 01:51:22 -07:00
Takeshi YAMAMURO	6d8a6e4161	[SPARK-9360] [SQL] Support BinaryType in PrefixComparators for UnsafeExternalSort The current implementation of UnsafeExternalSort uses NoOpPrefixComparator for binary-typed data. So, we need to add BinaryPrefixComparator in PrefixComparators. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #7676 from maropu/BinaryTypePrefixComparator and squashes the following commits: fe6f31b [Takeshi YAMAMURO] Apply comments d943c04 [Takeshi YAMAMURO] Add a codegen'd entry for BinaryType in SortPrefix ecf3ac5 [Takeshi YAMAMURO] Support BinaryType in PrefixComparator	2015-08-05 00:56:35 -07:00
Burak Yavuz	c9a4c36d05	[SPARK-8313] R Spark packages support shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit. For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`. Author: Burak Yavuz <brkyvz@gmail.com> Closes #7139 from brkyvz/r-submit and squashes the following commits: 0de384f [Burak Yavuz] remove unused imports 2 d253708 [Burak Yavuz] removed unused imports 6603d0d [Burak Yavuz] addressed comments 4258ffe [Burak Yavuz] merged master ddfcc06 [Burak Yavuz] added zipping test 3a1be7d [Burak Yavuz] don't zip 77995df [Burak Yavuz] fix URI ac45527 [Burak Yavuz] added zipping of all libs e6bf7b0 [Burak Yavuz] add println ignores 1bc5554 [Burak Yavuz] add assumes for tests 9778e03 [Burak Yavuz] addressed comments b42b300 [Burak Yavuz] merged master ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit d867756 [Burak Yavuz] add apache header eff5ba1 [Burak Yavuz] ready for review 8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit e5b5a06 [Burak Yavuz] added doc bb751ce [Burak Yavuz] fix null bug 0226768 [Burak Yavuz] fixed issues 8810beb [Burak Yavuz] R packages support	2015-08-04 18:20:12 -07:00
CodingCat	9d668b7368	[SPARK-9602] remove "Akka/Actor" words from comments https://issues.apache.org/jira/browse/SPARK-9602 Although we have hidden Akka behind RPC interface, I found that the Akka/Actor-related comments are still spreading everywhere. To make it consistent, we shall remove "actor"/"akka" words from the comments... Author: CodingCat <zhunansjtu@gmail.com> Closes #7936 from CodingCat/SPARK-9602 and squashes the following commits: e8296a3 [CodingCat] remove actor words from comments	2015-08-04 14:54:11 -07:00
Josh Rosen	ab8ee1a3b9	[SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter This patch extends UnsafeExternalSorter to support records larger than the page size. The basic strategy is the same as in #7762: store large records in their own overflow pages. Author: Josh Rosen <joshrosen@databricks.com> Closes #7891 from JoshRosen/large-records-in-sql-sorter and squashes the following commits: 967580b [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 948c344 [Josh Rosen] Add large records tests for KV sorter. 3c17288 [Josh Rosen] Combine memory and disk cleanup into general cleanupResources() method 380f217 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 27eafa0 [Josh Rosen] Fix page size in PackedRecordPointerSuite a49baef [Josh Rosen] Address initial round of review comments 3edb931 [Josh Rosen] Remove accidentally-committed debug statements. 2b164e2 [Josh Rosen] Support large records in UnsafeExternalSorter.	2015-08-04 14:42:11 -07:00
Carson Wang	cb7fa0aa93	[SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page Add pagination for the RDD page to avoid unresponsive UI when the number of the RDD partitions is large. Before: ![rddpagebefore](https://cloud.githubusercontent.com/assets/9278199/8951533/3d9add54-3601-11e5-99d0-5653b473c49b.png) After: ![rddpageafter](https://cloud.githubusercontent.com/assets/9278199/8951536/439d66e0-3601-11e5-9cee-1b380fe6620d.png) Author: Carson Wang <carson.wang@intel.com> Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits: 03c7168 [Carson Wang] Fix style issues 612c18c [Carson Wang] RDD partition table pagination for the RDD Page	2015-08-04 22:12:30 +09:00
Sean Owen	76d74090d6	[SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process. I'll explain several of the changes inline in comments. Author: Sean Owen <sowen@cloudera.com> Closes #7862 from srowen/SPARK-9534 and squashes the following commits: ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process.	2015-08-04 12:02:26 +01:00
Reynold Xin	5eb89f67e3	[SPARK-9577][SQL] Surface concrete iterator types in various sort classes. We often return abstract iterator types in various sort-related classes (e.g. UnsafeKVExternalSorter). It is actually better to return a more concrete type, so the callsite uses that type and JIT can inline the iterator calls. Author: Reynold Xin <rxin@databricks.com> Closes #7911 from rxin/surface-concrete-type and squashes the following commits: 0422add [Reynold Xin] [SPARK-9577][SQL] Surface concrete iterator types in various sort classes.	2015-08-03 18:47:02 -07:00
CodingCat	3b0e44490a	[SPARK-8416] highlight and topping the executor threads in thread dumping page https://issues.apache.org/jira/browse/SPARK-8416 To facilitate debugging, I made this patch with three changes: * render the executor-thread and non executor-thread entries with different background colors * put the executor threads on the top of the list * sort the threads alphabetically Author: CodingCat <zhunansjtu@gmail.com> Closes #7808 from CodingCat/SPARK-8416 and squashes the following commits: 34fc708 [CodingCat] fix className d7b79dd [CodingCat] lowercase threadName d032882 [CodingCat] sort alphabetically and change the css class name f0513b1 [CodingCat] change the color & group threads by name 2da6e06 [CodingCat] small fix 3fc9f36 [CodingCat] define classes in webui.css 8ee125e [CodingCat] highlight and put on top the executor threads in thread dumping page	2015-08-03 18:20:40 -07:00
Burak Yavuz	1633d0a261	[SPARK-9263] Added flags to exclude dependencies when using --packages While the functionality is there to exclude packages, there are no flags that allow users to exclude dependencies, in case of dependency conflicts. We should provide users with a flag to add dependency exclusions in case the packages are not resolved properly (or not available due to licensing). The flag I added was --packages-exclude, but I'm open on renaming it. I also added property flags in case people would like to use a conf file to provide dependencies, which is possible if there is a long list of dependencies or exclusions. cc andrewor14 vanzin pwendell Author: Burak Yavuz <brkyvz@gmail.com> Closes #7599 from brkyvz/packages-exclusions and squashes the following commits: 636f410 [Burak Yavuz] addressed nits 6e54ede [Burak Yavuz] is this the culprit b5e508e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into packages-exclusions 154f5db [Burak Yavuz] addressed initial comments 1536d7a [Burak Yavuz] Added flags to exclude packages using --packages-exclude	2015-08-03 17:42:03 -07:00
Andrew Or	702aa9d7fb	[SPARK-8735] [SQL] Expose memory usage for shuffles, joins and aggregations This patch exposes the memory used by internal data structures on the SparkUI. This tracks memory used by all spilling operations and SQL operators backed by Tungsten, e.g. `BroadcastHashJoin`, `ExternalSort`, `GeneratedAggregate` etc. The metric exposed is "peak execution memory", which broadly refers to the peak in-memory sizes of each of these data structure. A separate patch will extend this by linking the new information to the SQL operators themselves. <img width="950" alt="screen shot 2015-07-29 at 7 43 17 pm" src="https://cloud.githubusercontent.com/assets/2133137/8974776/b90fc980-362a-11e5-9e2b-842da75b1641.png"> <img width="802" alt="screen shot 2015-07-29 at 7 43 05 pm" src="https://cloud.githubusercontent.com/assets/2133137/8974777/baa76492-362a-11e5-9b77-e364a6a6b64e.png"> <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7770) <!-- Reviewable:end --> Author: Andrew Or <andrew@databricks.com> Closes #7770 from andrewor14/expose-memory-metrics and squashes the following commits: 9abecb9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics f5b0d68 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics d7df332 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics 8eefbc5 [Andrew Or] Fix non-failing tests 9de2a12 [Andrew Or] Fix tests due to another logical merge conflict 876bfa4 [Andrew Or] Fix failing test after logical merge conflict 361a359 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics 40b4802 [Andrew Or] Fix style? d0fef87 [Andrew Or] Fix tests? b3b92f6 [Andrew Or] Address comments 0625d73 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics c00a197 [Andrew Or] Fix potential NPEs 10da1cd [Andrew Or] Fix compile 17f4c2d [Andrew Or] Fix compile? a87b4d0 [Andrew Or] Fix compile? d70874d [Andrew Or] Fix test compile + address comments 2840b7d [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics 6aa2f7a [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics b889a68 [Andrew Or] Minor changes: comments, spacing, style 663a303 [Andrew Or] UnsafeShuffleWriter: update peak memory before close d090a94 [Andrew Or] Fix style 2480d84 [Andrew Or] Expand test coverage 5f1235b [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics 1ecf678 [Andrew Or] Minor changes: comments, style, unused imports 0b6926c [Andrew Or] Oops 111a05e [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics a7a39a5 [Andrew Or] Strengthen presence check for accumulator a919eb7 [Andrew Or] Add tests for unsafe shuffle writer 23c845d [Andrew Or] Add tests for SQL operators a757550 [Andrew Or] Address comments b5c51c1 [Andrew Or] Re-enable test in JavaAPISuite 5107691 [Andrew Or] Add tests for internal accumulators 59231e4 [Andrew Or] Fix tests 9528d09 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics 5b5e6f3 [Andrew Or] Add peak execution memory to summary table + tooltip 92b4b6b [Andrew Or] Display peak execution memory on the UI eee5437 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics d9b9015 [Andrew Or] Track execution memory in unsafe shuffles 770ee54 [Andrew Or] Track execution memory in broadcast joins 9c605a4 [Andrew Or] Track execution memory in GeneratedAggregate 9e824f2 [Andrew Or] Add back execution memory tracking for *ExternalSort 4ef4cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into expose-memory-metrics e6c3e2f [Andrew Or] Move internal accumulators creation to Stage a417592 [Andrew Or] Expose memory metrics in UnsafeExternalSorter 3c4f042 [Andrew Or] Track memory usage in ExternalAppendOnlyMap / ExternalSorter bd7ab3f [Andrew Or] Add internal accumulators to TaskContext	2015-08-03 14:22:07 -07:00
Andrew Or	b41a32718d	[SPARK-1855] Local checkpointing Certain use cases of Spark involve RDDs with long lineages that must be truncated periodically (e.g. GraphX). The existing way of doing it is through `rdd.checkpoint()`, which is expensive because it writes to HDFS. This patch provides an alternative to truncate lineages cheaply without providing the same level of fault tolerance. Local checkpointing writes checkpointed data to the local file system through the block manager. It is much faster than replicating to a reliable storage and provides the same semantics as long as executors do not fail. It is accessible through a new operator `rdd.localCheckpoint()` and leaves the old one unchanged. Users may even decide to combine the two and call the reliable one less frequently. The bulk of this patch involves refactoring the checkpointing interface to accept custom implementations of checkpointing. [Design doc](https://issues.apache.org/jira/secure/attachment/12741708/SPARK-7292-design.pdf). Author: Andrew Or <andrew@databricks.com> Closes #7279 from andrewor14/local-checkpoint and squashes the following commits: 729600f [Andrew Or] Oops, fix tests 34bc059 [Andrew Or] Avoid computing all partitions in local checkpoint e43bbb6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint 3be5aea [Andrew Or] Address comments bf846a6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint ab003a3 [Andrew Or] Fix compile c2e111b [Andrew Or] Address comments 33f167a [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint e908a42 [Andrew Or] Fix tests f5be0f3 [Andrew Or] Use MEMORY_AND_DISK as the default local checkpoint level a92657d [Andrew Or] Update a few comments e58e3e3 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint 4eb6eb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint 1bbe154 [Andrew Or] Simplify LocalCheckpointRDD 48a9996 [Andrew Or] Avoid traversing dependency tree + rewrite tests 62aba3f [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint db70dc2 [Andrew Or] Express local checkpointing through caching the original RDD 87d43c6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into local-checkpoint c449b38 [Andrew Or] Fix style 4a182f3 [Andrew Or] Add fine-grained tests for local checkpointing 53b363b [Andrew Or] Rename a few more awkwardly named methods (minor) e4cf071 [Andrew Or] Simplify LocalCheckpointRDD + docs + clean ups 4880deb [Andrew Or] Fix style d096c67 [Andrew Or] Fix mima 172cb66 [Andrew Or] Fix mima? e53d964 [Andrew Or] Fix style 56831c5 [Andrew Or] Add a few warnings and clear exception messages 2e59646 [Andrew Or] Add local checkpoint clean up tests 4dbbab1 [Andrew Or] Refactor CheckpointSuite to test local checkpointing 4514dc9 [Andrew Or] Clean local checkpoint files through RDD cleanups 0477eec [Andrew Or] Rename a few methods with awkward names (minor) 2e902e5 [Andrew Or] First implementation of local checkpointing 8447454 [Andrew Or] Fix tests 4ac1896 [Andrew Or] Refactor checkpoint interface for modularity	2015-08-03 10:58:37 -07:00
Timothy Chen	95dccc6335	[SPARK-8873] [MESOS] Clean up shuffle files if external shuffle service is used This patch builds directly on #7820, which is largely written by tnachen. The only addition is one commit for cleaning up the code. There should be no functional differences between this and #7820. Author: Timothy Chen <tnachen@gmail.com> Author: Andrew Or <andrew@databricks.com> Closes #7881 from andrewor14/tim-cleanup-mesos-shuffle and squashes the following commits: 8894f7d [Andrew Or] Clean up code 2a5fa10 [Andrew Or] Merge branch 'mesos_shuffle_clean' of github.com:tnachen/spark into tim-cleanup-mesos-shuffle fadff89 [Timothy Chen] Address comments. e4d0f1d [Timothy Chen] Clean up external shuffle data on driver exit with Mesos.	2015-08-03 01:55:58 -07:00
Reynold Xin	2e981b7bfa	[SPARK-9531] [SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter This pull request adds a destructAndCreateExternalSorter method to UnsafeFixedWidthAggregationMap. The new method does the following: 1. Creates a new external sorter UnsafeKVExternalSorter 2. Adds all the data into an in-memory sorter, sorts them 3. Spills the sorted in-memory data to disk This method can be used to fallback to sort-based aggregation when under memory pressure. The pull request also includes accounting fixes from JoshRosen. TODOs (that can be done in follow-up PRs) - [x] Address Josh's feedbacks from #7849 - [x] More documentation and test cases - [x] Make sure we are doing memory accounting correctly with test cases (e.g. did we release the memory in BytesToBytesMap twice?) - [ ] Look harder at possible memory leaks and exception handling - [ ] Randomized tester for the KV sorter as well as the aggregation map Author: Reynold Xin <rxin@databricks.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #7860 from rxin/kvsorter and squashes the following commits: 986a58c [Reynold Xin] Bug fix. 599317c [Reynold Xin] Style fix and slightly more compact code. fe7bd4e [Reynold Xin] Bug fixes. fd71bef [Reynold Xin] Merge remote-tracking branch 'josh/large-records-in-sql-sorter' into kvsorter-with-josh-fix 3efae38 [Reynold Xin] More fixes and documentation. 45f1b09 [Josh Rosen] Ensure that spill files are cleaned up f6a9bd3 [Reynold Xin] Josh feedback. 9be8139 [Reynold Xin] Remove testSpillFrequency. 7cbe759 [Reynold Xin] [SPARK-9531][SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter. ae4a8af [Josh Rosen] Detect leaked unsafe memory in UnsafeExternalSorterSuite. 52f9b06 [Josh Rosen] Detect ShuffleMemoryManager leaks in UnsafeExternalSorter.	2015-08-02 12:32:14 -07:00
Reynold Xin	3d1535d488	[SPARK-9520] [SQL] Support in-place sort in UnsafeFixedWidthAggregationMap This pull request adds a sortedIterator method to UnsafeFixedWidthAggregationMap that sorts its data in-place by the grouping key. This is needed so we can fallback to external sorting for aggregation. Author: Reynold Xin <rxin@databricks.com> Closes #7849 from rxin/bytes2bytes-sorting and squashes the following commits: 75018c6 [Reynold Xin] Updated documentation. 81a8694 [Reynold Xin] [SPARK-9520][SQL] Support in-place sort in UnsafeFixedWidthAggregationMap.	2015-08-01 13:20:26 -07:00
Andrew Or	6688ba6e68	[SPARK-4751] Dynamic allocation in standalone mode Dynamic allocation is a feature that allows a Spark application to scale the number of executors up and down dynamically based on the workload. Support was first introduced in YARN since 1.2, and then extended to Mesos coarse-grained mode recently. Today, it is finally supported in standalone mode as well! I tested this locally and it works as expected. This is WIP because unit tests are coming. Author: Andrew Or <andrew@databricks.com> Closes #7532 from andrewor14/standalone-da and squashes the following commits: b3c1736 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da 879e928 [Andrew Or] Add end-to-end tests for standalone dynamic allocation accc8f6 [Andrew Or] Address comments ee686a8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da c0a2c02 [Andrew Or] Fix build after merge conflict 24149eb [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da 2e762d6 [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da 6832bd7 [Andrew Or] Add tests for scheduling with executor limit a82e907 [Andrew Or] Fix comments 0a8be79 [Andrew Or] Simplify logic by removing the worker blacklist b7742af [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da 2eb5f3f [Andrew Or] Merge branch 'master' of github.com:apache/spark into standalone-da 1334e9a [Andrew Or] Fix MiMa 32abe44 [Andrew Or] Fix style 58cb06f [Andrew Or] Privatize worker blacklist for cleanliness 42ac215 [Andrew Or] Clean up comments and rewrite code for readability 49702d1 [Andrew Or] Clean up shuffle files after application exits 80047aa [Andrew Or] First working implementation	2015-08-01 11:57:14 -07:00
Reynold Xin	d90f2cf7a2	[SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter BytesToBytesMap current encodes key/value data in the following format: ``` 8B key length, key data, 8B value length, value data ``` UnsafeExternalSorter, on the other hand, encodes data this way: ``` 4B record length, data ``` As a result, we cannot pass records encoded by BytesToBytesMap directly into UnsafeExternalSorter for sorting. However, if we rearrange data slightly, we can then pass the key/value records directly into UnsafeExternalSorter: ``` 4B key+value length, 4B key length, key data, value data ``` Author: Reynold Xin <rxin@databricks.com> Closes #7845 from rxin/kvsort-rebase and squashes the following commits: 5716b59 [Reynold Xin] Fixed test. 2e62ccb [Reynold Xin] Updated BytesToBytesMap's data encoding to put the key first. a51b641 [Reynold Xin] Added a KV sorter interface.	2015-07-31 23:55:16 -07:00
Josh Rosen	8cb415a4b9	[SPARK-9451] [SQL] Support entries larger than default page size in BytesToBytesMap & integrate with ShuffleMemoryManager This patch adds support for entries larger than the default page size in BytesToBytesMap. These large rows are handled by allocating special overflow pages to hold individual entries. In addition, this patch integrates BytesToBytesMap with the ShuffleMemoryManager: - Move BytesToBytesMap from `unsafe` to `core` so that it can import `ShuffleMemoryManager`. - Before allocating new data pages, ask the ShuffleMemoryManager to reserve the memory: - `putNewKey()` now returns a boolean to indicate whether the insert succeeded or failed due to a lack of memory. The caller can use this value to respond to the memory pressure (e.g. by spilling). - `UnsafeFixedWidthAggregationMap. getAggregationBuffer()` now returns `null` to signal failure due to a lack of memory. - Updated all uses of these classes to handle these error conditions. - Added new tests for allocating large records and for allocations which fail due to memory pressure. - Extended the `afterAll()` test teardown methods to detect ShuffleMemoryManager leaks. Author: Josh Rosen <joshrosen@databricks.com> Closes #7762 from JoshRosen/large-rows and squashes the following commits: ae7bc56 [Josh Rosen] Fix compilation 82fc657 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-rows 34ab943 [Josh Rosen] Remove semi 31a525a [Josh Rosen] Integrate BytesToBytesMap with ShuffleMemoryManager. `626b33c` [Josh Rosen] Move code to sql/core and spark/core packages so that ShuffleMemoryManager can be integrated ec4484c [Josh Rosen] Move BytesToBytesMap from unsafe package to core. 642ed69 [Josh Rosen] Rename size to numElements bea1152 [Josh Rosen] Add basic test. 2cd3570 [Josh Rosen] Remove accidental duplicated code 07ff9ef [Josh Rosen] Basic support for large rows in BytesToBytesMap.	2015-07-31 19:19:27 -07:00
Sameer Abhyankar	060c79aab5	[SPARK-9056] [STREAMING] Rename configuration `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration` Rename configuration `spark.streaming.minRememberDuration` to `spark.streaming.fileStream.minRememberDuration` Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.local> Author: Sameer Abhyankar <sabhyankar@sabhyankar-MBP.Samavihome> Closes #7740 from sabhyankar/spark_branch_9056 and squashes the following commits: d5b2f1f [Sameer Abhyankar] Correct deprecated version to 1.5 1268133 [Sameer Abhyankar] Add {} and indentation ddf9844 [Sameer Abhyankar] Change 4 space indentation to 2 space indentation 1819b5f [Sameer Abhyankar] Use spark.streaming.fileStream.minRememberDuration property in lieu of spark.streaming.minRememberDuration	2015-07-31 13:08:55 -07:00
CodingCat	c0686668ae	[SPARK-9202] capping maximum number of executor&driver information kept in Worker https://issues.apache.org/jira/browse/SPARK-9202 Author: CodingCat <zhunansjtu@gmail.com> Closes #7714 from CodingCat/SPARK-9202 and squashes the following commits: 23977fb [CodingCat] add comments about why we don't synchronize finishedExecutors & finishedDrivers dc9772d [CodingCat] addressing the comments e125241 [CodingCat] stylistic fix 80bfe52 [CodingCat] fix JsonProtocolSuite d7d9485 [CodingCat] styistic fix and respect insert ordering 031755f [CodingCat] add license info & stylistic fix c3b5361 [CodingCat] test cases and docs c557b3a [CodingCat] applications are fine 9cac751 [CodingCat] application is fine... ad87ed7 [CodingCat] trimFinishedExecutorsAndDrivers	2015-07-31 20:27:00 +01:00
tedyu	27ae851ce1	[SPARK-9446] Clear Active SparkContext in stop() method In thread 'stopped SparkContext remaining active' on mailing list, Andres observed the following in driver log: ``` 15/07/29 15:17:09 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster has disassociated: <address removed> 15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Shutting down all executors Exception in thread "Yarn application state monitor" org.apache.spark.SparkException: Error asking standalone scheduler to shut down executors at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:261) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stop(CoarseGrainedSchedulerBackend.scala:266) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:158) at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416) at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1411) at org.apache.spark.SparkContext.stop(SparkContext.scala:1644) at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:139) Caused by: java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325) at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:190)15/07/29 15:17:09 INFO YarnClientSchedulerBackend: Asking each executor to shut down at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257) ... 6 more ``` Effect of the above exception is that a stopped SparkContext is returned to user since SparkContext.clearActiveContext() is not called. Author: tedyu <yuzhihong@gmail.com> Closes #7756 from tedyu/master and squashes the following commits: 7339ff2 [tedyu] Move null assignment out of tryLogNonFatalError block 6e02cd9 [tedyu] Use Utils.tryLogNonFatalError to guard resource release f5fb519 [tedyu] Clear Active SparkContext in stop() method using finally	2015-07-31 18:16:55 +01:00
zsxwing	04a49edfdb	[SPARK-9497] [SPARK-9509] [CORE] Use ask instead of askWithRetry `RpcEndpointRef.askWithRetry` throws `SparkException` rather than `TimeoutException`. Use ask to replace it because we don't need to retry here. Author: zsxwing <zsxwing@gmail.com> Closes #7824 from zsxwing/SPARK-9497 and squashes the following commits: 7bfc2b4 [zsxwing] Use ask instead of askWithRetry	2015-07-31 09:34:16 -07:00
Reynold Xin	e7a0976e99	[SPARK-9458][SPARK-9469][SQL] Code generate prefix computation in sorting & moves unsafe conversion out of TungstenSort. Author: Reynold Xin <rxin@databricks.com> Closes #7803 from rxin/SPARK-9458 and squashes the following commits: 5b032dc [Reynold Xin] Fix string. b670dbb [Reynold Xin] [SPARK-9458][SPARK-9469][SQL] Code generate prefix computation in sorting & moves unsafe conversion out of TungstenSort.	2015-07-30 17:17:27 -07:00
Hossein	157840d1b1	[SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame API This patch improves SparkR error message reporting, especially with DataFrame API. When there is a user error (e.g., malformed SQL query), the message of the cause is sent back through the RPC and the R client reads it and returns it back to user. cc shivaram Author: Hossein <hossein@databricks.com> Closes #7742 from falaki/SPARK-8742 and squashes the following commits: 4f643c9 [Hossein] Not logging exceptions in RBackendHandler 4a8005c [Hossein] Returning stack track of causing exception from RBackendHandler 5cf17f0 [Hossein] Adding unit test for error messages from SQLContext 2af75d5 [Hossein] Reading error message in case of failure and stoping with that message f479c99 [Hossein] Wrting exception cause message in JVM	2015-07-30 16:16:17 -07:00
Imran Rashid	06b6a074fb	[SPARK-9437] [CORE] avoid overflow in SizeEstimator https://issues.apache.org/jira/browse/SPARK-9437 Author: Imran Rashid <irashid@cloudera.com> Closes #7750 from squito/SPARK-9437_size_estimator_overflow and squashes the following commits: 29493f1 [Imran Rashid] prevent another potential overflow bc1cb82 [Imran Rashid] avoid overflow	2015-07-30 10:46:26 -07:00
Josh Rosen	520ec0ff9d	[SPARK-8850] [SQL] Enable Unsafe mode by default This pull request enables Unsafe mode by default in Spark SQL. In order to do this, we had to fix a number of small issues: List of fixed blockers: - [x] Make some default buffer sizes configurable so that HiveCompatibilitySuite can run properly (#7741). - [x] Memory leak on grouped aggregation of empty input (fixed by #7560 to fix this) - [x] Update planner to also check whether codegen is enabled before planning unsafe operators. - [x] Investigate failing HiveThriftBinaryServerSuite test. This turns out to be caused by a ClassCastException that occurs when Exchange tries to apply an interpreted RowOrdering to an UnsafeRow when range partitioning an RDD. This could be fixed by #7408, but a shorter-term fix is to just skip the Unsafe exchange path when RangePartitioner is used. - [x] Memory leak exceptions masking exceptions that actually caused tasks to fail (will be fixed by #7603). - [x] ~~https://issues.apache.org/jira/browse/SPARK-9162, to implement code generation for ScalaUDF. This is necessary for `UDFSuite` to pass. For now, I've just ignored this test in order to try to find other problems while we wait for a fix.~~ This is no longer necessary as of #7682. - [x] Memory leaks from Limit after UnsafeExternalSort cause the memory leak detector to fail tests. This is a huge problem in the HiveCompatibilitySuite (fixed by f4ac642a4e5b2a7931c5e04e086bb10e263b1db6). - [x] Tests in `AggregationQuerySuite` are failing due to NaN-handling issues in UnsafeRow, which were fixed in #7736. - [x] `org.apache.spark.sql.ColumnExpressionSuite.rand` needs to be updated so that the planner check also matches `TungstenProject`. - [x] After having lowered the buffer sizes to 4MB so that most of HiveCompatibilitySuite runs: - [x] Wrong answer in `join_1to1` (fixed by #7680) - [x] Wrong answer in `join_nulls` (fixed by #7680) - [x] Managed memory OOM / leak in `lateral_view` - [x] Seems to hang indefinitely in `partcols1`. This might be a deadlock in script transformation or a bug in error-handling code? The hang was fixed by #7710. - [x] Error while freeing memory in `partcols1`: will be fixed by #7734. - [x] After fixing the `partcols1` hang, it appears that a number of later tests have issues as well. - [x] Fix thread-safety bug in codegen fallback expression evaluation (#7759). Author: Josh Rosen <joshrosen@databricks.com> Closes #7564 from JoshRosen/unsafe-by-default and squashes the following commits: 83c0c56 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-by-default f4cc859 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-by-default 963f567 [Josh Rosen] Reduce buffer size for R tests d6986de [Josh Rosen] Lower page size in PySpark tests 013b9da [Josh Rosen] Also match TungstenProject in checkNumProjects 5d0b2d3 [Josh Rosen] Add task completion callback to avoid leak in limit after sort ea250da [Josh Rosen] Disable unsafe Exchange path when RangePartitioning is used 715517b [Josh Rosen] Enable Unsafe by default	2015-07-30 10:45:32 -07:00
Mridul Muralidharan	e53534655d	[SPARK-8297] [YARN] Scheduler backend is not notified in case node fails in YARN This change adds code to notify the scheduler backend when a container dies in YARN. Author: Mridul Muralidharan <mridulm@yahoo-inc.com> Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7431 from vanzin/SPARK-8297 and squashes the following commits: 471e4a0 [Marcelo Vanzin] Fix unit test after merge. d4adf4e [Marcelo Vanzin] Merge branch 'master' into SPARK-8297 3b262e8 [Marcelo Vanzin] Merge branch 'master' into SPARK-8297 537da6f [Marcelo Vanzin] Make an expected log less scary. 04dc112 [Marcelo Vanzin] Use driver <-> AM communication to send "remove executor" request. 8855b97 [Marcelo Vanzin] Merge remote-tracking branch 'mridul/fix_yarn_scheduler_bug' into SPARK-8297 687790f [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug e1b0067 [Mridul Muralidharan] Fix failing testcase, fix merge issue from our 1.3 -> master 9218fcc [Mridul Muralidharan] Fix failing testcase 362d64a [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug 62ad0cc [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug bbf8811 [Mridul Muralidharan] Merge branch 'fix_yarn_scheduler_bug' of github.com:mridulm/spark into fix_yarn_scheduler_bug 9ee1307 [Mridul Muralidharan] Fix SPARK-8297 a3a0f01 [Mridul Muralidharan] Fix SPARK-8297	2015-07-30 10:37:53 -07:00
François Garillot	7bbf02f0bd	[SPARK-9267] [CORE] Retire stringify(Partial)?Value from Accumulators cc srowen Author: François Garillot <francois@garillot.net> Closes #7678 from huitseeker/master and squashes the following commits: 5e99f57 [François Garillot] [SPARK-9267][Core] Retire stringify(Partial)?Value from Accumulators	2015-07-30 18:14:08 +01:00
Reynold Xin	4a8bb9d00d	Revert "[SPARK-9458] Avoid object allocation in prefix generation." This reverts commit `9514d874f0`.	2015-07-30 01:04:24 -07:00
Joseph Batchik	1221849f91	[SPARK-8005][SQL] Input file name Users can now get the file name of the partition being read in. A thread local variable is in `SQLNewHadoopRDD` and is set when the partition is computed. `SQLNewHadoopRDD` is moved to core so that the catalyst package can reach it. This supports: `df.select(inputFileName())` and `sqlContext.sql("select input_file_name() from table")` Author: Joseph Batchik <josephbatchik@gmail.com> Closes #7743 from JDrit/input_file_name and squashes the following commits: abb8609 [Joseph Batchik] fixed failing test and changed the default value to be an empty string d2f323d [Joseph Batchik] updates per review 102061f [Joseph Batchik] updates per review 75313f5 [Joseph Batchik] small fixes c7f7b5a [Joseph Batchik] addeding input file name to Spark SQL	2015-07-29 23:35:55 -07:00
Reynold Xin	07fd7d3647	[SPARK-9460] Avoid byte array allocation in StringPrefixComparator. As of today, StringPrefixComparator converts the long values back to byte arrays in order to compare them. This patch optimizes this to compare the longs directly, rather than turning the longs into byte arrays and comparing them byte by byte (unsigned). This only works on little-endian architecture right now. Author: Reynold Xin <rxin@databricks.com> Closes #7765 from rxin/SPARK-9460 and squashes the following commits: e4908cc [Reynold Xin] Stricter randomized tests. 4c8d094 [Reynold Xin] [SPARK-9460] Avoid byte array allocation in StringPrefixComparator.	2015-07-29 21:18:43 -07:00
Reynold Xin	9514d874f0	[SPARK-9458] Avoid object allocation in prefix generation. In our existing sort prefix generation code, we use expression's eval method to generate the prefix, which results in object allocation for every prefix. We can use the specialized getters available on InternalRow directly to avoid the object allocation. I also removed the FLOAT prefix, opting for converting float directly to double. Author: Reynold Xin <rxin@databricks.com> Closes #7763 from rxin/sort-prefix and squashes the following commits: 5dc2f06 [Reynold Xin] [SPARK-9458] Avoid object allocation in prefix generation.	2015-07-29 20:46:03 -07:00
Josh Rosen	1b0099fc62	[SPARK-9411] [SQL] Make Tungsten page sizes configurable We need to make page sizes configurable so we can reduce them in unit tests and increase them in real production workloads. These sizes are now controlled by a new configuration, `spark.buffer.pageSize`. The new default is 64 megabytes. Author: Josh Rosen <joshrosen@databricks.com> Closes #7741 from JoshRosen/SPARK-9411 and squashes the following commits: a43c4db [Josh Rosen] Fix pow 2c0eefc [Josh Rosen] Fix MAXIMUM_PAGE_SIZE_BYTES comment + value bccfb51 [Josh Rosen] Lower page size to 4MB in TestHive ba54d4b [Josh Rosen] Make UnsafeExternalSorter's page size configurable 0045aa2 [Josh Rosen] Make UnsafeShuffle's page size configurable bc734f0 [Josh Rosen] Rename configuration e614858 [Josh Rosen] Makes BytesToBytesMap page size configurable	2015-07-29 16:00:30 -07:00
Joseph Batchik	069a4c414d	[SPARK-746] [CORE] Added Avro Serialization to Kryo Added a custom Kryo serializer for generic Avro records to reduce the network IO involved during a shuffle. This compresses the schema and allows for users to register their schemas ahead of time to further reduce traffic. Currently Kryo tries to use its default serializer for generic Records, which will include a lot of unneeded data in each record. Author: Joseph Batchik <joseph.batchik@cloudera.com> Author: Joseph Batchik <josephbatchik@gmail.com> Closes #7004 from JDrit/Avro_serialization and squashes the following commits: 8158d51 [Joseph Batchik] updated per feedback c0cf329 [Joseph Batchik] implemented @squito suggestion for SparkEnv dd71efe [Joseph Batchik] fixed bug with serializing 1183a48 [Joseph Batchik] updated codec settings fa9298b [Joseph Batchik] forgot a couple of fixes c5fe794 [Joseph Batchik] implemented @squito suggestion 0f5471a [Joseph Batchik] implemented @squito suggestion to use a codec that is already in spark 6d1925c [Joseph Batchik] fixed to changes suggested by @squito d421bf5 [Joseph Batchik] updated pom to removed versions ab46d10 [Joseph Batchik] Changed Avro dependency to be similar to parent f4ae251 [Joseph Batchik] fixed serialization error in that SparkConf cannot be serialized 2b545cc [Joseph Batchik] started working on fixes for pr 97fba62 [Joseph Batchik] Added a custom Kryo serializer for generic Avro records to reduce the network IO involved during a shuffle. This compresses the schema and allows for users to register their schemas ahead of time to further reduce traffic.	2015-07-29 14:02:32 -05:00
Josh Rosen	ea49705bd4	[SPARK-9419] ShuffleMemoryManager and MemoryStore should track memory on a per-task, not per-thread, basis Spark's ShuffleMemoryManager and MemoryStore track memory on a per-thread basis, which causes problems in the handful of cases where we have tasks that use multiple threads. In PythonRDD, RRDD, ScriptTransformation, and PipedRDD we consume the input iterator in a separate thread in order to write it to an external process. As a result, these RDD's input iterators are consumed in a different thread than the thread that created them, which can cause problems in our memory allocation tracking. For example, if allocations are performed in one thread but deallocations are performed in a separate thread then memory may be leaked or we may get errors complaining that more memory was allocated than was freed. I think that the right way to fix this is to change our accounting to be performed on a per-task instead of per-thread basis. Note that the current per-thread tracking has caused problems in the past; SPARK-3731 (#2668) fixes a memory leak in PythonRDD that was caused by this issue (that fix is no longer necessary as of this patch). Author: Josh Rosen <joshrosen@databricks.com> Closes #7734 from JoshRosen/memory-tracking-fixes and squashes the following commits: b4b1702 [Josh Rosen] Propagate TaskContext to writer threads. 57c9b4e [Josh Rosen] Merge remote-tracking branch 'origin/master' into memory-tracking-fixes ed25d3b [Josh Rosen] Address minor PR review comments 44f6497 [Josh Rosen] Fix long line. 7b0f04b [Josh Rosen] Fix ShuffleMemoryManagerSuite f57f3f2 [Josh Rosen] More thread -> task changes fa78ee8 [Josh Rosen] Move Executor's cleanup into Task so that TaskContext is defined when cleanup is performed 5e2f01e [Josh Rosen] Fix capitalization 1b0083b [Josh Rosen] Roll back fix in PySpark, which is no longer necessary 2e1e0f8 [Josh Rosen] Use TaskAttemptIds to track shuffle memory c9e8e54 [Josh Rosen] Use TaskAttemptIds to track unroll memory	2015-07-28 21:53:28 -07:00
jerryshao	ab62595661	[SPARK-4352] [YARN] [WIP] Incorporate locality preferences in dynamic allocation requests Currently there's no locality preference for container request in YARN mode, this will affect the performance if fetching data remotely, so here proposed to add locality in Yarn dynamic allocation mode. Ping sryza, please help to review, thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #6394 from jerryshao/SPARK-4352 and squashes the following commits: d45fecb [jerryshao] Add documents 6c3fe5c [jerryshao] Fix bug 8db6c0e [jerryshao] Further address the comments 2e2b2cb [jerryshao] Fix rebase compiling problem ce5f096 [jerryshao] Fix style issue 7f7df95 [jerryshao] Fix rebase issue 9ca9e07 [jerryshao] Code refactor according to comments d3e4236 [jerryshao] Further address the comments 5e7a593 [jerryshao] Fix bug introduced code rebase 9ca7783 [jerryshao] Style changes 08317f9 [jerryshao] code and comment refines 65b2423 [jerryshao] Further address the comments a27c587 [jerryshao] address the comment 27faabc [jerryshao] redundant code remove 9ce06a1 [jerryshao] refactor the code f5ba27b [jerryshao] Style fix 2c6cc8a [jerryshao] Fix bug and add unit tests 0757335 [jerryshao] Consider the distribution of existed containers to recalculate the new container requests 0ad66ff [jerryshao] Fix compile bugs 1c20381 [jerryshao] Minor fix 5ef2dc8 [jerryshao] Add docs and improve the code 3359814 [jerryshao] Fix rebase and test bugs 0398539 [jerryshao] reinitialize the new implementation 67596d6 [jerryshao] Still fix the code 654e1d2 [jerryshao] Fix some bugs 45b1c89 [jerryshao] Further polish the algorithm dea0152 [jerryshao] Enable node locality information in YarnAllocator 74bbcc6 [jerryshao] Support node locality for dynamic allocation initial commit	2015-07-27 15:46:35 -07:00
Ryan Williams	c0b7df68f8	[SPARK-9366] use task's stageAttemptId in TaskEnd event Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #7681 from ryan-williams/task-stage-attempt and squashes the following commits: d6d5f0f [Ryan Williams] use task's stageAttemptId in TaskEnd event	2015-07-27 12:54:08 -05:00
Josh Rosen	ecad9d4346	[SPARK-9364] Fix array out of bounds and use-after-free bugs in UnsafeExternalSorter This patch fixes two bugs in UnsafeExternalSorter and UnsafeExternalRowSorter: - UnsafeExternalSorter does not properly update freeSpaceInCurrentPage, which can cause it to write past the end of memory pages and trigger segfaults. - UnsafeExternalRowSorter has a use-after-free bug when returning the last row from an iterator. Author: Josh Rosen <joshrosen@databricks.com> Closes #7680 from JoshRosen/SPARK-9364 and squashes the following commits: 590f311 [Josh Rosen] null out row f4cf91d [Josh Rosen] Fix use-after-free bug in UnsafeExternalRowSorter. 8abcf82 [Josh Rosen] Properly decrement freeSpaceInCurrentPage in UnsafeExternalSorter	2015-07-27 09:34:49 -07:00
Kay Ousterhout	6b2baec04f	[SPARK-9326] Close lock file used for file downloads. A lock file is used to ensure multiple executors running on the same machine don't download the same file concurrently. Spark never closes these lock files (releasing the lock does not close the underlying file); this commit fixes that. cc vanzin (looks like you've been involved in various other fixes surrounding these lock files) Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #7650 from kayousterhout/SPARK-9326 and squashes the following commits: 0401bd1 [Kay Ousterhout] Close lock file used for file downloads.	2015-07-26 13:35:16 -07:00
Andrew Or	1cf19760d6	[SPARK-9352] [SPARK-9353] Add tests for standalone scheduling code This also fixes a small issue in the standalone Master that was uncovered by the new tests. For more detail, read the description of SPARK-9353. Author: Andrew Or <andrew@databricks.com> Closes #7668 from andrewor14/standalone-scheduling-tests and squashes the following commits: d852faf [Andrew Or] Add tests + fix scheduling with memory limits	2015-07-26 13:03:13 -07:00
Nishkam Ravi	41a7cdf85d	[SPARK-8881] [SPARK-9260] Fix algorithm for scheduling executors on workers Current scheduling algorithm allocates one core at a time and in doing so ends up ignoring spark.executor.cores. As a result, when spark.cores.max/spark.executor.cores (i.e, num_executors) < num_workers, executors are not launched and the app hangs. This PR fixes and refactors the scheduling algorithm. andrewor14 Author: Nishkam Ravi <nravi@cloudera.com> Author: nishkamravi2 <nishkamravi@gmail.com> Closes #7274 from nishkamravi2/master_scheduler and squashes the following commits: b998097 [nishkamravi2] Update Master.scala da0f491 [Nishkam Ravi] Update Master.scala 79084e8 [Nishkam Ravi] Update Master.scala 1daf25f [Nishkam Ravi] Update Master.scala f279cdf [Nishkam Ravi] Update Master.scala adec84b [Nishkam Ravi] Update Master.scala a06da76 [nishkamravi2] Update Master.scala 40c8f9f [nishkamravi2] Update Master.scala (to trigger retest) c11c689 [nishkamravi2] Update EventLoggingListenerSuite.scala 5d6a19c [nishkamravi2] Update Master.scala (for the purpose of issuing a retest) 2d6371c [Nishkam Ravi] Update Master.scala 66362d5 [nishkamravi2] Update Master.scala ee7cf0e [Nishkam Ravi] Improved scheduling algorithm for executors	2015-07-25 22:56:25 -07:00
Liang-Chi Hsieh	64135cbb33	[SPARK-9067] [SQL] Close reader in NewHadoopRDD early if there is no more data JIRA: https://issues.apache.org/jira/browse/SPARK-9067 According to the description of the JIRA ticket, calling `reader.close()` only after the task is finished will cause memory and file open limit problem since these resources are occupied even we don't need that anymore. This PR simply closes the reader early when we know there is no more data to read. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7424 from viirya/close_reader and squashes the following commits: 3ff64e5 [Liang-Chi Hsieh] For comments. 3d20267 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader e152182 [Liang-Chi Hsieh] For comments. 5116cbe [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 3ceb755 [Liang-Chi Hsieh] For comments. e34d98e [Liang-Chi Hsieh] For comments. 50ed729 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 216912f [Liang-Chi Hsieh] Fix it. f429016 [Liang-Chi Hsieh] Release reader if we don't need it. a305621 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into close_reader 67569da [Liang-Chi Hsieh] Close reader early if there is no more data.	2015-07-24 12:36:44 -07:00
Marcelo Vanzin	8399ba1487	[SPARK-9261] [STREAMING] Avoid calling APIs that expose shaded classes. Doing this may cause weird errors when tests are run on maven, depending on the flags used. Instead, expose the needed functionality through methods that do not expose shaded classes. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #7601 from vanzin/SPARK-9261 and squashes the following commits: 4f64a16 [Marcelo Vanzin] [SPARK-9261] [streaming] Avoid calling APIs that expose shaded classes.	2015-07-24 11:53:16 -07:00
Reynold Xin	c8d71a4183	[SPARK-9305] Rename org.apache.spark.Row to Item. It's a thing used in test cases, but named Row. Pretty annoying because everytime I search for Row, it shows up before the Spark SQL Row, which is what a developer wants most of the time. Author: Reynold Xin <rxin@databricks.com> Closes #7638 from rxin/remove-row and squashes the following commits: aeda52d [Reynold Xin] [SPARK-9305] Rename org.apache.spark.Row to Item.	2015-07-24 09:38:13 -07:00
François Garillot	6cd28cc21e	[SPARK-9236] [CORE] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions See also comments on https://issues.apache.org/jira/browse/SPARK-9236 Author: François Garillot <francois@garillot.net> Closes #7616 from huitseeker/issue/SPARK-9236 and squashes the following commits: 217f902 [François Garillot] [SPARK-9236] Make defaultPartitioner not reuse a parent RDD's partitioner if it has 0 partitions	2015-07-24 15:41:13 +01:00
Yijie Shen	d2666a3c70	[SPARK-9183] confusing error message when looking up missing function in Spark SQL JIRA: https://issues.apache.org/jira/browse/SPARK-9183 cc rxin Author: Yijie Shen <henry.yijieshen@gmail.com> Closes #7613 from yjshen/npe_udf and squashes the following commits: 44f58f2 [Yijie Shen] add jira ticket number 903c963 [Yijie Shen] add explanation comments f44dd3c [Yijie Shen] Change two hive class LogLevel to avoid annoying messages	2015-07-23 10:31:12 -07:00
Cheng Hao	19aeab57c1	[Build][Minor] Fix building error & performance 1. When build the latest code with sbt, it throws exception like: [error] /home/hcheng/git/catalyst/core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala:78: match may not be exhaustive. [error] It would fail on the following input: UNKNOWN [error] val classNameByStatus = status match { [error] 2. Potential performance issue when implicitly convert an Array[Any] to Seq[Any] Author: Cheng Hao <hao.cheng@intel.com> Closes #7611 from chenghao-intel/toseq and squashes the following commits: cab75c5 [Cheng Hao] remove the toArray 24df682 [Cheng Hao] fix building error & performance	2015-07-23 10:28:20 -07:00
Josh Rosen	ac3ae0f2be	[SPARK-9266] Prevent "managed memory leak detected" exception from masking original exception When a task fails with an exception and also fails to properly clean up its managed memory, the `spark.unsafe.exceptionOnMemoryLeak` memory leak detection mechanism's exceptions will mask the original exception that caused the task to fail. We should throw the memory leak exception only if no other exception occurred. Author: Josh Rosen <joshrosen@databricks.com> Closes #7603 from JoshRosen/SPARK-9266 and squashes the following commits: c268cb5 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-9266 c1f0167 [Josh Rosen] Fix the error masking problem 448eae8 [Josh Rosen] Add regression test	2015-07-23 00:43:26 -07:00
Perinkulam I. Ganesh	b983d493b4	[SPARK-8695] [CORE] [MLLIB] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time. Author: Perinkulam I. Ganesh <gip@us.ibm.com> Closes #7397 from piganesh/SPARK-8695 and squashes the following commits: 041620c [Perinkulam I. Ganesh] [SPARK-8695][CORE][MLlib] TreeAggregation shouldn't be triggered when it doesn't save wall-clock time. 9ad067c [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions a6fed07 [Perinkulam I. Ganesh] [SPARK-8695] [core] [WIP] TreeAggregation shouldn't be triggered for 5 partitions	2015-07-23 07:47:42 +01:00
Josh Rosen	b217230f2a	[SPARK-9144] Remove DAGScheduler.runLocallyWithinThread and spark.localExecution.enabled Spark has an option called spark.localExecution.enabled; according to the docs: > Enables Spark to run certain jobs, such as first() or take() on the driver, without sending tasks to the cluster. This can make certain jobs execute very quickly, but may require shipping a whole partition of data to the driver. This feature ends up adding quite a bit of complexity to DAGScheduler, especially in the runLocallyWithinThread method, but as far as I know nobody uses this feature (I searched the mailing list and haven't seen any recent mentions of the configuration nor stacktraces including the runLocally method). As a step towards scheduler complexity reduction, I propose that we remove this feature and all code related to it for Spark 1.5. This pull request simply brings #7484 up to date. Author: Josh Rosen <joshrosen@databricks.com> Author: Reynold Xin <rxin@databricks.com> Closes #7585 from rxin/remove-local-exec and squashes the following commits: 84bd10e [Reynold Xin] Python fix. 1d9739a [Reynold Xin] Merge pull request #7484 from JoshRosen/remove-localexecution eec39fa [Josh Rosen] Remove allowLocal(); deprecate user-facing uses of it. b0835dc [Josh Rosen] Remove local execution code in DAGScheduler 8975d96 [Josh Rosen] Remove local execution tests. ffa8c9b [Josh Rosen] Remove documentation for configuration	2015-07-22 21:04:04 -07:00
Reynold Xin	d71a13f475	[SPARK-9262][build] Treat Scala compiler warnings as errors I've seen a few cases in the past few weeks that the compiler is throwing warnings that are caused by legitimate bugs. This patch upgrades warnings to errors, except deprecation warnings. Note that ideally we should be able to mark deprecation warnings as errors as well. However, due to the lack of ability to suppress individual warning messages in the Scala compiler, we cannot do that (since we do need to access deprecated APIs in Hadoop). Most of the work are done by ericl. Author: Reynold Xin <rxin@databricks.com> Author: Eric Liang <ekl@databricks.com> Closes #7598 from rxin/warnings and squashes the following commits: beb311b [Reynold Xin] Fixed tests. 542c031 [Reynold Xin] Fixed one more warning. 87c354a [Reynold Xin] Fixed all non-deprecation warnings. 78660ac [Eric Liang] first effort to fix warnings	2015-07-22 21:02:19 -07:00
Matei Zaharia	fe26584a1f	[SPARK-9244] Increase some memory defaults There are a few memory limits that people hit often and that we could make higher, especially now that memory sizes have grown. - spark.akka.frameSize: This defaults at 10 but is often hit for map output statuses in large shuffles. This memory is not fully allocated up-front, so we can just make this larger and still not affect jobs that never sent a status that large. We increase it to 128. - spark.executor.memory: Defaults at 512m, which is really small. We increase it to 1g. Author: Matei Zaharia <matei@databricks.com> Closes #7586 from mateiz/configs and squashes the following commits: ce0038a [Matei Zaharia] [SPARK-9244] Increase some memory defaults	2015-07-22 15:28:09 -07:00
zsxwing	d45355ee22	[SPARK-5423] [CORE] Register a TaskCompletionListener to make sure release all resources Make `DiskMapIterator.cleanup` idempotent and register a TaskCompletionListener to make sure call `cleanup`. Author: zsxwing <zsxwing@gmail.com> Closes #7529 from zsxwing/SPARK-5423 and squashes the following commits: 3e3c413 [zsxwing] Remove TODO 9556c78 [zsxwing] Fix NullPointerException for tests 3d574d9 [zsxwing] Register a TaskCompletionListener to make sure release all resources	2015-07-21 09:55:42 -07:00
zsxwing	4f7f1ee378	[SPARK-4598] [WEBUI] Task table pagination for the Stage page This PR adds pagination for the task table to solve the scalability issue of the stage page. Here is the initial screenshot: <img width="1347" alt="pagination" src="https://cloud.githubusercontent.com/assets/1000778/8679669/9e63863c-2a8e-11e5-94e4-994febcd6717.png"> The task table only shows 100 tasks. There is a page navigation above the table. Users can click the page navigation or type the page number to jump to another page. The table can be sorted by clicking the headers. However, unlike previous implementation, the sorting work is done in the server now. So clicking a table column to sort needs to refresh the web page. Author: zsxwing <zsxwing@gmail.com> Closes #7399 from zsxwing/task-table-pagination and squashes the following commits: 144f513 [zsxwing] Display the page navigation when the page number is out of range a3eee22 [zsxwing] Add extra space for the error message 54c5b84 [zsxwing] Reset page to 1 if the user changes the page size c2f7f39 [zsxwing] Add a text field to let users fill the page size bad52eb [zsxwing] Display user-friendly error messages 410586b [zsxwing] Scroll down to the tasks table if the url contains any sort column a0746d1 [zsxwing] Use expand-dag-viz-arrow-job and expand-dag-viz-arrow-stage instead of expand-dag-viz-arrow-true and expand-dag-viz-arrow-false b123f67 [zsxwing] Use localStorage to remember the user's actions and replay them when loading the page 894a342 [zsxwing] Show the link cursor when hovering for headers and page links and other minor fix 4d4fecf [zsxwing] Address Carson's comments d9285f0 [zsxwing] Add comments and fix the style 74285fa [zsxwing] Merge branch 'master' into task-table-pagination db6c859 [zsxwing] Task table pagination for the Stage page	2015-07-21 09:54:39 -07:00
Jacek Lewandowski	31954910d6	[SPARK-7171] Added a method to retrieve metrics sources in TaskContext Author: Jacek Lewandowski <lewandowski.jacek@gmail.com> Closes #5805 from jacek-lewandowski/SPARK-7171 and squashes the following commits: ed20bda [Jacek Lewandowski] SPARK-7171: Added a method to retrieve metrics sources in TaskContext	2015-07-21 09:53:33 -07:00
Liang-Chi Hsieh	9a4fd875b3	[SPARK-9128] [CORE] Get outerclasses and objects with only one method calling in ClosureCleaner JIRA: https://issues.apache.org/jira/browse/SPARK-9128 Currently, in `ClosureCleaner`, the outerclasses and objects are retrieved using two different methods. However, the logic of the two methods is the same, and we can get both the outerclasses and objects with only one method calling. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7459 from viirya/remove_extra_closurecleaner and squashes the following commits: 7c9858d [Liang-Chi Hsieh] For comments. a096941 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into remove_extra_closurecleaner 2ec5ce1 [Liang-Chi Hsieh] Remove unnecessary methods. 4df5a51 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into remove_extra_closurecleaner dc110d1 [Liang-Chi Hsieh] Add method to get outerclasses and objects at the same time.	2015-07-21 09:52:27 -07:00
Ben	f67da43c39	[SPARK-9036] [CORE] SparkListenerExecutorMetricsUpdate messages not included in JsonProtocol This PR implements a JSON serializer and deserializer in the JSONProtocol to handle the (de)serialization of SparkListenerExecutorMetricsUpdate events. It also includes a unit test in the JSONProtocolSuite file. This was implemented to satisfy the improvement request in the JIRA issue SPARK-9036. Author: Ben <benjaminpiering@gmail.com> Closes #7555 from NamelessAnalyst/master and squashes the following commits: fb4e3cc [Ben] Update JSON Protocol and tests aa69517 [Ben] Update JSON Protocol and tests --Corrected Stage Attempt to Stage Attempt ID 33e5774 [Ben] Update JSON Protocol Tests 3f237e7 [Ben] Update JSON Protocol Tests 84ca798 [Ben] Update JSON Protocol Tests cde57a0 [Ben] Update JSON Protocol Tests 8049600 [Ben] Update JSON Protocol Tests c5bc061 [Ben] Update JSON Protocol Tests 6f25785 [Ben] Merge remote-tracking branch 'origin/master' df2a609 [Ben] Update JSON Protocol dcda80b [Ben] Update JSON Protocol	2015-07-21 09:51:13 -07:00
Grace	6592a6058e	[SPARK-9193] Avoid assigning tasks to "lost" executor(s) Now, when some executors are killed by dynamic-allocation, it leads to some mis-assignment onto lost executors sometimes. Such kind of mis-assignment causes task failure(s) or even job failure if it repeats that errors for 4 times. The root cause is that *killExecutors* doesn't remove those executors under killing ASAP. It depends on the *OnDisassociated* event to refresh the active working list later. The delay time really depends on your cluster status (from several milliseconds to sub-minute). When new tasks to be scheduled during that period of time, it will be assigned to those "active" but "under killing" executors. Then the tasks will be failed due to "executor lost". The better way is to exclude those executors under killing in the makeOffers(). Then all those tasks won't be allocated onto those executors "to be lost" any more. Author: Grace <jie.huang@intel.com> Closes #7528 from GraceH/AssignToLostExecutor and squashes the following commits: ecc1da6 [Grace] scala style fix 6e2ed96 [Grace] Re-word makeOffers by more readable lines b5546ce [Grace] Add comments about the fix 30a9ad0 [Grace] Avoid assigning tasks to lost executors	2015-07-21 11:35:49 -05:00
Kay Ousterhout	6364735bcc	[SPARK-8875] Remove BlockStoreShuffleFetcher class The shuffle code has gotten increasingly difficult to read as it has evolved, and many classes have evolved significantly since they were originally created. The BlockStoreShuffleFetcher class now serves little purpose other than to make the code more difficult to read; this commit moves its functionality into the ShuffleBlockFetcherIterator class. cc massie JoshRosen (Josh, this PR also removes the Try you pointed out as being confusing / not necessarily useful in a previous comment). Matt, would be helpful to know whether this will interfere in any negative ways with your new shuffle PR (I took a look and it seems like this should still cleanly integrate with your parquet work, but want to double check). Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #7268 from kayousterhout/SPARK-8875 and squashes the following commits: 2b24a97 [Kay Ousterhout] Fixed DAGSchedulerSuite compile error 98a1831 [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-8875 90f0e89 [Kay Ousterhout] Fixed broken test 14bfcbb [Kay Ousterhout] Last style fix bc69d2b [Kay Ousterhout] Style improvements based on Josh's code review ad3c8d1 [Kay Ousterhout] Better documentation for MapOutputTracker methods 0bc0e59 [Kay Ousterhout] [SPARK-8875] Remove BlockStoreShuffleFetcher class	2015-07-21 01:12:51 -07:00
Josh Rosen	c032b0bf92	[SPARK-8797] [SPARK-9146] [SPARK-9145] [SPARK-9147] Support NaN ordering and equality comparisons in Spark SQL This patch addresses an issue where queries that sorted float or double columns containing NaN values could fail with "Comparison method violates its general contract!" errors from TimSort. The root of this problem is that `NaN > anything`, `NaN == anything`, and `NaN < anything` all return `false`. Per the design specified in SPARK-9079, we have decided that `NaN = NaN` should return true and that NaN should appear last when sorting in ascending order (i.e. it is larger than any other numeric value). In addition to implementing these semantics, this patch also adds canonicalization of NaN values in UnsafeRow, which is necessary in order to be able to do binary equality comparisons on equal NaNs that might have different bit representations (see SPARK-9147). Author: Josh Rosen <joshrosen@databricks.com> Closes #7194 from JoshRosen/nan and squashes the following commits: 983d4fc [Josh Rosen] Merge remote-tracking branch 'origin/master' into nan 88bd73c [Josh Rosen] Fix Row.equals() a702e2e [Josh Rosen] normalization -> canonicalization a7267cf [Josh Rosen] Normalize NaNs in UnsafeRow fe629ae [Josh Rosen] Merge remote-tracking branch 'origin/master' into nan fbb2a29 [Josh Rosen] Fix NaN comparisons in BinaryComparison expressions c1fd4fe [Josh Rosen] Fold NaN test into existing test framework b31eb19 [Josh Rosen] Uncomment failing tests 7fe67af [Josh Rosen] Support NaN == NaN (SPARK-9145) 58bad2c [Josh Rosen] Revert "Compare rows' string representations to work around NaN incomparability." fc6b4d2 [Josh Rosen] Update CodeGenerator 3998ef2 [Josh Rosen] Remove unused code a2ba2e7 [Josh Rosen] Fix prefix comparision for NaNs a30d371 [Josh Rosen] Compare rows' string representations to work around NaN incomparability. 6f03f85 [Josh Rosen] Fix bug in Double / Float ordering 42a1ad5 [Josh Rosen] Stop filtering NaNs in UnsafeExternalSortSuite bfca524 [Josh Rosen] Change ordering so that NaN is maximum value. 8d7be61 [Josh Rosen] Update randomized test to use ScalaTest's assume() b20837b [Josh Rosen] Add failing test for new NaN comparision ordering 5b88b2b [Josh Rosen] Fix compilation of CodeGenerationSuite d907b5b [Josh Rosen] Merge remote-tracking branch 'origin/master' into nan 630ebc5 [Josh Rosen] Specify an ordering for NaN values. 9bf195a [Josh Rosen] Re-enable NaNs in CodeGenerationSuite to produce more regression tests 13fc06a [Josh Rosen] Add regression test for NaN sorting issue f9efbb5 [Josh Rosen] Fix ORDER BY NULL e7dc4fb [Josh Rosen] Add very generic test for ordering 7d5c13e [Josh Rosen] Add regression test for SPARK-8782 (ORDER BY NULL) b55875a [Josh Rosen] Generate doubles and floats over entire possible range. 5acdd5c [Josh Rosen] Infinity and NaN are interesting. ab76cbd [Josh Rosen] Move code to Catalyst package. d2b4a4a [Josh Rosen] Add random data generator test utilities to Spark SQL.	2015-07-20 22:38:05 -07:00
Carson Wang	66bb8003b9	[SPARK-9187] [WEBUI] Timeline view may show negative value for running tasks For running tasks, the executorRunTime metrics is 0 which causes negative executorComputingTime in the timeline. It also causes an incorrect SchedulerDelay time. ![timelinenegativevalue](https://cloud.githubusercontent.com/assets/9278199/8770953/f4362378-2eec-11e5-81e6-a06a07c04794.png) Author: Carson Wang <carson.wang@intel.com> Closes #7526 from carsonwang/timeline-negValue and squashes the following commits: 7b17db2 [Carson Wang] Fix negative value in timeline view	2015-07-20 18:08:59 -07:00
Cheng Lian	a1064df0ee	[SPARK-8125] [SQL] Accelerates Parquet schema merging and partition discovery This PR tries to accelerate Parquet schema discovery and `HadoopFsRelation` partition discovery. The acceleration is done by the following means: - Turning off schema merging by default Schema merging is not the most common case, but requires reading footers of all Parquet part-files and can be very slow. - Avoiding `FileSystem.globStatus()` call when possible `FileSystem.globStatus()` may issue multiple synchronous RPC calls, and can be very slow (esp. on S3). This PR adds `SparkHadoopUtil.globPathIfNecessary()`, which only issues RPC calls when the path contain glob-pattern specific character(s) (`{}[]*?\`). This is especially useful when converting a metastore Parquet table with lots of partitions, since Spark SQL adds all partition directories as the input paths, and currently we do a `globStatus` call on each input path sequentially. - Listing leaf files in parallel when the number of input paths exceeds a threshold Listing leaf files is required by partition discovery. Currently it is done on driver side, and can be slow when there are lots of (nested) directories, since each `FileSystem.listStatus()` call issues an RPC. In this PR, we list leaf files in a BFS style, and resort to a Spark job once we found that the number of directories need to be listed exceed a threshold. The threshold is controlled by `SQLConf` option `spark.sql.sources.parallelPartitionDiscovery.threshold`, which defaults to 32. - Discovering Parquet schema in parallel Currently, schema merging is also done on driver side, and needs to read footers of all part-files. This PR uses a Spark job to do schema merging. Together with task side metadata reading in Parquet 1.7.0, we never read any footers on driver side now. Author: Cheng Lian <lian@databricks.com> Closes #7396 from liancheng/accel-parquet and squashes the following commits: 5598efc [Cheng Lian] Uses ParquetInputFormat[InternalRow] instead of ParquetInputFormat[Row] ff32cd0 [Cheng Lian] Excludes directories while listing leaf files 3c580f1 [Cheng Lian] Fixes test failure caused by making "mergeSchema" default to "false" b1646aa [Cheng Lian] Should allow empty input paths 32e5f0d [Cheng Lian] Moves schema merging to executor side	2015-07-20 16:42:43 -07:00
Imran Rashid	80e2568b25	[SPARK-8103][core] DAGScheduler should not submit multiple concurrent attempts for a stage https://issues.apache.org/jira/browse/SPARK-8103 cc kayousterhout (thanks for the extra test case) Author: Imran Rashid <irashid@cloudera.com> Author: Kay Ousterhout <kayousterhout@gmail.com> Author: Imran Rashid <squito@users.noreply.github.com> Closes #6750 from squito/SPARK-8103 and squashes the following commits: fb3acfc [Imran Rashid] fix log msg e01b7aa [Imran Rashid] fix some comments, style 584acd4 [Imran Rashid] simplify going from taskId to taskSetMgr e43ac25 [Imran Rashid] Merge branch 'master' into SPARK-8103 6bc23af [Imran Rashid] update log msg 4470fa1 [Imran Rashid] rename c04707e [Imran Rashid] style 88b61cc [Imran Rashid] add tests to make sure that TaskSchedulerImpl schedules correctly with zombie attempts d7f1ef2 [Imran Rashid] get rid of activeTaskSets a21c8b5 [Imran Rashid] Merge branch 'master' into SPARK-8103 906d626 [Imran Rashid] fix merge 109900e [Imran Rashid] Merge branch 'master' into SPARK-8103 c0d4d90 [Imran Rashid] Revert "Index active task sets by stage Id rather than by task set id" f025154 [Imran Rashid] Merge pull request #2 from kayousterhout/imran_SPARK-8103 baf46e1 [Kay Ousterhout] Index active task sets by stage Id rather than by task set id 19685bb [Imran Rashid] switch to using latestInfo.attemptId, and add comments a5f7c8c [Imran Rashid] remove comment for reviewers 227b40d [Imran Rashid] style 517b6e5 [Imran Rashid] get rid of SparkIllegalStateException b2faef5 [Imran Rashid] faster check for conflicting task sets 6542b42 [Imran Rashid] remove extra stageAttemptId ada7726 [Imran Rashid] reviewer feedback d8eb202 [Imran Rashid] Merge branch 'master' into SPARK-8103 46bc26a [Imran Rashid] more cleanup of debug garbage cb245da [Imran Rashid] finally found the issue ... clean up debug stuff 8c29707 [Imran Rashid] Merge branch 'master' into SPARK-8103 89a59b6 [Imran Rashid] more printlns ... 9601b47 [Imran Rashid] more debug printlns ecb4e7d [Imran Rashid] debugging printlns b6bc248 [Imran Rashid] style 55f4a94 [Imran Rashid] get rid of more random test case since kays tests are clearer 7021d28 [Imran Rashid] update test since listenerBus.waitUntilEmpty now throws an exception instead of returning a boolean 883fe49 [Kay Ousterhout] Unit tests for concurrent stages issue 6e14683 [Imran Rashid] unit test just to make sure we fail fast on concurrent attempts 06a0af6 [Imran Rashid] ignore for jenkins c443def [Imran Rashid] better fix and simpler test case 28d70aa [Imran Rashid] wip on getting a better test case ... a9bf31f [Imran Rashid] wip	2015-07-20 10:28:32 -07:00
Wenchen Fan	86c50bf72c	[SPARK-9171][SQL] add and improve tests for nondeterministic expressions Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7496 from cloud-fan/tests and squashes the following commits: 0958f90 [Wenchen Fan] improve test for nondeterministic expressions	2015-07-18 11:58:53 -07:00
Joshi	42d8a012f6	[SPARK-8593] [CORE] Sort app attempts by start time. This makes sure attempts are listed in the order they were executed, and that the app's state matches the state of the most current attempt. Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes #7253 from rekhajoshm/SPARK-8593 and squashes the following commits: 874dd80 [Joshi] History Server: updated order for multiple attempts(logcleaner) 716e0b1 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime) 548c753 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime) 83306a8 [Joshi] History Server: updated order for multiple attempts(descending start time) b0fc922 [Joshi] History Server: updated order for multiple attempts(updated comment) cc0fda7 [Joshi] History Server: updated order for multiple attempts(updated test) 304cb0b [Joshi] History Server: updated order for multiple attempts(reverted HistoryPage) 85024e8 [Joshi] History Server: updated order for multiple attempts a41ac4b [Joshi] History Server: updated order for multiple attempts ab65fa1 [Joshi] History Server: some attempt completed to work with showIncomplete 0be142d [Rekha Joshi] Merge pull request #3 from apache/master 106fd8e [Rekha Joshi] Merge pull request #2 from apache/master e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master	2015-07-17 22:47:28 +01:00
Hari Shreedharan	c043a3e9df	[SPARK-8851] [YARN] In Client mode, make sure the client logs in and updates tokens In client side, the flow is SparkSubmit -> SparkContext -> yarn/Client. Since the yarn client only gets a cloned config and the staging dir is set here, it is not really possible to do re-logins in the SparkContext. So, do the initial logins in Spark Submit and do re-logins as we do now in the AM, but the Client behaves like an executor in this specific context and reads the credentials file to update the tokens. This way, even if the streaming context is started up from checkpoint - it is fine since we have logged in from SparkSubmit itself itself. Author: Hari Shreedharan <hshreedharan@apache.org> Closes #7394 from harishreedharan/yarn-client-login and squashes the following commits: 9a2166f [Hari Shreedharan] make it possible to use command line args and config parameters together. de08f57 [Hari Shreedharan] Fix import order. 5c4fa63 [Hari Shreedharan] Add a comment explaining what is being done in YarnClientSchedulerBackend. c872caa [Hari Shreedharan] Fix typo in log message. 2c80540 [Hari Shreedharan] Move token renewal to YarnClientSchedulerBackend. 0c48ac2 [Hari Shreedharan] Remove direct use of ExecutorDelegationTokenUpdater in Client. 26f8bfa [Hari Shreedharan] [SPARK-8851][YARN] In Client mode, make sure the client logs in and updates tokens. 58b1969 [Hari Shreedharan] Simple attempt 1.	2015-07-17 09:38:08 -05:00
zsxwing	812b63bbee	[SPARK-8857][SPARK-8859][Core]Add an internal flag to Accumulable and send internal accumulator updates to the driver via heartbeats This PR includes the following changes: 1. Remove the thread local `Accumulators.localAccums`. Instead, all Accumulators in the executors will register with its TaskContext. 2. Add an internal flag to Accumulable. For internal Accumulators, their updates will be sent to the driver via heartbeats. Author: zsxwing <zsxwing@gmail.com> Closes #7448 from zsxwing/accumulators and squashes the following commits: c24bc5b [zsxwing] Add comments bd7dcf1 [zsxwing] Add an internal flag to Accumulable and send internal accumulator updates to the driver via heartbeats	2015-07-16 21:09:09 -07:00
Andrew Or	96aa3340f4	[SPARK-8119] HeartbeatReceiver should replace executors, not kill Symptom. If an executor in an application times out, `HeartbeatReceiver` attempts to kill it. After this happens, however, the application never gets an executor back even when there are cluster resources available. Cause. The issue is that `sc.killExecutor` automatically assumes that the application wishes to adjust its resource requirements permanently downwards. This is not the intention in `HeartbeatReceiver`, however, which simply wants a replacement for the expired executor. Fix. Differentiate between the intention to kill and the intention to replace an executor with a fresh one. More details can be found in the commit message. Author: Andrew Or <andrew@databricks.com> Closes #7107 from andrewor14/heartbeat-no-kill and squashes the following commits: 1cd2cd7 [Andrew Or] Add regression test for SPARK-8119 25a347d [Andrew Or] Reuse more code in scheduler backend 31ebd40 [Andrew Or] Differentiate between kill and replace	2015-07-16 19:39:54 -07:00
Timothy Chen	d86bbb4e28	[SPARK-6284] [MESOS] Add mesos role, principal and secret Mesos supports framework authentication and role to be set per framework, which the role is used to identify the framework's role which impacts the sharing weight of resource allocation and optional authentication information to allow the framework to be connected to the master. Author: Timothy Chen <tnachen@gmail.com> Closes #4960 from tnachen/mesos_fw_auth and squashes the following commits: 0f9f03e [Timothy Chen] Fix review comments. 8f9488a [Timothy Chen] Fix rebase f7fc2a9 [Timothy Chen] Add mesos role, auth and secret.	2015-07-16 19:37:15 -07:00
Aaron Davidson	57e9b13bf9	[SPARK-8644] Include call site in SparkException stack traces thrown by job failures Example exception (new part at bottom, clearly demarcated): ``` org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.RuntimeException: uh-oh! at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:880) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38$$anonfun$apply$mcJ$sp$2.apply(DAGSchedulerSuite.scala:880) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1640) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099) at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1777) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1777) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1298) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1289) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1288) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1288) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:755) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:755) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:755) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1470) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1459) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:560) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1744) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1762) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1777) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1791) at org.apache.spark.rdd.RDD.count(RDD.scala:1099) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply$mcJ$sp(DAGSchedulerSuite.scala:880) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply(DAGSchedulerSuite.scala:880) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37$$anonfun$38.apply(DAGSchedulerSuite.scala:880) at org.scalatest.Assertions$class.intercept(Assertions.scala:997) at org.scalatest.FunSuite.intercept(FunSuite.scala:1555) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply$mcV$sp(DAGSchedulerSuite.scala:879) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply(DAGSchedulerSuite.scala:878) at org.apache.spark.scheduler.DAGSchedulerSuite$$anonfun$37.apply(DAGSchedulerSuite.scala:878) at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfter$$super$runTest(DAGSchedulerSuite.scala:70) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200) at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:70) at org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) at org.apache.spark.scheduler.DAGSchedulerSuite.runTest(DAGSchedulerSuite.scala:70) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) at org.scalatest.Suite$class.run(Suite.scala:1424) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) at org.scalatest.SuperEngine.runImpl(Engine.scala:545) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfter$$super$run(DAGSchedulerSuite.scala:70) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241) at org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterAll$$super$run(DAGSchedulerSuite.scala:70) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) at org.apache.spark.scheduler.DAGSchedulerSuite.run(DAGSchedulerSuite.scala:70) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) at sbt.ForkMain$Run$2.call(ForkMain.java:294) at sbt.ForkMain$Run$2.call(ForkMain.java:284) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) ``` Author: Aaron Davidson <aaron@databricks.com> Closes #7028 from aarondav/stack-trace and squashes the following commits: 4714664 [Aaron Davidson] [SPARK-8644] Include call site in SparkException stack traces thrown by job failures	2015-07-16 18:14:45 -07:00
Daniel Darabos	011551620f	[SPARK-8893] Add runtime checks against non-positive number of partitions https://issues.apache.org/jira/browse/SPARK-8893 > What does `sc.parallelize(1 to 3).repartition(p).collect` return? I would expect `Array(1, 2, 3)` regardless of `p`. But if `p` < 1, it returns `Array()`. I think instead it should throw an `IllegalArgumentException`. > I think the case is pretty clear for `p` < 0. But the behavior for `p` = 0 is also error prone. In fact that's how I found this strange behavior. I used `rdd.repartition(a/b)` with positive `a` and `b`, but `a/b` was rounded down to zero and the results surprised me. I'd prefer an exception instead of unexpected (corrupt) results. Author: Daniel Darabos <darabos.daniel@gmail.com> Closes #7285 from darabos/patch-1 and squashes the following commits: decba82 [Daniel Darabos] Allow repartitioning empty RDDs to zero partitions. 97de852 [Daniel Darabos] Allow zero partition count in HashPartitioner f6ba5fb [Daniel Darabos] Use require() for simpler syntax. d5e3df8 [Daniel Darabos] Require positive number of partitions in HashPartitioner 897c628 [Daniel Darabos] Require positive maxPartitions in CoalescedRDD	2015-07-16 08:16:54 +01:00
KaiXinXiaoLei	674eb2a4c3	[SPARK-8974] Catch exceptions in allocation schedule task. I meet a problem. When I submit some tasks, the thread spark-dynamic-executor-allocation should seed the message about "requestTotalExecutors", and the new executor should start. But I meet a problem about this thread, like: 2015-07-14 19:02:17,461 \| WARN \| [spark-dynamic-executor-allocation] \| Error sending message [message = RequestExecutors(1)] in 1 attempts java.util.concurrent.TimeoutException: Futures timed out after [120 seconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:57) at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:351) at org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1382) at org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:343) at org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:295) at org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:248) when after some minutes, I find a new ApplicationMaster start, and tasks submitted start to run. The tasks Completed. And after long time (eg, ten minutes), the number of executor does not reduce to zero. I use the default value of "spark.dynamicAllocation.minExecutors". Author: KaiXinXiaoLei <huleilei1@huawei.com> Closes #7352 from KaiXinXiaoLei/dym and squashes the following commits: 3603631 [KaiXinXiaoLei] change logError to logWarning efc4f24 [KaiXinXiaoLei] change file	2015-07-15 22:31:10 +01:00
zsxwing	b9a922e260	[SPARK-6602][Core]Replace Akka Serialization with Spark Serializer Replace Akka Serialization with Spark Serializer and add unit tests. Author: zsxwing <zsxwing@gmail.com> Closes #7159 from zsxwing/remove-akka-serialization and squashes the following commits: fc0fca3 [zsxwing] Merge branch 'master' into remove-akka-serialization cf81a58 [zsxwing] Fix the code style 73251c6 [zsxwing] Add test scope 9ef4af9 [zsxwing] Add AkkaRpcEndpointRef.hashCode 433115c [zsxwing] Remove final be3edb0 [zsxwing] Support deserializing RpcEndpointRef ecec410 [zsxwing] Replace Akka Serialization with Spark Serializer	2015-07-15 14:02:23 -07:00
Wenchen Fan	fa4ec3606a	[SPARK-9020][SQL] Support mutable state in code gen expressions We can keep expressions' mutable states in generated class(like `SpecificProjection`) as member variables, so that we can read and modify them inside codegened expressions. Author: Wenchen Fan <cloud0fan@outlook.com> Closes #7392 from cloud-fan/mutable-state and squashes the following commits: eb3a221 [Wenchen Fan] fix order 73144d8 [Wenchen Fan] naming improvement 318f41d [Wenchen Fan] address more comments d43b65d [Wenchen Fan] address comments fd45c7a [Wenchen Fan] Support mutable state in code gen expressions	2015-07-15 10:31:39 -07:00
Liang-Chi Hsieh	6f6902597d	[SPARK-8840] [SPARKR] Add float coercion on SparkR JIRA: https://issues.apache.org/jira/browse/SPARK-8840 Currently the type coercion rules don't include float type. This PR simply adds it. Author: Liang-Chi Hsieh <viirya@appier.com> Closes #7280 from viirya/add_r_float_coercion and squashes the following commits: c86dc0e [Liang-Chi Hsieh] For comments. dbf0c1b [Liang-Chi Hsieh] Implicitly convert Double to Float based on provided schema. 733015a [Liang-Chi Hsieh] Add test case for DataFrame with float type. 30c2a40 [Liang-Chi Hsieh] Update test case. 52b5294 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into add_r_float_coercion 6f9159d [Liang-Chi Hsieh] Add another test case. 8db3244 [Liang-Chi Hsieh] schema also needs to support float. add test case. 0dcc992 [Liang-Chi Hsieh] Add float coercion on SparkR.	2015-07-15 09:48:33 -07:00
zsxwing	adb33d3665	[SPARK-9012] [WEBUI] Escape Accumulators in the task table If running the following codes, the task table will be broken because accumulators aren't escaped. ``` val a = sc.accumulator(1, "<table>") sc.parallelize(1 to 10).foreach(i => a += i) ``` Before this fix, <img width="1348" alt="screen shot 2015-07-13 at 8 02 44 pm" src="https://cloud.githubusercontent.com/assets/1000778/8649295/b17c491e-299b-11e5-97ee-4e6a64074c4f.png"> After this fix, <img width="1355" alt="screen shot 2015-07-13 at 8 14 32 pm" src="https://cloud.githubusercontent.com/assets/1000778/8649337/f9e9c9ec-299b-11e5-927e-35c0a2f897f5.png"> Author: zsxwing <zsxwing@gmail.com> Closes #7369 from zsxwing/SPARK-9012 and squashes the following commits: a83c9b6 [zsxwing] Escape Accumulators in the task table	2015-07-15 17:30:57 +09:00
jerryshao	bb870e72f4	[SPARK-5523] [CORE] [STREAMING] Add a cache for hostname in TaskMetrics to decrease the memory usage and GC overhead Hostname in TaskMetrics will be created through deserialization, mostly the number of hostname is only the order of number of cluster node, so adding a cache layer to dedup the object could reduce the memory usage and alleviate GC overhead, especially for long-running and fast job generation applications like Spark Streaming. Author: jerryshao <saisai.shao@intel.com> Author: Saisai Shao <saisai.shao@intel.com> Closes #5064 from jerryshao/SPARK-5523 and squashes the following commits: 3e2412a [jerryshao] Address the comments b092a81 [Saisai Shao] Add a pool to cache the hostname	2015-07-14 19:54:02 -07:00
Josh Rosen	11e5c37286	[SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly. `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7350) <!-- Reviewable:end --> Author: Josh Rosen <joshrosen@databricks.com> Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits: e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName c0b7885 [Josh Rosen] Hopefully fix the last two cases d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass 046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName 62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion. d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName	2015-07-14 16:08:17 -07:00
zsxwing	fb1d06fc24	[SPARK-4072] [CORE] Display Streaming blocks in Streaming UI Replace #6634 This PR adds `SparkListenerBlockUpdated` to SparkListener so that it can monitor all block update infos that are sent to `BlockManagerMasaterEndpoint`, and also add new tables in the Storage tab to display the stream block infos. ![screen shot 2015-07-01 at 5 19 46 pm](https://cloud.githubusercontent.com/assets/1000778/8451562/c291a6ec-2016-11e5-890d-0afc174e1f8c.png) Author: zsxwing <zsxwing@gmail.com> Closes #6672 from zsxwing/SPARK-4072-2 and squashes the following commits: df2c1d8 [zsxwing] Use xml query to check the xml elements 54d54af [zsxwing] Add unit tests for StoragePage e29fb53 [zsxwing] Update as per TD's comments ccbee07 [zsxwing] Fix the code style 6dc42b4 [zsxwing] Fix the replication level of blocks 450fad1 [zsxwing] Merge branch 'master' into SPARK-4072-2 1e9ef52 [zsxwing] Don't categorize by Executor ID ca0ab69 [zsxwing] Fix the code style 3de2762 [zsxwing] Make object BlockUpdatedInfo private e95b594 [zsxwing] Add 'Aggregated Stream Block Metrics by Executor' table ba5d0d1 [zsxwing] Refactor the unit test to improve the readability 4bbe341 [zsxwing] Revert JsonProtocol and don't log SparkListenerBlockUpdated b464dd1 [zsxwing] Add onBlockUpdated to EventLoggingListener 5ba014c [zsxwing] Fix the code style 0b1e47b [zsxwing] Add a developer api BlockUpdatedInfo 04838a9 [zsxwing] Fix the code style 2baa161 [zsxwing] Add unit tests 80f6c6d [zsxwing] Address comments 797ee4b [zsxwing] Display Streaming blocks in Streaming UI	2015-07-14 13:58:36 -07:00
Josh Rosen	d267c2834a	[SPARK-9031] Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. In order to simplify the code in preparation for other refactorings, I think that we should remove this base class and have only DiskBlockObjectWriter. While at one time we may have planned to have multiple BlockObjectWriter implementations, that doesn't seem to have happened, so the extra abstraction seems unnecessary. Author: Josh Rosen <joshrosen@databricks.com> Closes #7391 from JoshRosen/shuffle-write-interface-refactoring and squashes the following commits: c418e33 [Josh Rosen] Fix compilation 5047995 [Josh Rosen] Fix comments d5dc548 [Josh Rosen] Update references in comments 89dc797 [Josh Rosen] Rename test suite. 5755918 [Josh Rosen] Remove unnecessary val in case class 1607c91 [Josh Rosen] Merge BlockObjectWriter and DiskBlockObjectWriter	2015-07-14 12:56:17 -07:00
Andrew Or	8fb3a65cbb	[SPARK-8911] Fix local mode endless heartbeats As of #7173 we expect executors to properly register with the driver before responding to their heartbeats. This behavior is not matched in local mode. This patch adds the missing event that needs to be posted. Author: Andrew Or <andrew@databricks.com> Closes #7382 from andrewor14/fix-local-heartbeat and squashes the following commits: 1258bdf [Andrew Or] Post ExecutorAdded event to local executor	2015-07-14 12:47:11 -07:00
Carson Wang	5ca26fb64d	[SPARK-8950] [WEBUI] Correct the calculation of SchedulerDelay in StagePage In StagePage, the SchedulerDelay is calculated as totalExecutionTime - executorRunTime - executorOverhead - gettingResultTime. But the totalExecutionTime is calculated in the way that doesn't include the gettingResultTime. Author: Carson Wang <carson.wang@intel.com> Closes #7319 from carsonwang/SchedulerDelayTime and squashes the following commits: f66fb6e [Carson Wang] Update the code style 7d971ae [Carson Wang] Correct the calculation of SchedulerDelay	2015-07-13 11:20:04 -07:00
Sun Rui	7f487c8bde	[SPARK-6797] [SPARKR] Add support for YARN cluster mode. This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required that the SparkR package be installed on each worker node. This PR uses the JDK jar tool to package the SparkR package, because jar is thought to be available on both Linux/Windows platforms where JDK has been installed. This PR does not address the R worker involved in RDD API. Will address it in a separate JIRA issue. This PR does not address SBT build. SparkR installation and packaging by SBT will be addressed in a separate JIRA issue. R/install-dev.bat is not tested. shivaram , Could you help to test it? Author: Sun Rui <rui.sun@intel.com> Closes #6743 from sun-rui/SPARK-6797 and squashes the following commits: ca63c86 [Sun Rui] Adjust MimaExcludes after rebase. 7313374 [Sun Rui] Fix unit test errors. 72695fb [Sun Rui] Fix unit test failures. 193882f [Sun Rui] Fix Mima test error. fe25a33 [Sun Rui] Fix Mima test error. 35ecfa3 [Sun Rui] Fix comments. c38a005 [Sun Rui] Unzipped SparkR binary package is still required for standalone and Mesos modes. b05340c [Sun Rui] Fix scala style. 2ca5048 [Sun Rui] Fix comments. 1acefd1 [Sun Rui] Fix scala style. `0aa1e97` [Sun Rui] Fix scala style. 41d4f17 [Sun Rui] Add support for locating SparkR package for R workers required by RDD APIs. 49ff948 [Sun Rui] Invoke jar.exe with full path in install-dev.bat. 7b916c5 [Sun Rui] Use 'rem' consistently. 3bed438 [Sun Rui] Add a comment. 681afb0 [Sun Rui] Fix a bug that RRunner does not handle client deployment modes. cedfbe2 [Sun Rui] [SPARK-6797][SPARKR] Add support for YARN cluster mode.	2015-07-13 08:21:47 -07:00
Kay Ousterhout	30090884f9	[SPARK-8880] Fix confusing Stage.attemptId member variable Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #7275 from kayousterhout/SPARK-8880 and squashes the following commits: 3e9ce7c [Kay Ousterhout] Added missing return type e150278 [Kay Ousterhout] [SPARK-8880] Fix confusing Stage.attemptId member variable	2015-07-12 20:45:24 -04:00
Josh Rosen	fb8807c9b0	[SPARK-7078] [SPARK-7079] Binary processing sort for Spark SQL This patch adds a cache-friendly external sorter which operates on serialized bytes and uses this sorter to implement a new sort operator for Spark SQL and DataFrames. ### Overview of the new sorter The new sorter design is inspired by [Alphasort](http://research.microsoft.com/pubs/68249/alphasort.doc) and implements a key-prefix optimization in order to improve the cache friendliness of the sort. In naive sort implementations, the sorting algorithm operates on an array of record pointers. To compare two records for ordering, the sorter must dereference these pointers, which likely involves random memory access, then compare the objects themselves. ![image](https://cloud.githubusercontent.com/assets/50748/8611390/3b1402ae-2675-11e5-8308-1a10bf347e6e.png) In a key-prefix sort, the sort operates on an array which stores the record pointer alongside a prefix of the record's key. When comparing two records for ordering, the sorter first compares the the stored key prefixes. If the ordering can be determined from the key prefixes (i.e. the prefixes are unequal), then the sort can avoid directly comparing the records, avoiding random memory accesses and full record comparisons. For example, if we're sorting a list of strings then we can store the first 8 bytes of the UTF-8 encoded string as the key-prefix and can perform unsigned byte-at-a-time comparisons to determine the ordering of strings based on their prefixes, only resorting to full comparisons for strings that share a common prefix. In cases where the sort key can fit entirely in the space allotted for the key prefix (e.g. the sorting key is an integer), we completely avoid direct record comparison. In this patch's implementation of key-prefix sorting, our sorter's internal array stores a 64-bit long and 64-bit pointer for each record being sorted. The key prefixes are generated by the user when inserting records into the sorter, which uses a user-defined comparison function for comparing them. The `PrefixComparators` object implements a set of comparators for many common types, including primitive numeric types and UTF-8 strings. The actual sorting is implemented by `UnsafeInMemorySorter`. Most consumers will not use this directly, but instead will use `UnsafeExternalSorter`, a class which implements a sort that can spill to disk in response to memory pressure. Internally, `UnsafeExternalSorter` creates `UnsafeInMemorySorters` to perform sorting and uses `UnsafeSortSpillReader/Writer` to spill and read back runs of sorted records and `UnsafeSortSpillMerger` to merge multiple sorted spills into a single sorted iterator. This external sorter integrates with Spark's existing ShuffleMemoryManager for controlling spilling. Many parts of this sorter's design are based on / copied from the more specialized external sort implementation that I designed for the new UnsafeShuffleManager write path; see #5868 for more details on that patch. ### Sorting rows in Spark SQL For now, `UnsafeExternalSorter` is only used by Spark SQL, which uses it to implement a new sort operator, `UnsafeExternalSort`. This sort operator uses a SQL-specific class called `UnsafeExternalRowSorter` that configures an `UnsafeExternalSorter` to use prefix generators and comparators that operate on rows encoded in the UnsafeRow format that was designed for Project Tungsten. I used some interesting unit-testing techniques to test this patch's SQL-specific components. `UnsafeExternalSortSuite` uses the SQL random data generators introduced in #7176 to test the UnsafeSort operator with all atomic types both with and without nullability and in both ascending and descending sort orders. `PrefixComparatorsSuite` contains a cool use of ScalaCheck + ScalaTest's `GeneratorDrivenPropertyChecks` in order to test UTF8String prefix comparison. ### Misc. additional improvements made in this patch This patch made several miscellaneous improvements to related code in Spark SQL: - The logic for selecting physical sort operator implementations, which was partially duplicated in both `Exchange` and `SparkStrategies, has now been consolidated into a `getSortOperator()` helper function in `SparkStrategies`. - The `SparkPlanTest` unit testing helper trait has been extended with new methods for comparing the output produced by two different physical plans. This makes it easy to write tests which assert that two physical operator implementations should produce the same output. I also added a method for disabling the implicit sorting of outputs prior to comparing them, a change which is necessary in order to be able to write proper SparkPlan tests for sort operators. ### Tasks deferred to followup patches While most of this patch's features are reasonably well-tested and complete, there are a number of tasks that are intentionally being deferred to followup patches: - Add tests which mock the ShuffleMemoryManager to check that memory pressure properly triggers spilling (there are examples of this type of test in #5868). - Add tests to ensure that spill files are properly cleaned up after errors. I'd like to do this in the context of a patch which introduces more general metrics for ensuring proper cleanup of tasks' temporary files; see https://issues.apache.org/jira/browse/SPARK-8966 for more details. - Metrics integration: there are some open questions regarding how to track / report spill metrics for non-shuffle operations, so I've deferred most of the IO / shuffle metrics integration for now. - Performance profiling. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/6444) <!-- Reviewable:end --> Author: Josh Rosen <joshrosen@databricks.com> Closes #6444 from JoshRosen/sql-external-sort and squashes the following commits: 6beb467 [Josh Rosen] Remove a bunch of overloaded methods to avoid default args. issue 2bbac9c [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 35dad9f [Josh Rosen] Make sortAnswers = false the default in SparkPlanTest 5135200 [Josh Rosen] Fix spill reading for large rows; add test 2f48777 [Josh Rosen] Add test and fix bug for sorting empty arrays d1e28bc [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort cd05866 [Josh Rosen] Fix scalastyle 3947fc1 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort d13ac55 [Josh Rosen] Hacky approach to copying of UnsafeRows for sort followed by limit. 845bea3 [Josh Rosen] Remove unnecessary zeroing of row conversion buffer c56ec18 [Josh Rosen] Clean up final row copying code. d31f180 [Josh Rosen] Re-enable NullType sorting test now that SPARK-8868 is fixed 844f4ca [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 293f109 [Josh Rosen] Add missing license header. f99a612 [Josh Rosen] Fix bugs in string prefix comparison. 9d00afc [Josh Rosen] Clean up prefix comparators for integral types 88aff18 [Josh Rosen] NULL_PREFIX has to be negative infinity for floating point types 613e16f [Josh Rosen] Test with larger data. 1d7ffaa [Josh Rosen] Somewhat hacky fix for descending sorts 08701e7 [Josh Rosen] Fix prefix comparison of null primitives. b86e684 [Josh Rosen] Set global = true in UnsafeExternalSortSuite. 1c7bad8 [Josh Rosen] Make sorting of answers explicit in SparkPlanTest.checkAnswer(). b81a920 [Josh Rosen] Temporarily enable only the passing sort tests 5d6109d [Josh Rosen] Fix inconsistent handling / encoding of record lengths. 87b6ed9 [Josh Rosen] Fix critical issues in test which led to false negatives. 8d7fbe7 [Josh Rosen] Fixes to multiple spilling-related bugs. 82e21c1 [Josh Rosen] Force spilling in UnsafeExternalSortSuite. 88b72db [Josh Rosen] Test ascending and descending sort orders. f27be09 [Josh Rosen] Fix tests by binding attributes. 0a79d39 [Josh Rosen] Revert "Undo part of a SparkPlanTest change in #7162 that broke my test." 7c3c864 [Josh Rosen] Undo part of a SparkPlanTest change in #7162 that broke my test. 9969c14 [Josh Rosen] Merge remote-tracking branch 'origin/master' into sql-external-sort 5822e6f [Josh Rosen] Fix test compilation issue 939f824 [Josh Rosen] Remove code gen experiment. 0dfe919 [Josh Rosen] Implement prefix sort for strings (albeit inefficiently). 66a813e [Josh Rosen] Prefix comparators for float and double b310c88 [Josh Rosen] Integrate prefix comparators for Int and Long (others coming soon) 95058d9 [Josh Rosen] Add missing SortPrefixUtils file 4c37ba6 [Josh Rosen] Add tests for sorting on all primitive types. 6890863 [Josh Rosen] Fix memory leak on empty inputs. d246e29 [Josh Rosen] Fix consideration of column types when choosing sort implementation. 6b156fb [Josh Rosen] Some WIP work on prefix comparison. 7f875f9 [Josh Rosen] Commit failing test demonstrating bug in handling objects in spills 41b8881 [Josh Rosen] Get UnsafeInMemorySorterSuite to pass (WIP) 90c2b6a [Josh Rosen] Update test name 6d6a1e6 [Josh Rosen] Centralize logic for picking sort operator implementations 9869ec2 [Josh Rosen] Clean up Exchange code a bit 82bb0ec [Josh Rosen] Fix IntelliJ complaint due to negated if condition 1db845a [Josh Rosen] Many more changes to harmonize with shuffle sorter ebf9eea [Josh Rosen] Harmonization with shuffle's unsafe sorter 206bfa2 [Josh Rosen] Add some missing newlines at the ends of files 26c8931 [Josh Rosen] Back out some Hive changes that aren't needed anymore 62f0bb8 [Josh Rosen] Update to reflect SparkPlanTest changes 21d7d93 [Josh Rosen] Back out of BlockObjectWriter change 7eafecf [Josh Rosen] Port test to SparkPlanTest d468a88 [Josh Rosen] Update for InternalRow refactoring 269cf86 [Josh Rosen] Back out SMJ operator change; isolate changes to selection of sort op. 1b841ca [Josh Rosen] WIP towards copying b420a71 [Josh Rosen] Move most of the existing SMJ code into Java. dfdb93f [Josh Rosen] SparkFunSuite change 73cc761 [Josh Rosen] Fix whitespace 9cc98f5 [Josh Rosen] Move more code to Java; fix bugs in UnsafeRowConverter length type. c8792de [Josh Rosen] Remove some debug logging dda6752 [Josh Rosen] Commit some missing code from an old git stash. 58f36d0 [Josh Rosen] Merge in a sketch of a unit test for the new sorter (now failing). 2bd8c9a [Josh Rosen] Import my original tests and get them to pass. d5d3106 [Josh Rosen] WIP towards external sorter for Spark SQL.	2015-07-10 16:44:51 -07:00
Min Zhou	c185f3a45d	[SPARK-8675] Executors created by LocalBackend won't get the same classpath as other executor backends AFAIK, some spark application always use LocalBackend to do some local initiatives, spark sql is an example. Starting a LocalPoint won't add user classpath into executor. ```java override def start() { localEndpoint = SparkEnv.get.rpcEnv.setupEndpoint( "LocalBackendEndpoint", new LocalEndpoint(SparkEnv.get.rpcEnv, scheduler, this, totalCores)) } ``` Thus will cause local executor fail with these scenarios, loading hadoop built-in native libraries, loading other user defined native libraries, loading user jars, reading s3 config from a site.xml file, etc Author: Min Zhou <coderplay@gmail.com> Closes #7091 from coderplay/master and squashes the following commits: 365838f [Min Zhou] Fixed java.net.MalformedURLException, add default scheme, support relative path d215b7f [Min Zhou] Follows spark standard scala style, make the auto testing happy 84ad2cd [Min Zhou] Use system specific path separator instead of ',' 01f5d1a [Min Zhou] Merge branch 'master' of https://github.com/apache/spark e528be7 [Min Zhou] Merge branch 'master' of https://github.com/apache/spark 45bf62c [Min Zhou] SPARK-8675 Executors created by LocalBackend won't get the same classpath as other executor backends	2015-07-10 09:52:40 -07:00
Cheng Hao	db6d57f87a	[CORE] [MINOR] change the log level to info Too many logs even when set the log level to warning. Author: Cheng Hao <hao.cheng@intel.com> Closes #7340 from chenghao-intel/log and squashes the following commits: 59658cf [Cheng Hao] change the log level to info	2015-07-10 09:50:46 -07:00
Andrew Or	5dd45bde4a	[SPARK-8958] Dynamic allocation: change cached timeout to infinity pwendell and I discussed this a little more offline and concluded that it would be good to keep it more conservative. Losing cached blocks may be very expensive and we should only allow it if the user knows what he/she is doing. FYI harishreedharan sryza. Author: Andrew Or <andrew@databricks.com> Closes #7329 from andrewor14/da-cached-timeout and squashes the following commits: cef0b4e [Andrew Or] Change timeout to infinity	2015-07-10 09:48:17 -07:00
Jonathan Alter	e14b545d2d	[SPARK-7977] [BUILD] Disallowing println Author: Jonathan Alter <jonalter@users.noreply.github.com> Closes #7093 from jonalter/SPARK-7977 and squashes the following commits: ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite 7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite 10724b6 [Jonathan Alter] Changing some printlns to logs in tests eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0b1dcb4 [Jonathan Alter] More println cleanup aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0c16fa3 [Jonathan Alter] Replacing some printlns with logs 45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 5c8e283 [Jonathan Alter] Allowing println in audit-release examples 5b50da1 [Jonathan Alter] Allowing printlns in example files ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 83ab635 [Jonathan Alter] Fixing new printlns 54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns b837c3a [Jonathan Alter] Disallowing println	2015-07-10 11:34:01 +01:00
Iulian Dragos	c4830598b2	[SPARK-6287] [MESOS] Add dynamic allocation to the coarse-grained Mesos scheduler This is largely based on extracting the dynamic allocation parts from tnachen's #3861. Author: Iulian Dragos <jaguarul@gmail.com> Closes #4984 from dragos/issue/mesos-coarse-dynamicAllocation and squashes the following commits: 39df8cd [Iulian Dragos] Update tests to latest changes in core. 9d2c9fa [Iulian Dragos] Remove adjustment of executorLimitOption in doKillExecutors. 8b00f52 [Iulian Dragos] Latest round of reviews. 0cd00e0 [Iulian Dragos] Add persistent shuffle directory 15c45c1 [Iulian Dragos] Add dynamic allocation to the Spark coarse-grained scheduler.	2015-07-09 13:26:46 -07:00
Andrew Or	ebdf585380	[SPARK-2017] [UI] Stage page hangs with many tasks (This reopens a patch that was closed in the past: #6248) When you view the stage page while running the following: ``` sc.parallelize(1 to X, 10000).count() ``` The page never loads, the job is stalled, and you end up running into an OOM: ``` HTTP ERROR 500 Problem accessing /stages/stage/. Reason: Server Error Caused by: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2367) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130) ``` This patch compresses Jetty responses in gzip. The correct long-term fix is to add pagination. Author: Andrew Or <andrew@databricks.com> Closes #7296 from andrewor14/gzip-jetty and squashes the following commits: a051c64 [Andrew Or] Use GZIP to compress Jetty responses	2015-07-09 13:25:11 -07:00
xutingjun	930fe95350	[SPARK-8953] SPARK_EXECUTOR_CORES is not read in SparkSubmit The configuration ```SPARK_EXECUTOR_CORES``` won't put into ```SparkConf```, so it has no effect to the dynamic executor allocation. Author: xutingjun <xutingjun@huawei.com> Closes #7322 from XuTingjun/SPARK_EXECUTOR_CORES and squashes the following commits: 2cafa89 [xutingjun] make SPARK_EXECUTOR_CORES has effect to dynamicAllocation	2015-07-09 13:21:14 -07:00
Ankur Chauhan	1165b17d24	[SPARK-6707] [CORE] [MESOS] Mesos Scheduler should allow the user to specify constraints based on slave attributes Currently, the mesos scheduler only looks at the 'cpu' and 'mem' resources when trying to determine the usablility of a resource offer from a mesos slave node. It may be preferable for the user to be able to ensure that the spark jobs are only started on a certain set of nodes (based on attributes). For example, If the user sets a property, let's say `spark.mesos.constraints` is set to `tachyon=true;us-east-1=false`, then the resource offers will be checked to see if they meet both these constraints and only then will be accepted to start new executors. Author: Ankur Chauhan <achauhan@brightcove.com> Closes #5563 from ankurcha/mesos_attribs and squashes the following commits: 902535b [Ankur Chauhan] Fix line length d83801c [Ankur Chauhan] Update code as per code review comments 8b73f2d [Ankur Chauhan] Fix imports c3523e7 [Ankur Chauhan] Added docs 1a24d0b [Ankur Chauhan] Expand scope of attributes matching to include all data types 482fd71 [Ankur Chauhan] Update access modifier to private[this] for offer constraints 5ccc32d [Ankur Chauhan] Fix nit pick whitespace 1bce782 [Ankur Chauhan] Fix nit pick whitespace c0cbc75 [Ankur Chauhan] Use offer id value for debug message 7fee0ea [Ankur Chauhan] Add debug statements fc7eb5b [Ankur Chauhan] Fix import codestyle 00be252 [Ankur Chauhan] Style changes as per code review comments 662535f [Ankur Chauhan] Incorporate code review comments + use SparkFunSuite fdc0937 [Ankur Chauhan] Decline offers that did not meet criteria 67b58a0 [Ankur Chauhan] Add documentation for spark.mesos.constraints 63f53f4 [Ankur Chauhan] Update codestyle - uniform style for config values `02031e4` [Ankur Chauhan] Fix scalastyle warnings in tests c09ed84 [Ankur Chauhan] Fixed the access modifier on offerConstraints val to private[mesos] 0c64df6 [Ankur Chauhan] Rename overhead fractions to memory_*, fix spacing 8cc1e8f [Ankur Chauhan] Make exception message more explicit about the source of the error addedba [Ankur Chauhan] Added test case for malformed constraint string ec9d9a6 [Ankur Chauhan] Add tests for parse constraint string 72fe88a [Ankur Chauhan] Fix up tests + remove redundant method override, combine utility class into new mesos scheduler util trait 92b47fd [Ankur Chauhan] Add attributes based constraints support to MesosScheduler	2015-07-06 16:04:57 -07:00
Wisely Chen	9ff203346c	[SPARK-8656] [WEBUI] Fix the webUI and JSON API number is not synced Spark standalone master web UI show "Alive Workers" total core, total used cores and "Alive workers" total memory, memory used. But the JSON API page "http://MASTERURL:8088/json" shows "ALL workers" core, memory number. This webUI data is not sync with the JSON API. The proper way is to sync the number with webUI and JSON API. Author: Wisely Chen <wiselychen@appier.com> Closes #7038 from thegiive/SPARK-8656 and squashes the following commits: 9e54bf0 [Wisely Chen] Change variable name to camel case 2c8ea89 [Wisely Chen] Change some styling and add local variable 431d2b0 [Wisely Chen] Worker List should contain DEAD node also 8b3b8e8 [Wisely Chen] [SPARK-8656] Fix the webUI and JSON API number is not synced	2015-07-06 16:04:01 -07:00

1 2 3 4 5 ...

4821 commits