ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Josh Rosen	efa80a531e	[SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works with KryoSerializer This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used. The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer. Author: Josh Rosen <joshrosen@databricks.com> Closes #3831 from JoshRosen/SPARK-4882 and squashes the following commits: 0466c7a [Josh Rosen] Register PythonBroadcast with Kryo. d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug. 069d8a7 [Josh Rosen] Add failing test for SPARK-4882	2014-12-30 09:29:52 -08:00
Zhang, Liye	9077e721cd	[SPARK-4920][UI] add version on master and worker page for standalone mode Author: Zhang, Liye <liye.zhang@intel.com> Closes #3769 from liyezhang556520/spark-4920_WebVersion and squashes the following commits: 3bb7e0d [Zhang, Liye] add version on master and worker page	2014-12-30 09:19:47 -08:00
Yash Datta	9bc0df6804	SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions takeOrdered should skip the reduce step when the mapped RDD has no partitions. This prevents the following exception. Reproduction step 4: run query SELECT * FROM testTable WHERE market = 'market2' ORDER BY End_Time DESC LIMIT 100; Error trace java.lang.UnsupportedOperationException: empty collection at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:863) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.reduce(RDD.scala:863) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1136) Author: Yash Datta <Yash.Datta@guavus.com> Closes #3830 from saucam/fix_takeorder and squashes the following commits: 5974d10 [Yash Datta] SPARK-4968: takeOrdered to skip reduce step in case mappers return no partitions	2014-12-29 13:49:45 -08:00
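The guard described in SPARK-4968 can be illustrated with plain Scala collections. This is a minimal sketch: the `partitions: Seq[Seq[Int]]` model and the `takeOrdered` helper are hypothetical stand-ins, not Spark's RDD API. Calling `reduce` on an empty collection throws `UnsupportedOperationException`, so the merge step is skipped when there are no per-partition results.

```scala
object TakeOrderedSketch {
  // Hypothetical model: each inner Seq stands in for one partition's data.
  def takeOrdered(partitions: Seq[Seq[Int]], num: Int): Seq[Int] = {
    // "Map" step: keep only the num smallest elements of each partition.
    val perPartition: Seq[Seq[Int]] = partitions.map(_.sorted.take(num))
    if (perPartition.isEmpty) {
      // No partitions: skip the reduce step, which would otherwise throw
      // UnsupportedOperationException("empty collection").
      Seq.empty
    } else {
      // "Reduce" step: merge candidate lists pairwise, keeping the num smallest.
      perPartition.reduce((a, b) => (a ++ b).sorted.take(num))
    }
  }
}
```

For example, `TakeOrderedSketch.takeOrdered(Seq(Seq(5, 1, 9), Seq(4, 2)), 3)` yields `Seq(1, 2, 4)`, and an input with no partitions yields an empty result instead of an exception.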
Kousuke Saruta	8d72341ab7	[Minor] Fix a typo of type parameter in JavaUtils.scala In JavaUtils.scala, there is a typo in a type parameter. In addition, the type information is removed at compile time by erasure. This issue is really minor, so I didn't file a JIRA. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3789 from sarutak/fix-typo-in-javautils and squashes the following commits: e20193d [Kousuke Saruta] Fixed a typo of type parameter 82bc5d9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into fix-typo-in-javautils 99f6f63 [Kousuke Saruta] Fixed a typo of type parameter in JavaUtils.scala	2014-12-29 12:05:08 -08:00
YanTangZhai	815de54002	[SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes #3785 from YanTangZhai/SPARK-4946 and squashes the following commits: 9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master 718afeb [YanTangZhai] Merge pull request #12 from apache/master 6e643f8 [YanTangZhai] Merge pull request #11 from apache/master e249846 [YanTangZhai] Merge pull request #10 from apache/master d26d982 [YanTangZhai] Merge pull request #9 from apache/master 76d4027 [YanTangZhai] Merge pull request #8 from apache/master 03b62b0 [YanTangZhai] Merge pull request #7 from apache/master 8a00106 [YanTangZhai] Merge pull request #6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master	2014-12-29 11:30:54 -08:00
GuoQiang Li	080ceb771a	[SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails Author: GuoQiang Li <witgo@qq.com> Closes #3788 from witgo/SPARK-4952 and squashes the following commits: d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails	2014-12-26 23:31:29 -08:00
Zhang, Liye	786808abfd	[SPARK-4954][Core] add spark version infomation in log for standalone mode The master and worker Spark versions may differ from the driver's Spark version, because the Spark JAR file might be replaced for a new application without restarting the Spark cluster. So the Spark version should be logged in both the Master and Worker logs. Author: Zhang, Liye <liye.zhang@intel.com> Closes #3790 from liyezhang556520/version4Standalone and squashes the following commits: e05e1e3 [Zhang, Liye] add spark version infomation in log for standalone mode	2014-12-26 23:24:22 -08:00
Sean Owen	29fabb1b52	SPARK-4297 [BUILD] Build warning fixes omnibus There are a number of warnings generated in a normal, successful build right now. They're mostly Java unchecked cast warnings, which can be suppressed. But there's a grab bag of other Scala language warnings and so on that can all be easily fixed. The forthcoming PR fixes about 90% of the build warnings I see now. Author: Sean Owen <sowen@cloudera.com> Closes #3157 from srowen/SPARK-4297 and squashes the following commits: 8c9e469 [Sean Owen] Suppress unchecked cast warnings, and several other build warning fixes	2014-12-24 13:32:51 -08:00
Kousuke Saruta	199e59aacd	[SPARK-4881][Minor] Use SparkConf#getBoolean instead of get().toBoolean It's really a minor issue. In ApplicationMaster, there is code like the following: val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean I think the code can be simplified as follows: val preserveFiles = sparkConf.getBoolean("spark.yarn.preserve.staging.files", false) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3733 from sarutak/SPARK-4881 and squashes the following commits: 1771430 [Kousuke Saruta] Modified the code like sparkConf.get(...).toBoolean to sparkConf.getBoolean(...) c63daa0 [Kousuke Saruta] Simplified code	2014-12-23 19:14:34 -08:00
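The two styles in SPARK-4881 can be compared with a small Map-backed configuration. `MiniConf` is a hypothetical stand-in written for this sketch and is not SparkConf's implementation. It only shows how a typed accessor with a `Boolean` default replaces the `get(key, "false").toBoolean` pattern.

```scala
object MiniConfSketch {
  // Hypothetical Map-backed config; SparkConf's real implementation differs.
  final class MiniConf(settings: Map[String, String]) {
    // String accessor: callers must parse the value themselves.
    def get(key: String, default: String): String = settings.getOrElse(key, default)

    // Typed accessor: callers pass a Boolean default instead of the string "false".
    def getBoolean(key: String, default: Boolean): Boolean =
      settings.get(key).map(_.trim.toBoolean).getOrElse(default)
  }
}
```

Both forms return the same value for a well-formed setting. The typed accessor keeps the default's type checked by the compiler.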
Marcelo Vanzin	7e2deb71c4	[SPARK-4606] Send EOF to child JVM when there's no more data to read. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3460 from vanzin/SPARK-4606 and squashes the following commits: 031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read.	2014-12-23 16:07:59 -08:00
Liang-Chi Hsieh	96281cd0c3	[SPARK-4913] Fix incorrect event log path SPARK-2261 uses a single file to log events for an app. `eventLogDir` in `ApplicationDescription` is replaced with `eventLogFile`. However, `ApplicationDescription` in `SparkDeploySchedulerBackend` is initialized with `SparkContext`'s `eventLogDir`, which is just the log directory, not the actual log file path, so `Master.rebuildSparkUI` cannot correctly rebuild a new SparkUI for the app. Because the `ApplicationDescription` is remotely registered with `Master` and the app's id is then generated in `Master`, we cannot get the app id in advance before registration. So the received description needs to be modified with the correct `eventLogFile` value. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3755 from viirya/fix_app_logdir and squashes the following commits: 5e0ea35 [Liang-Chi Hsieh] Revision for comment. b5730a1 [Liang-Chi Hsieh] Fix incorrect event log path. Closes #3777 (a duplicate PR for the same JIRA)	2014-12-23 14:58:44 -08:00
Andrew Or	27c5399f4d	[SPARK-4730][YARN] Warn against deprecated YARN settings See https://issues.apache.org/jira/browse/SPARK-4730. Author: Andrew Or <andrew@databricks.com> Closes #3590 from andrewor14/yarn-settings and squashes the following commits: 36e0753 [Andrew Or] Merge branch 'master' of github.com:apache/spark into yarn-settings dcd1316 [Andrew Or] Warn against deprecated YARN settings	2014-12-23 14:28:36 -08:00
Marcelo Vanzin	dd155369a0	[SPARK-4834] [standalone] Clean up application files after app finishes. Commit `7aacb7bfa` added support for sharing downloaded files among multiple executors of the same app. That works great in Yarn, since the app's directory is cleaned up after the app is done. But Spark standalone mode didn't do that, so the lock/cache files created by that change were left around and could eventually fill up the disk hosting /tmp. To solve that, create app-specific directories under the local dirs when launching executors. Multiple executors launched by the same Worker will use the same app directories, so they should be able to share the downloaded files. When the application finishes, a new message is sent to all workers telling them the application has finished; once that message has been received, and all executors registered for the application shut down, then those directories will be cleaned up by the Worker. Note: Unit testing this is hard (if even possible), since local-cluster mode doesn't seem to leave the Master/Worker daemons running long enough after `sc.stop()` is called for the clean up protocol to take effect. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #3705 from vanzin/SPARK-4834 and squashes the following commits: b430534 [Marcelo Vanzin] Remove seemingly unnecessary synchronization. 50eb4b9 [Marcelo Vanzin] Review feedback. c0e5ea5 [Marcelo Vanzin] [SPARK-4834] [standalone] Clean up application files after app finishes.	2014-12-23 12:02:08 -08:00
zsxwing	c233ab3d8d	[SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join In Scala, `map` and `flatMap` on an `Iterable` eagerly copy its contents into a new `Seq`, whereas the same operations on an `Iterator` are lazy. For example, ```Scala val iterable = Seq(1, 2, 3).map(v => { println(v); v }) println("Iterable map done") val iterator = Seq(1, 2, 3).iterator.map(v => { println(v); v }) println("Iterator map done") ``` outputs ``` 1 2 3 Iterable map done Iterator map done ``` So we should use 'iterator' to reduce the memory consumed by join. Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E Author: zsxwing <zsxwing@gmail.com> Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits: 48ee7b9 [zsxwing] Remove the explicit types 95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join	2014-12-22 14:26:28 -08:00
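The strict-versus-lazy behaviour in SPARK-4818 can be checked without reading printed output. This self-contained sketch counts how many times the mapping function runs. The `calls` counter exists only for this illustration.

```scala
object LazyMapSketch {
  // Returns (calls after Seq.map, calls before consuming the iterator,
  // calls after consuming one iterator element).
  def run(): (Int, Int, Int) = {
    var calls = 0
    val f = (v: Int) => { calls += 1; v * 10 }

    // Seq.map is strict: all three elements are mapped and a new Seq is built.
    val mappedSeq = Seq(1, 2, 3).map(f)
    val afterStrict = calls

    // Iterator.map is lazy: the function has not run yet.
    calls = 0
    val mappedIter = Seq(1, 2, 3).iterator.map(f)
    val beforeConsume = calls

    // Consuming one element runs the function exactly once.
    mappedIter.next()
    val afterOne = calls

    (afterStrict, beforeConsume, afterOne)
  }
}
```

`LazyMapSketch.run()` returns `(3, 0, 1)`. The iterator never materializes an intermediate collection, which is the memory saving the commit relies on in `join`.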
genmao.ygm	de9d7d2b5b	[SPARK-4920][UI]:current spark version in UI is not striking. The Spark version is not easy to see in the UI. We can use the same style as the Spark website. ![spark_version](https://cloud.githubusercontent.com/assets/7402327/5527025/1c8c721c-8a35-11e4-8d6a-2734f3c6bdf8.jpg) Author: genmao.ygm <genmao.ygm@alibaba-inc.com> Closes #3763 from uncleGen/master-clean-141222 and squashes the following commits: 0dcb9a9 [genmao.ygm] [SPARK-4920][UI]:current spark version in UI is not striking.	2014-12-22 14:14:39 -08:00
Kostas Sakellis	7c0ed13d29	[SPARK-4079] [CORE] Consolidates Errors if a CompressionCodec is not available This commit consolidates some of the exceptions thrown if compression codecs are not available. If a bad configuration string was passed in, a ClassNotFoundException was thrown. Also, if Snappy was not available, it would throw an InvocationTargetException when the codec was being used (not when it was being initialized). Now, an IllegalArgumentException is thrown at creation time when a codec is not available, either because the class does not exist or because the codec itself is not available in the system. This will allow us to give a better message and fail faster. Author: Kostas Sakellis <kostas@cloudera.com> Closes #3119 from ksakellis/kostas-spark-4079 and squashes the following commits: 9709c7c [Kostas Sakellis] Removed unnecessary Logging class 63bfdd0 [Kostas Sakellis] Removed isAvailable to preserve binary compatibility 1d0ef2f [Kostas Sakellis] [SPARK-4079] [CORE] Added more information to exception 64f3d27 [Kostas Sakellis] [SPARK-4079] [CORE] Code review feedback 52dfa8f [Kostas Sakellis] [SPARK-4079] [CORE] Default to LZF if Snappy not available	2014-12-22 13:07:01 -08:00
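The "fail fast at creation time" idea from SPARK-4079 can be sketched with plain JDK reflection. The `instantiate` helper below is hypothetical and is not Spark's `CompressionCodec.createCodec`. It resolves a class by name and converts a missing class, or a class of the wrong type, into an `IllegalArgumentException` before anything is used. It does not model the separate case of a native library such as Snappy being unavailable.

```scala
object CodecLookupSketch {
  // Hypothetical helper: resolve and instantiate a class by name, failing fast
  // with IllegalArgumentException instead of a raw ClassNotFoundException.
  def instantiate[T](className: String, expected: Class[T]): T = {
    val cls =
      try Class.forName(className)
      catch {
        case e: ClassNotFoundException =>
          throw new IllegalArgumentException(s"Codec class [$className] is not available", e)
      }
    if (!expected.isAssignableFrom(cls)) {
      throw new IllegalArgumentException(s"[$className] is not a ${expected.getName}")
    }
    // Requires a public no-argument constructor.
    expected.cast(cls.getDeclaredConstructor().newInstance())
  }
}
```

Surfacing every failure as one exception type at construction means a bad configuration string is reported when the codec is created, not later when data is first compressed.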
Takeshi Yamamuro	fb8e85e80e	[SPARK-4733] Add missing prameter comments in ShuffleDependency Add missing Javadoc comments in ShuffleDependency. Author: Takeshi Yamamuro <linguin.m.s@gmail.com> Closes #3594 from maropu/DependencyJavadocFix and squashes the following commits: 32129b4 [Takeshi Yamamuro] Fix comments in @aggregator and @mapSideCombine 303c75d [Takeshi Yamamuro] [SPARK-4733] Add missing prameter comments in ShuffleDependency	2014-12-22 12:19:23 -08:00
Zhang, Liye	39272c8cdb	[SPARK-4870] Add spark version to driver log Author: Zhang, Liye <liye.zhang@intel.com> Closes #3717 from liyezhang556520/version2Log and squashes the following commits: ccd30d7 [Zhang, Liye] delete log in sparkConf 330f70c [Zhang, Liye] move the log from SaprkConf to SparkContext 96dc115 [Zhang, Liye] remove curly brace e833330 [Zhang, Liye] add spark version to driver log	2014-12-22 11:38:28 -08:00
zsxwing	93b2f3a882	[SPARK-4918][Core] Reuse Text in saveAsTextFile Reuse Text in saveAsTextFile to reduce GC. /cc rxin Author: zsxwing <zsxwing@gmail.com> Closes #3762 from zsxwing/SPARK-4918 and squashes the following commits: 59f03eb [zsxwing] Reuse Text in saveAsTextFile	2014-12-22 11:20:00 -08:00
zsxwing	6ee6aa70b7	[SPARK-2075][Core] Make the compiler generate same bytes code for Hadoop 1.+ and Hadoop 2.+ `NullWritable` is a `Comparable` rather than `Comparable[NullWritable]` in Hadoop 1.+, so the compiler cannot find an implicit Ordering for it. As a result, it generates different anonymous classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+. Therefore, here we provide an Ordering for NullWritable so that the compiler will generate the same code. I used the following commands to confirm that the generated bytecode is the same. ``` mvn -Dhadoop.version=1.2.1 -DskipTests clean package -pl core -am; javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop1.txt; mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package -pl core -am; javap -private -c -classpath core/target/scala-2.10/classes org.apache.spark.rdd.RDD > ~/hadoop2.txt; diff ~/hadoop1.txt ~/hadoop2.txt ``` However, the compiler will still generate different code for classes that call methods of `JobContext/TaskAttemptContext`. `JobContext/TaskAttemptContext` is a class in Hadoop 1.+, so calling its methods uses `invokevirtual`, while it's an interface in Hadoop 2.+, so calls use `invokeinterface`. To fix this, we can use reflection to call `JobContext/TaskAttemptContext.getConfiguration`. Author: zsxwing <zsxwing@gmail.com> Closes #3740 from zsxwing/SPARK-2075 and squashes the following commits: 39d9df2 [zsxwing] Fix the code style e4ad8b5 [zsxwing] Use null for the implicit Ordering 734bac9 [zsxwing] Explicitly set the implicit parameters ca03559 [zsxwing] Use reflection to access JobContext/TaskAttemptContext.getConfiguration fa40db0 [zsxwing] Add an Ordering for NullWritable to make the compiler generate same byte codes for RDD	2014-12-21 22:10:19 -08:00
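The reflection workaround in SPARK-2075 can be sketched with JDK reflection alone. The two context types below are hypothetical stand-ins for Hadoop 1's `JobContext` class and Hadoop 2's `JobContext` interface, and `getConfigurationReflectively` is an illustrative helper rather than Spark's code. The call site compiles to `Method.invoke` in both cases, so the emitted bytecode does not depend on whether the target type is a class or an interface.

```scala
object ReflectiveCallSketch {
  // Hypothetical stand-ins: one build exposes a class, the other an interface.
  class ContextAsClass { def getConfiguration: String = "conf-from-class" }
  trait ContextAsInterface { def getConfiguration: String }

  // Look up the method by name at runtime instead of binding it at compile
  // time, which avoids choosing between invokevirtual and invokeinterface.
  def getConfigurationReflectively(context: AnyRef): AnyRef = {
    val method = context.getClass.getMethod("getConfiguration")
    method.invoke(context)
  }
}
```

The trade-off is a small runtime lookup cost and the loss of compile-time checking for the method name, in exchange for one binary that works with both Hadoop lines.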
huangzhaowei	a764960b3b	[Minor] Build Failed: value defaultProperties not found Maven build failed: value defaultProperties not found. This may be related to this PR: `1d648123a7`. andrewor14, can you look at this problem? Author: huangzhaowei <carlmartinmax@gmail.com> Closes #3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits: 8e2917c [huangzhaowei] Build Failed: value defaultProperties not found	2014-12-19 23:32:56 -08:00
Kanwaljit Singh	1d648123a7	SPARK-2641: Passing num executors to spark arguments from properties file Since we can set spark executor memory and executor cores using property file, we must also be allowed to set the executor instances. Author: Kanwaljit Singh <kanwaljit.singh@guavus.com> Closes #1657 from kjsingh/branch-1.0 and squashes the following commits: d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala	2014-12-19 19:27:23 -08:00
Marcelo Vanzin	456451911d	[SPARK-2261] Make event logger use a single file. Currently the event logger uses a directory and several files to describe an app's event log, all but one of which are empty. This is not very HDFS-friendly, since creating lots of nodes in HDFS (especially when they don't contain any data) is frowned upon due to the node metadata being kept in the NameNode's memory. Instead, add a header section to the event log file that contains metadata needed to read the events. This metadata includes things like the Spark version (for future code that may need it for backwards compatibility) and the compression codec used for the event data. With the new approach, aside from reducing the load on the NN, far fewer remote calls are needed when reading the log directory. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #1222 from vanzin/hist-server-single-log and squashes the following commits: cc8f5de [Marcelo Vanzin] Store header in plain text. c7e6123 [Marcelo Vanzin] Update comment. 59c561c [Marcelo Vanzin] Review feedback. 216c5a3 [Marcelo Vanzin] Review comments. dce28e9 [Marcelo Vanzin] Fix log overwrite test. f91c13e [Marcelo Vanzin] Handle "spark.eventLog.overwrite", and add unit test. 346f0b4 [Marcelo Vanzin] Review feedback. ed0023e [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 3f4500f [Marcelo Vanzin] Unit test for SPARK-3697. 45c7a1f [Marcelo Vanzin] Version of SPARK-3697 for this branch. b3ee30b [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log a6d5c50 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 16fd491 [Marcelo Vanzin] Use unique log directory for each codec. 0ef3f70 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log d93c44a [Marcelo Vanzin] Add a newline to make the header more readable. 9e928ba [Marcelo Vanzin] Add types. bd6ba8c [Marcelo Vanzin] Review feedback. a624a89 [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log 04364dc [Marcelo Vanzin] Merge branch 'master' into hist-server-single-log bb7c2d3 [Marcelo Vanzin] Fix scalastyle warning. 16661a3 [Marcelo Vanzin] Simplify some internal code. cc6bce4 [Marcelo Vanzin] Some review feedback. a722184 [Marcelo Vanzin] Do not encode metadata in log file name. 3700586 [Marcelo Vanzin] Restore log flushing. f677930 [Marcelo Vanzin] Fix botched rebase. ae571fa [Marcelo Vanzin] Fix end-to-end event logger test. 9db0efd [Marcelo Vanzin] Show prettier name in UI. 8f42274 [Marcelo Vanzin] Make history server parse old-style log directories. 6251dd7 [Marcelo Vanzin] Make event logger use a single file.	2014-12-19 18:23:42 -08:00
Ryan Williams	7981f96976	[SPARK-4896] don’t redundantly overwrite executor JAR deps Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #2848 from ryan-williams/fetch-file and squashes the following commits: c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently 8e39c16 [Ryan Williams] code review feedback 788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps	2014-12-19 15:24:41 -08:00
Ryan Williams	cdb2c645ab	[SPARK-4889] update history server example cmds Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3736 from ryan-williams/hist and squashes the following commits: 421d8ff [Ryan Williams] add another random typo fix 76d6a4c [Ryan Williams] remove hdfs example a2d0f82 [Ryan Williams] code review feedback 9ca7629 [Ryan Williams] [SPARK-4889] update history server example cmds	2014-12-19 13:56:04 -08:00
Reynold Xin	336cd341ee	Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor. This consolidates some code path and makes constructor arguments simpler for a few classes. Author: Reynold Xin <rxin@databricks.com> Closes #3738 from rxin/sparkEnvDepRefactor and squashes the following commits: 82e02cc [Reynold Xin] Fixed couple bugs. 217062a [Reynold Xin] Code review feedback. bd00af7 [Reynold Xin] Small refactoring to pass SparkEnv into Executor rather than creating SparkEnv in Executor.	2014-12-19 12:51:12 -08:00
Sandy Ryza	283263ffaa	SPARK-3428. TaskMetrics for running tasks is missing GC time metrics Author: Sandy Ryza <sandy@cloudera.com> Closes #3684 from sryza/sandy-spark-3428 and squashes the following commits: cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics	2014-12-18 22:40:44 -08:00
Liang-Chi Hsieh	d7fc69a8b5	[SPARK-4674] Refactor getCallSite The current version of `getCallSite` visits the collection of `StackTraceElement` twice. However, it is unnecessary since we can perform our work with a single visit. We also do not need to keep filtered `StackTraceElement`. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #3532 from viirya/refactor_getCallSite and squashes the following commits: 62aa124 [Liang-Chi Hsieh] Fix style. e741017 [Liang-Chi Hsieh] Refactor getCallSite.	2014-12-18 21:41:02 -08:00
Andrew Or	9804a759b6	[SPARK-4754] Refactor SparkContext into ExecutorAllocationClient This is so that the `ExecutorAllocationManager` does not take in the `SparkContext` with all of its dependencies as an argument. This prevents future developers from tying this class down further to the `SparkContext`, which has really become quite a monstrous object. cc'ing pwendell who originally suggested this, and JoshRosen who may have thoughts about the trait mix-in style of `SparkContext`. Author: Andrew Or <andrew@databricks.com> Closes #3614 from andrewor14/dynamic-allocation-sc and squashes the following commits: 187070d [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 59baf6c [Andrew Or] Merge branch 'master' of github.com:apache/spark into dynamic-allocation-sc 347a348 [Andrew Or] Refactor SparkContext into ExecutorAllocationClient	2014-12-18 17:38:33 -08:00
Aaron Davidson	105293a7d0	[SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config This is used in NioBlockTransferService here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/network/nio/NioBlockTransferService.scala#L66 Author: Aaron Davidson <aaron@databricks.com> Closes #3688 from aarondav/SPARK-4837 and squashes the following commits: ebd2007 [Aaron Davidson] [SPARK-4837] NettyBlockTransferService should use spark.blockManager.port config	2014-12-18 16:43:16 -08:00
Ivan Vergiliev	f9f58b9a01	SPARK-4743 - Use SparkEnv.serializer instead of closureSerializer in aggregateByKey and foldByKey Author: Ivan Vergiliev <ivan@leanplum.com> Closes #3605 from IvanVergiliev/change-serializer and squashes the following commits: a49b7cf [Ivan Vergiliev] Use serializer instead of closureSerializer in aggregate/foldByKey.	2014-12-18 16:29:36 -08:00
Madhu Siddalingaiah	d5a596d418	[SPARK-4884]: Improve Partition docs Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884 Author: Madhu Siddalingaiah <madhu@madhu.com> Closes #3722 from msiddalingaiah/master and squashes the following commits: 79e679f [Madhu Siddalingaiah] [DOC]: improve documentation 51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again) 332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code> cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master' 0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions	2014-12-18 16:00:53 -08:00
Ilya Ganelin	3720057b8e	[SPARK-3607] ConnectionManager threads.max configs on the thread pools don't work Hi all - cleaned up the code to get rid of the unused parameter and added some discussion of the ThreadPoolExecutor parameters to explain why we can use a single threadCount instead of providing a min/max. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3664 from ilganeli/SPARK-3607C and squashes the following commits: 3c05690 [Ilya Ganelin] Updated documentation and refactored code to extract shared variables	2014-12-18 12:53:18 -08:00
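The reasoning in SPARK-3607 follows from how `java.util.concurrent.ThreadPoolExecutor` behaves with an unbounded work queue. Extra tasks wait in the queue once `corePoolSize` threads are busy, so the pool never grows toward `maximumPoolSize`. This sketch passes one `threadCount` for both. The `newPool` helper and the `allowCoreThreadTimeOut(true)` call are choices made for this illustration and are not necessarily what ConnectionManager does.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object FixedPoolSketch {
  // With an unbounded LinkedBlockingQueue, at most corePoolSize threads are
  // ever created, so a separate "max" setting would have no effect.
  def newPool(threadCount: Int, keepAliveSeconds: Long = 60L): ThreadPoolExecutor = {
    val pool = new ThreadPoolExecutor(
      threadCount,                    // corePoolSize
      threadCount,                    // maximumPoolSize (unused beyond core here)
      keepAliveSeconds, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable]())
    // Illustrative choice: let idle core threads exit after keepAliveSeconds.
    pool.allowCoreThreadTimeOut(true)
    pool
  }
}
```

Submitting many tasks to `newPool(2)` keeps `getPoolSize` at two or fewer threads while the remaining tasks queue up.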
Saisai Shao	cf50631a66	[SPARK-4595][Core] Fix MetricsServlet not work issue The `MetricsServlet` handler should be added to the web UI after it is initialized by `MetricsSystem`; otherwise the servlet handler cannot be attached. Author: Saisai Shao <saisai.shao@intel.com> Author: Josh Rosen <joshrosen@databricks.com> Author: jerryshao <saisai.shao@intel.com> Closes #3444 from jerryshao/SPARK-4595 and squashes the following commits: 434d17e [Saisai Shao] Merge pull request #10 from JoshRosen/metrics-system-cleanup 87a2292 [Josh Rosen] Guard against misuse of MetricsSystem methods. f779fe0 [jerryshao] Fix MetricsServlet not work issue	2014-12-17 11:47:44 -08:00
Davies Liu	ed362008f0	[SPARK-4437] update doc for WholeCombineFileRecordReader update doc for WholeCombineFileRecordReader Author: Davies Liu <davies@databricks.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #3301 from davies/fix_doc and squashes the following commits: 1d7422f [Davies Liu] Merge pull request #2 from JoshRosen/whole-text-file-cleanup dc3d21a [Josh Rosen] More genericization in ConfigurableCombineFileRecordReader. 95d13eb [Davies Liu] address comment bf800b9 [Davies Liu] update doc for WholeCombineFileRecordReader	2014-12-16 11:19:36 -08:00
meiyoula	c7628771da	[SPARK-4792] Add error message when making local dir unsuccessfully Author: meiyoula <1039320815@qq.com> Closes #3635 from XuTingjun/master and squashes the following commits: dd1c66d [meiyoula] when old is deleted, it will throw an exception where call it 2a55bc2 [meiyoula] Update DiskBlockManager.scala 1483a4a [meiyoula] Delete multiple retries to make dir 67f7902 [meiyoula] Try some times to make dir maybe more reasonable 1c51a0c [meiyoula] Update DiskBlockManager.scala	2014-12-15 22:30:18 -08:00
wangfei	5c24759ddc	[Minor][Core] fix comments in MapOutputTracker Using "driver" and "executor" in the comments of `MapOutputTracker` is clearer. Author: wangfei <wangfei1@huawei.com> Closes #3700 from scwf/commentFix and squashes the following commits: aa68524 [wangfei] master and worker should be driver and executor	2014-12-15 16:46:43 -08:00
Sean Owen	2a28bc6100	SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it. Author: Sean Owen <sowen@cloudera.com> Closes #3690 from srowen/SPARK-785 and squashes the following commits: 8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey	2014-12-15 16:06:15 -08:00
Ryan Williams	8176b7a02e	[SPARK-4668] Fix some documentation typos. Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes #3523 from ryan-williams/tweaks and squashes the following commits: d2eddaa [Ryan Williams] code review feedback ce27fc1 [Ryan Williams] CoGroupedRDD comment nit c6cfad9 [Ryan Williams] remove unnecessary if statement b74ea35 [Ryan Williams] comment fix b0221f0 [Ryan Williams] fix a gendered pronoun c71ffed [Ryan Williams] use names on a few boolean parameters 89954aa [Ryan Williams] clarify some comments in {Security,Shuffle}Manager e465dac [Ryan Williams] Saved building-spark.md with Dillinger.io 83e8358 [Ryan Williams] fix pom.xml typo dc4662b [Ryan Williams] typo fixes in tuning.md, configuration.md	2014-12-15 14:52:17 -08:00
Ilya Ganelin	38703bbca8	[SPARK-1037] The name of findTaskFromList & findTask in TaskSetManager.scala is confusing Hi all - I've renamed the methods referenced in this JIRA to clarify that they modify the provided arrays (find vs. dequeue). Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3665 from ilganeli/SPARK-1037B and squashes the following commits: 64c177c [Ilya Ganelin] Renamed deque to dequeue f27d85e [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters 683482a [Ilya Ganelin] Renamed private methods to clarify that they modify the provided parameters	2014-12-15 14:51:15 -08:00
Zhang, Liye	57d37f9c71	[CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala with other places Author: Zhang, Liye <liye.zhang@intel.com> Closes #2793 from liyezhang556520/uniformHashMap and squashes the following commits: 5884735 [Zhang, Liye] [CORE]codeStyle: uniform ConcurrentHashMap define in StorageLevel.scala	2014-12-10 20:44:59 -08:00
Andrew Or	4f93d0cabe	[SPARK-4759] Fix driver hanging from coalescing partitions The driver sometimes hangs when we coalesce RDD partitions. See JIRA for more details and reproduction. This is because our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics here, however, are that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly. Author: Andrew Or <andrew@databricks.com> Closes #3633 from andrewor14/coalesce-preferred-loc and squashes the following commits: e520d6b [Andrew Or] Oops 3ebf8bd [Andrew Or] A few comments f370a4e [Andrew Or] Fix tests 2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location	2014-12-10 14:27:53 -08:00
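The fix in SPARK-4759 replaces an empty-string sentinel with a real "no preference" value. This is a minimal sketch of that idea. `PartitionInfo` and `localityHosts` are hypothetical names for illustration, and Spark's `CoalescedRDDPartition` is not reproduced here. `Option[String]` makes "no preferred location" explicit, so it cannot be mistaken for a host literally named `""`.

```scala
object PreferredLocationSketch {
  // Hypothetical partition record; None means "no preferred location".
  final case class PartitionInfo(index: Int, preferredLocation: Option[String])

  // Hosts a scheduler should try for locality. An empty result means the task
  // may run anywhere, instead of waiting for a host named "" that never appears.
  def localityHosts(p: PartitionInfo): Seq[String] =
    p.preferredLocation.filter(_.nonEmpty).toList
}
```

The `filter(_.nonEmpty)` also guards against any leftover `Some("")` values, treating them the same as `None`.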
Ilya Ganelin	447ae2de5d	[SPARK-4569] Rename 'externalSorting' in Aggregator Hi all - I've renamed the unhelpfully named variable and added a comment clarifying what's actually happening. Author: Ilya Ganelin <ilya.ganelin@capitalone.com> Closes #3666 from ilganeli/SPARK-4569B and squashes the following commits: 1810394 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator e2d2092 [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator d7cefec [Ilya Ganelin] [SPARK-4569] Rename 'externalSorting' in Aggregator 5b3f39c [Ilya Ganelin] [SPARK-4569] Rename in Aggregator	2014-12-10 14:19:37 -08:00
Andrew Or	faa8fd8178	[SPARK-4215] Allow requesting / killing executors only in YARN mode Currently this doesn't do anything in other modes, so we might as well just disable it rather than having the user mistakenly rely on it. Author: Andrew Or <andrew@databricks.com> Closes #3615 from andrewor14/dynamic-allocation-yarn-only and squashes the following commits: ce6487a [Andrew Or] Allow requesting / killing executors only in YARN mode	2014-12-10 12:48:24 -08:00
Kousuke Saruta	0fc637b4c2	[SPARK-4329][WebUI] HistoryPage pagenation The current HistoryPage only has links to the previous and next pages. I suggest adding an index so that history pages can be accessed easily. I implemented it as shown in the following screenshots. If there are many pages, the current page +/- N pages, the first page, and the last page are indexed. ![2014-11-10 16 13 25](https://cloud.githubusercontent.com/assets/4736016/4986246/9c7bbac4-6937-11e4-8695-8634d039d5b6.png) ![2014-11-10 16 03 21](https://cloud.githubusercontent.com/assets/4736016/4986210/3951bb74-6937-11e4-8b4e-9f90d266d736.png) ![2014-11-10 16 03 39](https://cloud.githubusercontent.com/assets/4736016/4986211/3b196ad8-6937-11e4-9f81-74bc0a6dad5b.png) ![2014-11-10 16 03 49](https://cloud.githubusercontent.com/assets/4736016/4986213/40686138-6937-11e4-86c0-41100f0404f6.png) ![2014-11-10 16 04 04](https://cloud.githubusercontent.com/assets/4736016/4986215/4326c9b4-6937-11e4-87ac-0f30c86ec6e3.png) Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #3194 from sarutak/history-page-indexing and squashes the following commits: 15d3d2d [Kousuke Saruta] Simplified code c93932e [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing 1c2f605 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing 76b05e3 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing b2240f8 [Kousuke Saruta] Fixed style ec7922e [Kousuke Saruta] Simplified code 755a004 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into history-page-indexing cfa242b [Kousuke Saruta] Added index to HistoryPage	2014-12-10 12:30:45 -08:00
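The indexing rule in SPARK-4329 (first page, last page, and the current page +/- N) can be sketched as a pure function. `pageIndices` and its `window` parameter are hypothetical names. The sketch does not reproduce the HistoryPage rendering code, only which 1-based page numbers get links.

```scala
object PageIndexSketch {
  // Pages to link from the current page: the first page, the last page, and
  // every page within `window` of the current one (1-based, sorted, distinct).
  def pageIndices(current: Int, totalPages: Int, window: Int): Seq[Int] = {
    require(totalPages >= 1 && current >= 1 && current <= totalPages)
    val lo = math.max(1, current - window)
    val hi = math.min(totalPages, current + window)
    (Seq(1) ++ (lo to hi) ++ Seq(totalPages)).distinct.sorted
  }
}
```

For example, on page 5 of 10 with a window of 2, the links are pages 1, 3, 4, 5, 6, 7 and 10. A renderer could insert an ellipsis wherever consecutive entries differ by more than one.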
Nathan Kronenfeld	94b377f944	[SPARK-4772] Clear local copies of accumulators as soon as we're done with them Accumulators keep thread-local copies of themselves. These copies were only cleared at the beginning of a task. This meant that (a) the memory they used stayed tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up on that worker forever. This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, so that they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if the thread dies for any reason, the memory they are using at that time can be freed. Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com> Closes #3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits: a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevant, as it will only surface if a master is talking to a worker running a different version of spark. 537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task. 39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them	2014-12-09 23:53:17 -08:00
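SPARK-4772 describes a pattern in which thread-local accumulator copies live in a `ThreadLocal` and are cleared both at the start of a task and in its `finally` block. Below is a minimal sketch of that pattern. It only loosely mirrors Spark's `Accumulators` object, and `LocalAccums`, `add`, `values`, and `runTask` are hypothetical names. Accumulator values are simplified to `Long` sums keyed by a `Long` id.

```scala
import scala.collection.mutable

object LocalAccums {
  // One map per thread. The value lives in the thread's own ThreadLocal storage,
  // so it becomes eligible for garbage collection once the thread is gone.
  private val localAccums = new ThreadLocal[mutable.Map[Long, Long]] {
    override def initialValue(): mutable.Map[Long, Long] = mutable.Map.empty[Long, Long]
  }

  /** Add to this thread's copy of accumulator `id`. */
  def add(id: Long, delta: Long): Unit = {
    val m = localAccums.get()
    m(id) = m.getOrElse(id, 0L) + delta
  }

  /** Snapshot of this thread's copies. */
  def values: Map[Long, Long] = localAccums.get().toMap

  def clear(): Unit = localAccums.get().clear()

  /** Run a task body, returning its result plus the accumulated values.
    * The copies are cleared in `finally`, so they are released even if the body throws. */
  def runTask[T](body: => T): (T, Map[Long, Long]) = {
    clear() // clearing at task start, as before the change
    try {
      val result = body
      (result, values)
    } finally {
      clear() // clearing at task end, as this change describes
    }
  }
}
```

Clearing in `finally` means no task leaves stale copies on a pooled executor thread, whether it succeeds or fails.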
Josh Rosen	f79c1cfc99	[Minor] Use <sup> tag for help icon in web UI page header This small commit makes the `(?)` web UI help link into a superscript, which should address feedback that the current design makes it look like an error occurred or like information is missing. Before: ![image](https://cloud.githubusercontent.com/assets/50748/5370611/a3ed0034-7fd9-11e4-870f-05bd9faad5b9.png) After: ![image](https://cloud.githubusercontent.com/assets/50748/5370602/6c5ca8d6-7fd9-11e4-8d1a-568d71290aa7.png) Author: Josh Rosen <joshrosen@databricks.com> Closes #3659 from JoshRosen/webui-help-sup and squashes the following commits: bd72899 [Josh Rosen] Use <sup> tag for help icon in web UI page header.	2014-12-09 23:47:05 -08:00
Sandy Ryza	5e4c06f8e5	SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable Author: Sandy Ryza <sandy@cloudera.com> Closes #3426 from sryza/sandy-spark-4567 and squashes the following commits: cb4b8d2 [Sandy Ryza] SPARK-4567. Make SparkJobInfo and SparkStageInfo serializable	2014-12-09 16:26:07 -08:00
hushan[胡珊]	30dca924df	[SPARK-4714] BlockManager.dropFromMemory() should check whether block has been removed after synchronizing on BlockInfo instance. After a thread synchronizes on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represents may already have been removed. All three methods acquire the `info` lock the same way: `info = blockInfo.get(id); if (info != null) { info.synchronized { /* do something */ } }`. So there is a chance that, when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by another thread that entered `info.synchronized` first. The `removeBlock` and `dropOldBlocks` methods are idempotent, so it is safe for them to run on blocks that have already been removed. In `dropFromMemory`, however, this can be a problem: it may drop data for an already-removed block into the disk store, which calls data store operations that are not designed to handle missing blocks. This patch fixes the issue by adding a check to `dropFromMemory` that tests whether the block has been removed by a racing thread. Author: hushan[胡珊] <hushan@xiaomi.com> Closes #3574 from suyanNone/refine-block-concurrency and squashes the following commits: edb989d [hushan[胡珊]] Refine code style and comments position 55fa4ba [hushan[胡珊]] refine code e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn	2014-12-09 15:54:40 -08:00
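SPARK-4714 describes a check-after-lock pattern: look up `info`, synchronize on it, and then confirm that the map still holds that same entry before doing any work. Below is a minimal sketch of that pattern. It is not BlockManager's actual code. `BlockInfo` is a hypothetical stand-in class, `BlockStoreSketch` is a hypothetical holder for the `blockInfo` map, and "dropping" is simplified to removing the map entry.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for Spark's per-block metadata object
final class BlockInfo

object BlockStoreSketch {
  val blockInfo = new ConcurrentHashMap[String, BlockInfo]()

  /** Returns true if this call dropped the block, false if it was already gone. */
  def dropFromMemory(id: String): Boolean = {
    val info = blockInfo.get(id)
    if (info == null) return false
    info.synchronized {
      // A racing thread may have removed the block while we waited for the lock,
      // so re-check that the map still points at this exact BlockInfo instance.
      if (blockInfo.get(id) ne info) {
        false
      } else {
        // (Writing the data to a disk store and freeing memory would happen here.)
        blockInfo.remove(id)
        true
      }
    }
  }
}
```

Without the re-check, every thread that obtained `info` before the removal would go on to operate on a block that no longer exists. With it, only the first thread through the lock does the work.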
Kay Ousterhout	1f5110630c	[SPARK-4765] Make GC time always shown in UI. This commit removes the GC time for each task from the set of optional, additional metrics, and instead always shows it for each task. cc pwendell Author: Kay Ousterhout <kayousterhout@gmail.com> Closes #3622 from kayousterhout/gc_time and squashes the following commits: 15ac242 [Kay Ousterhout] Make TaskDetailsClassNames private[spark] e71d893 [Kay Ousterhout] [SPARK-4765] Make GC time always shown in UI.	2014-12-09 15:10:36 -08:00

3690 commits