ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Sean Owen	50ada2a4d3	[SPARK-22033][CORE] BufferHolder, other size checks should account for the specific VM array size limitations ## What changes were proposed in this pull request? Try to avoid allocating an array bigger than Integer.MAX_VALUE - 8, which is the actual max size on some JVMs, in several places ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19266 from srowen/SPARK-22033.	2017-09-23 15:40:59 +01:00
Bryan Cutler	27fc536d9a	[SPARK-21190][PYSPARK] Python Vectorized UDFs This PR adds vectorized UDFs to the Python API Proposed API Introduce a flag to turn on vectorization for a defined UDF, for example: ``` pandas_udf(DoubleType()) def plus(a, b) return a + b ``` or ``` plus = pandas_udf(lambda a, b: a + b, DoubleType()) ``` Usage is the same as normal UDFs 0-parameter UDFs pandas_udf functions can declare an optional `kwargs` and when evaluated, will contain a key "size" that will give the required length of the output. For example: ``` pandas_udf(LongType()) def f0(kwargs): return pd.Series(1).repeat(kwargs["size"]) df.select(f0()) ``` Added new unit tests in pyspark.sql that are enabled if pyarrow and Pandas are available. - [x] Fix support for promoted types with null values - [ ] Discuss 0-param UDF API (use of kwargs) - [x] Add tests for chained UDFs - [ ] Discuss behavior when pyarrow not installed / enabled - [ ] Cleanup pydoc and add user docs Author: Bryan Cutler <cutlerb@gmail.com> Author: Takuya UESHIN <ueshin@databricks.com> Closes #18659 from BryanCutler/arrow-vectorized-udfs-SPARK-21404.	2017-09-22 16:17:50 +08:00
Imran Rashid	b75bd17774	[SPARK-21928][CORE] Set classloader on SerializerManager's private kryo ## What changes were proposed in this pull request? We have to make sure that SerializerManager's private instance of kryo also uses the right classloader, regardless of the current thread classloader. In particular, this fixes serde during remote cache fetches, as those occur in netty threads. ## How was this patch tested? Manual tests & existing suite via jenkins. I haven't been able to reproduce this is in a unit test, because when a remote RDD partition can be fetched, there is a warning message and then the partition is just recomputed locally. I manually verified the warning message is no longer present. Author: Imran Rashid <irashid@cloudera.com> Closes #19280 from squito/SPARK-21928_ser_classloader.	2017-09-21 10:20:19 -07:00
jerryshao	1da5822e6a	[SPARK-21934][CORE] Expose Shuffle Netty memory usage to MetricsSystem ## What changes were proposed in this pull request? This is a followup work of SPARK-9104 to expose the Netty memory usage to MetricsSystem. Current the shuffle Netty memory usage of `NettyBlockTransferService` will be exposed, if using external shuffle, then the Netty memory usage of `ExternalShuffleClient` and `ExternalShuffleService` will be exposed instead. Currently I don't expose Netty memory usage of `YarnShuffleService`, because `YarnShuffleService` doesn't have `MetricsSystem` itself, and is better to connect to Hadoop's MetricsSystem. ## How was this patch tested? Manually verified in local cluster. Author: jerryshao <sshao@hortonworks.com> Closes #19160 from jerryshao/SPARK-21934.	2017-09-21 13:54:30 +08:00
Sean Owen	3d4dd14cd5	[SPARK-22066][BUILD] Update checkstyle to 8.2, enable it, fix violations ## What changes were proposed in this pull request? Update plugins, including scala-maven-plugin, to latest versions. Update checkstyle to 8.2. Remove bogus checkstyle config and enable it. Fix existing and new Java checkstyle errors. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19282 from srowen/SPARK-22066.	2017-09-20 10:01:46 +01:00
Marcelo Vanzin	c6ff59a230	[SPARK-18838][CORE] Add separate listener queues to LiveListenerBus. This change modifies the live listener bus so that all listeners are added to queues; each queue has its own thread to dispatch events, making it possible to separate slow listeners from other more performance-sensitive ones. The public API has not changed - all listeners added with the existing "addListener" method, which after this change mostly means all user-defined listeners, end up in a default queue. Internally, there's an API allowing listeners to be added to specific queues, and that API is used to separate the internal Spark listeners into 3 categories: application status listeners (e.g. UI), executor management (e.g. dynamic allocation), and the event log. The queueing logic, while abstracted away in a separate class, is kept as much as possible hidden away from consumers. Aside from choosing their queue, there's no code change needed to take advantage of queues. Test coverage relies on existing tests; a few tests had to be tweaked because they relied on `LiveListenerBus.postToAll` being synchronous, and the change makes that method asynchronous. Other tests were simplified not to use the asynchronous LiveListenerBus. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #19211 from vanzin/SPARK-18838.	2017-09-20 13:41:29 +08:00
jerryshao	8319432af6	[SPARK-21917][CORE][YARN] Supporting adding http(s) resources in yarn mode ## What changes were proposed in this pull request? In the current Spark, when submitting application on YARN with remote resources `./bin/spark-shell --jars http://central.maven.org/maven2/com/github/swagger-akka-http/swagger-akka-http_2.11/0.10.1/swagger-akka-http_2.11-0.10.1.jar --master yarn-client -v`, Spark will be failed with: ``` java.io.IOException: No FileSystem for scheme: http at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2586) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2593) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2632) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2614) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:354) at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:478) at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:600) at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11$$anonfun$apply$6.apply(Client.scala:599) at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74) at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:599) at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$11.apply(Client.scala:598) at scala.collection.immutable.List.foreach(List.scala:381) at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:598) at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:848) at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:173) ``` This is because `YARN#client` assumes resources are on the Hadoop compatible FS. To fix this problem, here propose to download remote http(s) resources to local and add this local downloaded resources to dist cache. This solution has one downside: remote resources are downloaded and uploaded again, but it only restricted to only remote http(s) resources, also the overhead is not so big. The advantages of this solution is that it is simple and the code changes restricts to only `SparkSubmit`. ## How was this patch tested? Unit test added, also verified in local cluster. Author: jerryshao <sshao@hortonworks.com> Closes #19130 from jerryshao/SPARK-21917.	2017-09-19 22:20:05 +08:00
Armin	7c92351f43	[MINOR][CORE] Cleanup dead code and duplication in Mem. Management ## What changes were proposed in this pull request? * Removed the method `org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter#alignToWords`. It became unused as a result of `85b0a15754` (SPARK-15962) introducing word alignment for unsafe arrays. * Cleaned up duplicate code in memory management and unsafe sorters * The change extracting the exception paths is more than just cosmetics since it def. reduces the size the affected methods compile to ## How was this patch tested? * Build still passes after removing the method, grepping the codebase for `alignToWords` shows no reference to it anywhere either. * Dried up code is covered by existing tests. Author: Armin <me@obrown.io> Closes #19254 from original-brownbear/cleanup-mem-consumer.	2017-09-19 10:06:32 +01:00
Xianyang Liu	a11db942aa	[SPARK-21923][CORE] Avoid calling reserveUnrollMemoryForThisTask for every record ## What changes were proposed in this pull request? When Spark persist data to Unsafe memory, we call the method `MemoryStore.putIteratorAsBytes`, which need synchronize the `memoryManager` for every record write. This implementation is not necessary, we can apply for more memory at a time to reduce unnecessary synchronization. ## How was this patch tested? Test case (with 1 executor 20 core): ```scala val start = System.currentTimeMillis() val data = sc.parallelize(0 until Integer.MAX_VALUE, 100) .persist(StorageLevel.OFF_HEAP) .count() println(System.currentTimeMillis() - start) ``` Test result: before \| 27647 \| 29108 \| 28591 \| 28264 \| 27232 \| after \| 26868 \| 26358 \| 27767 \| 26653 \| 26693 \| Author: Xianyang Liu <xianyang.liu@intel.com> Closes #19135 from ConeyLiu/memorystore.	2017-09-19 14:51:27 +08:00
alexmnyc	94f7e046a2	[SPARK-22030][CORE] GraphiteSink fails to re-connect to Graphite instances behind an ELB or any other auto-scaled LB ## What changes were proposed in this pull request? Upgrade codahale metrics library so that Graphite constructor can re-resolve hosts behind a CNAME with re-tried DNS lookups. When Graphite is deployed behind an ELB, ELB may change IP addresses based on auto-scaling needs. Using current approach yields Graphite usage impossible, fixing for that use case - Upgrade to codahale 3.1.5 - Use new Graphite(host, port) constructor instead of new Graphite(new InetSocketAddress(host, port)) constructor ## How was this patch tested? The same logic is used for another project that is using the same configuration and code path, and graphite re-connect's behind ELB's are no longer an issue This are proposed changes for codahale lib - https://github.com/dropwizard/metrics/compare/v3.1.2...v3.1.5#diff-6916c85d2dd08d89fe771c952e3b8512R120. Specifically, `b4d246d34e/metrics-graphite/src/main/java/com/codahale/metrics/graphite/Graphite.java (L120)` Please review http://spark.apache.org/contributing.html before opening a pull request. Author: alexmnyc <project@alexandermarkham.com> Closes #19210 from alexmnyc/patch-1.	2017-09-19 10:05:59 +08:00
Sital Kedia	1e978b17d6	[SPARK-21113][CORE] Read ahead input stream to amortize disk IO cost … Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. In order to amortize the disk IO cost, the idea is to implement a read ahead input stream which asynchronously reads ahead from the underlying input stream when specified amount of data has been read from the current buffer. It does it by maintaining two buffer - active buffer and read ahead buffer. The active buffer contains data which should be returned when a read() call is issued. The read-ahead buffer is used to asynchronously read from the underlying input stream and once the active buffer is exhausted, we flip the two buffers so that we can start reading from the read ahead buffer without being blocked in disk I/O. ## How was this patch tested? Tested by running a job on the cluster and could see up to 8% CPU improvement. Author: Sital Kedia <skedia@fb.com> Author: Shixiong Zhu <zsxwing@gmail.com> Author: Sital Kedia <sitalkedia@users.noreply.github.com> Closes #18317 from sitalkedia/read_ahead_buffer.	2017-09-17 23:15:08 -07:00
Andrew Ash	6308c65f08	[SPARK-21953] Show both memory and disk bytes spilled if either is present As written now, there must be both memory and disk bytes spilled to show either of them. If there is only one of those types of spill recorded, it will be hidden. Author: Andrew Ash <andrew@andrewash.com> Closes #19164 from ash211/patch-3.	2017-09-18 10:42:24 +08:00
zhoukang	22b111ef9d	[SPARK-21902][CORE] Print root cause for BlockManager#doPut ## What changes were proposed in this pull request? As logging below, actually exception will be hidden when removeBlockInternal throw an exception. `2017-08-31,10:26:57,733 WARN org.apache.spark.storage.BlockManager: Putting block broadcast_110 failed due to an exception 2017-08-31,10:26:57,734 WARN org.apache.spark.broadcast.BroadcastManager: Failed to create a new broadcast in 1 attempts java.io.IOException: Failed to create local dir in /tmp/blockmgr-5bb5ac1e-c494-434a-ab89-bd1808c6b9ed/2e. at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70) at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:115) at org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1339) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:910) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:726) at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1233) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:122) at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) at org.apache.spark.broadcast.BroadcastManager$$anonfun$newBroadcast$1.apply$mcVI$sp(BroadcastManager.scala:60) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:58) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1415) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1002) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:924) at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:771) at org.apache.spark.scheduler.DAGScheduler$$anonfun$submitWaitingChildStages$6.apply(DAGScheduler.scala:770) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at org.apache.spark.scheduler.DAGScheduler.submitWaitingChildStages(DAGScheduler.scala:770) at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1235) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1662) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1620) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1609) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)` In this pr i will print exception first make troubleshooting more conveniently. PS: This one split from [PR-19133](https://github.com/apache/spark/pull/19133) ## How was this patch tested? Exsist unit test Author: zhoukang <zhoukang199191@gmail.com> Closes #19171 from caneGuy/zhoukang/print-rootcause.	2017-09-15 14:03:26 +08:00
zhoukang	4b88393cb9	[SPARK-21922] Fix duration always updating when task failed but status is still RUN… …NING ## What changes were proposed in this pull request? When driver quit abnormally which cause executor shutdown and task metrics can not be sent to driver for updating.In this case the status will always be 'RUNNING' and the duration on history UI will be 'CurrentTime - launchTime' which increase infinitely. We can fix this time by modify time of event log since this time has gotten when `FSHistoryProvider` fetch event log from File System. And the result picture is uploaded in [SPARK-21922](https://issues.apache.org/jira/browse/SPARK-21922). How to reproduce? (1) Submit a job to spark on yarn (2) Mock an oom(or other case can make driver quit abnormally) senario for driver (3) Make sure executor is running task when driver quitting (4) Open the history server and checkout result It is not a corner case since there are many such jobs in our current cluster. ## How was this patch tested? Deploy historyserver and open a job has this problem. Author: zhoukang <zhoukang199191@gmail.com> Closes #19132 from caneGuy/zhoukang/fix-duration.	2017-09-14 20:40:33 +08:00
Zheng RuiFeng	66cb72d7b9	[MINOR][DOC] Add missing call of `update()` in examples of PeriodicGraphCheckpointer & PeriodicRDDCheckpointer ## What changes were proposed in this pull request? forgot to call `update()` with `graph1` & `rdd1` in examples for `PeriodicGraphCheckpointer` & `PeriodicRDDCheckpoin` ## How was this patch tested? existing tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #19198 from zhengruifeng/fix_doc_checkpointer.	2017-09-14 14:04:43 +08:00
Armin	b6ef1f57bc	[SPARK-21970][CORE] Fix Redundant Throws Declarations in Java Codebase ## What changes were proposed in this pull request? 1. Removing all redundant throws declarations from Java codebase. 2. Removing dead code made visible by this from `ShuffleExternalSorter#closeAndGetSpills` ## How was this patch tested? Build still passes. Author: Armin <me@obrown.io> Closes #19182 from original-brownbear/SPARK-21970.	2017-09-13 14:04:26 +01:00
caoxuewen	ca00cc70d6	[SPARK-21963][CORE][TEST] Create temp file should be delete after use ## What changes were proposed in this pull request? After you create a temporary table, you need to delete it, otherwise it will leave a file similar to the file name ‘SPARK194465907929586320484966temp’. ## How was this patch tested? N / A Author: caoxuewen <cao.xuewen@zte.com.cn> Closes #19174 from heary-cao/DeleteTempFile.	2017-09-13 13:01:30 +01:00
German Schiavon	a1d98c6dcd	[SPARK-21982] Set locale to US ## What changes were proposed in this pull request? In UtilsSuite Locale was set by default to US, but at the moment of using format function it wasn't, taking by default JVM locale which could be different than US making this test fail. ## How was this patch tested? Unit test (UtilsSuite) Author: German Schiavon <germanschiavon@gmail.com> Closes #19205 from Gschiavon/fix/test-locale.	2017-09-13 09:52:45 +01:00
caoxuewen	dc74c0e67d	[MINOR][SQL] remove unuse import class ## What changes were proposed in this pull request? this PR describe remove the import class that are unused. ## How was this patch tested? N/A Author: caoxuewen <cao.xuewen@zte.com.cn> Closes #19131 from heary-cao/unuse_import.	2017-09-11 10:09:20 +01:00
Dongjoon Hyun	c26976fe14	[SPARK-21939][TEST] Use TimeLimits instead of Timeouts Since ScalaTest 3.0.0, `org.scalatest.concurrent.Timeouts` is deprecated. This PR replaces the deprecated one with `org.scalatest.concurrent.TimeLimits`. ```scala -import org.scalatest.concurrent.Timeouts._ +import org.scalatest.concurrent.TimeLimits._ ``` Pass the existing test suites. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #19150 from dongjoon-hyun/SPARK-21939. Change-Id: I1a1b07f1b97e51e2263dfb34b7eaaa099b2ded5e	2017-09-08 09:31:13 +08:00
Sanket Chintapalli	b9ab791a9e	[SPARK-21890] Credentials not being passed to add the tokens I observed this while running a oozie job trying to connect to hbase via spark. It look like the creds are not being passed in thehttps://github.com/apache/spark/blob/branch-2.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HadoopFSCredentialProvider.scala#L53 for 2.2 release. More Info as to why it fails on secure grid: Oozie client gets the necessary tokens the application needs before launching. It passes those tokens along to the oozie launcher job (MR job) which will then actually call the Spark client to launch the spark app and pass the tokens along. The oozie launcher job cannot get anymore tokens because all it has is tokens ( you can't get tokens with tokens, you need tgt or keytab). The error here is because the launcher job runs the Spark Client to submit the spark job but the spark client doesn't see that it already has the hdfs tokens so it tries to get more, which ends with the exception. There was a change with SPARK-19021 to generalize the hdfs credentials provider that changed it so we don't pass the existing credentials into the call to get tokens so it doesn't realize it already has the necessary tokens. https://issues.apache.org/jira/browse/SPARK-21890 Modified to pass creds to get delegation tokens Author: Sanket Chintapalli <schintap@yahoo-inc.com> Closes #19140 from redsanket/SPARK-21890-master.	2017-09-07 11:25:24 -05:00
Sean Owen	ca59445adb	[SPARK-21418][SQL] NoSuchElementException: None.get in DataSourceScanExec with sun.io.serialization.extendedDebugInfo=true ## What changes were proposed in this pull request? If no SparkConf is available to Utils.redact, simply don't redact. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19123 from srowen/SPARK-21418.	2017-09-04 23:02:59 +02:00
Sean Owen	12ab7f7e89	[SPARK-14280][BUILD][WIP] Update change-version.sh and pom.xml to add Scala 2.12 profiles and enable 2.12 compilation …build; fix some things that will be warnings or errors in 2.12; restore Scala 2.12 profile infrastructure ## What changes were proposed in this pull request? This change adds back the infrastructure for a Scala 2.12 build, but does not enable it in the release or Python test scripts. In order to make that meaningful, it also resolves compile errors that the code hits in 2.12 only, in a way that still works with 2.11. It also updates dependencies to the earliest minor release of dependencies whose current version does not yet support Scala 2.12. This is in a sense covered by other JIRAs under the main umbrella, but implemented here. The versions below still work with 2.11, and are the _latest_ maintenance release in the _earliest_ viable minor release. - Scalatest 2.x -> 3.0.3 - Chill 0.8.0 -> 0.8.4 - Clapper 1.0.x -> 1.1.2 - json4s 3.2.x -> 3.4.2 - Jackson 2.6.x -> 2.7.9 (required by json4s) This change does _not_ fully enable a Scala 2.12 build: - It will also require dropping support for Kafka before 0.10. Easy enough, just didn't do it yet here - It will require recreating `SparkILoop` and `Main` for REPL 2.12, which is SPARK-14650. Possible to do here too. What it does do is make changes that resolve much of the remaining gap without affecting the current 2.11 build. ## How was this patch tested? Existing tests and build. Manually tested with `./dev/change-scala-version.sh 2.12` to verify it compiles, modulo the exceptions above. Author: Sean Owen <sowen@cloudera.com> Closes #18645 from srowen/SPARK-14280.	2017-09-01 19:21:21 +01:00
Marcelo Vanzin	0bdbefe9dd	[SPARK-21728][CORE] Follow up: fix user config, auth in SparkSubmit logging. - SecurityManager complains when auth is enabled but no secret is defined; SparkSubmit doesn't use the auth functionality of the SecurityManager, so use a dummy secret to work around the exception. - Only reset the log4j configuration when Spark was the one initializing it, otherwise user-defined log configuration may be lost. Tested with the log config file posted to the bug, on a secured YARN cluster. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #19089 from vanzin/SPARK-21728.	2017-09-01 10:29:36 -07:00
Liang-Chi Hsieh	ecf437a648	[SPARK-21534][SQL][PYSPARK] PickleException when creating dataframe from python row with empty bytearray ## What changes were proposed in this pull request? `PickleException` is thrown when creating dataframe from python row with empty bytearray spark.createDataFrame(spark.sql("select unhex('') as xx").rdd.map(lambda x: {"abc": x.xx})).show() net.razorvine.pickle.PickleException: invalid pickle data for bytearray; expected 1 or 2 args, got 0 at net.razorvine.pickle.objects.ByteArrayConstructor.construct(ByteArrayConstructor.java ... `ByteArrayConstructor` doesn't deal with empty byte array pickled by Python3. ## How was this patch tested? Added test. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #19085 from viirya/SPARK-21534.	2017-08-31 12:55:38 +09:00
Xiaofeng Lin	cd5d0f3379	[SPARK-11574][CORE] Add metrics StatsD sink This patch adds statsd sink to the current metrics system in spark core. Author: Xiaofeng Lin <xlin@twilio.com> Closes #9518 from xflin/statsd. Change-Id: Ib8720e86223d4a650df53f51ceb963cd95b49a44	2017-08-31 08:57:15 +08:00
Andrew Ash	313c6ca435	[SPARK-21875][BUILD] Fix Java style bugs ## What changes were proposed in this pull request? Fix Java code style so `./dev/lint-java` succeeds ## How was this patch tested? Run `./dev/lint-java` Author: Andrew Ash <andrew@andrewash.com> Closes #19088 from ash211/spark-21875-lint-java.	2017-08-31 09:26:11 +09:00
Sital Kedia	6949a9c5c6	[SPARK-21834] Incorrect executor request in case of dynamic allocation ## What changes were proposed in this pull request? killExecutor api currently does not allow killing an executor without updating the total number of executors needed. In case of dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed ( see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635) which is incorrect because the allocator already takes care of setting the required number of executors itself. ## How was this patch tested? Ran a job on the cluster and made sure the executor request is correct Author: Sital Kedia <skedia@fb.com> Closes #19081 from sitalkedia/skedia/oss_fix_executor_allocation.	2017-08-30 14:19:13 -07:00
hyukjinkwon	b30a11a6ac	[SPARK-21764][TESTS] Fix tests failures on Windows: resources not being closed and incorrect paths ## What changes were proposed in this pull request? `org.apache.spark.deploy.RPackageUtilsSuite` ``` - jars without manifest return false * FAILED * (109 milliseconds) java.io.IOException: Unable to delete file: C:\projects\spark\target\tmp\1500266936418-0\dep1-c.jar ``` `org.apache.spark.deploy.SparkSubmitSuite` ``` - download one file to local * FAILED * (16 milliseconds) java.net.URISyntaxException: Illegal character in authority at index 6: s3a://C:\projects\spark\target\tmp\test2630198944759847458.jar - download list of files to local * FAILED * (0 milliseconds) java.net.URISyntaxException: Illegal character in authority at index 6: s3a://C:\projects\spark\target\tmp\test2783551769392880031.jar ``` `org.apache.spark.scheduler.ReplayListenerSuite` ``` - Replay compressed inprogress log file succeeding on partial read (156 milliseconds) Exception encountered when attempting to run a suite with class name: org.apache.spark.scheduler.ReplayListenerSuite * ABORTED * (1 second, 391 milliseconds) java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-8f3cacd6-faad-4121-b901-ba1bba8025a0 - End-to-end replay * FAILED * (62 milliseconds) java.io.IOException: No FileSystem for scheme: C - End-to-end replay with compression * FAILED * (110 milliseconds) java.io.IOException: No FileSystem for scheme: C ``` `org.apache.spark.sql.hive.StatisticsSuite` ``` - SPARK-21079 - analyze table with location different than that of individual partitions * FAILED * (875 milliseconds) org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); - SPARK-21079 - analyze partitioned table with only a subset of partitions visible * FAILED * (47 milliseconds) org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); ``` Note: this PR does not fix: `org.apache.spark.deploy.SparkSubmitSuite` ``` - launch simple application with spark-submit with redaction * FAILED * (172 milliseconds) java.util.NoSuchElementException: next on empty iterator ``` I can't reproduce this on my Windows machine but looks appearntly consistently failed on AppVeyor. This one is unclear to me yet and hard to debug so I did not include this one for now. Note: it looks there are more instances but it is hard to identify them partly due to flakiness and partly due to swarming logs and errors. Will probably go one more time if it is fine. ## How was this patch tested? Manually via AppVeyor: Before - `org.apache.spark.deploy.RPackageUtilsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/8t8ra3lrljuir7q4 - `org.apache.spark.deploy.SparkSubmitSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/taquy84yudjjen64 - `org.apache.spark.scheduler.ReplayListenerSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/24omrfn2k0xfa9xq - `org.apache.spark.sql.hive.StatisticsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/771-windows-fix/job/2079y1plgj76dc9l After - `org.apache.spark.deploy.RPackageUtilsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/3803dbfn89ne1164 - `org.apache.spark.deploy.SparkSubmitSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/m5l350dp7u9a4xjr - `org.apache.spark.scheduler.ReplayListenerSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/565vf74pp6bfdk18 - `org.apache.spark.sql.hive.StatisticsSuite`: https://ci.appveyor.com/project/spark-test/spark/build/775-windows-fix/job/qm78tsk8c37jb6s4 Jenkins tests are required and AppVeyor tests will be triggered. Author: hyukjinkwon <gurwls223@gmail.com> Closes #18971 from HyukjinKwon/windows-fixes.	2017-08-30 21:35:52 +09:00
liuxian	d4895c9de6	[MINOR][TEST] Off -heap memory leaks for unit tests ## What changes were proposed in this pull request? Free off -heap memory . I have checked all the unit tests. ## How was this patch tested? N/A Author: liuxian <liu.xian3@zte.com.cn> Closes #19075 from 10110346/memleak.	2017-08-30 10:16:11 +01:00
Steve Loughran	e47f48c737	[SPARK-20886][CORE] HadoopMapReduceCommitProtocol to handle FileOutputCommitter.getWorkPath==null ## What changes were proposed in this pull request? Handles the situation where a `FileOutputCommitter.getWorkPath()` returns `null` by downgrading to the supplied `path` argument. The existing code does an `Option(workPath.toString).getOrElse(path)`, which triggers an NPE in the `toString()` operation if the workPath == null. The code apparently was meant to handle this (hence the getOrElse() clause, but as the NPE has already occurred at that point the else-clause never gets invoked. ## How was this patch tested? Manually, with some later code review. Author: Steve Loughran <stevel@hortonworks.com> Closes #18111 from steveloughran/cloud/SPARK-20886-committer-NPE.	2017-08-30 13:03:30 +09:00
he.qiao	fba9cc8466	[SPARK-21813][CORE] Modify TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES comments ## What changes were proposed in this pull request? The variable "TaskMemoryManager.MAXIMUM_PAGE_SIZE_BYTES" comment error, It shouldn't be 2^32-1, should be 2^31-1, That means the maximum value of int. ## How was this patch tested? Existing test cases Author: he.qiao <he.qiao17@zte.com.cn> Closes #19025 from Geek-He/08_23_comments.	2017-08-29 23:44:27 +01:00
Marcelo Vanzin	d7b1fcf8f0	[SPARK-21728][CORE] Allow SparkSubmit to use Logging. This change initializes logging when SparkSubmit runs, using a configuration that should avoid printing log messages as much as possible with most configurations, and adds code to restore the Spark logging system to as close as possible to its initial state, so the Spark app being run can re-initialize logging with its own configuration. With that feature, some duplicate code in SparkSubmit can now be replaced with the existing methods in the Utils class, which could not be used before because they initialized logging. As part of that I also did some minor refactoring, moving methods that should really belong in DependencyUtils. The change also shuffles some code in SparkHadoopUtil so that SparkSubmit can create a Hadoop config like the rest of Spark code, respecting the user's Spark configuration. The behavior was verified running spark-shell, pyspark and normal applications, then verifying the logging behavior, with and without dependency downloads. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #19013 from vanzin/SPARK-21728.	2017-08-29 14:42:24 -07:00
erenavsarogullari	73e64f7d50	[SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit Test coverage for different build cases ## What changes were proposed in this pull request? Fair Scheduler can be built via one of the following options: - By setting a `spark.scheduler.allocation.file` property, - By setting `fairscheduler.xml` into classpath. These options are checked in order and fair-scheduler is built via first found option. If invalid path is found, `FileNotFoundException` will be expected. This PR aims unit test coverage of these use cases and a minor documentation change has been added for second option(`fairscheduler.xml` into classpath) to inform the users. Also, this PR was related with #16813 and has been created separately to keep patch content as isolated and to help the reviewers. ## How was this patch tested? Added new Unit Tests. Author: erenavsarogullari <erenavsarogullari@gmail.com> Closes #16992 from erenavsarogullari/SPARK-19662.	2017-08-28 14:54:00 -05:00
jerryshao	1813c4a8dd	[SPARK-21714][CORE][YARN] Avoiding re-uploading remote resources in yarn client mode ## What changes were proposed in this pull request? With SPARK-10643, Spark supports download resources from remote in client deploy mode. But the implementation overrides variables which representing added resources (like `args.jars`, `args.pyFiles`) to local path, And yarn client leverage this local path to re-upload resources to distributed cache. This is unnecessary to break the semantics of putting resources in a shared FS. So here proposed to fix it. ## How was this patch tested? This is manually verified with jars, pyFiles in local and remote storage, both in client and cluster mode. Author: jerryshao <sshao@hortonworks.com> Closes #18962 from jerryshao/SPARK-21714.	2017-08-25 09:57:53 -07:00
Sean Owen	de7af295c2	[MINOR][BUILD] Fix build warnings and Java lint errors ## What changes were proposed in this pull request? Fix build warnings and Java lint errors. This just helps a bit in evaluating (new) warnings in another PR I have open. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #19051 from srowen/JavaWarnings.	2017-08-25 16:07:13 +01:00
zhoukang	574ef6c987	[SPARK-21527][CORE] Use buffer limit in order to use JAVA NIO Util's buffercache ## What changes were proposed in this pull request? Right now, ChunkedByteBuffer#writeFully do not slice bytes first.We observe code in java nio Util#getTemporaryDirectBuffer below: BufferCache cache = bufferCache.get(); ByteBuffer buf = cache.get(size); if (buf != null) { return buf; } else { // No suitable buffer in the cache so we need to allocate a new // one. To avoid the cache growing then we remove the first // buffer from the cache and free it. if (!cache.isEmpty()) { buf = cache.removeFirst(); free(buf); } return ByteBuffer.allocateDirect(size); } If we slice first with a fixed size, we can use buffer cache and only need to allocate at the first write call. Since we allocate new buffer, we can not control the free time of this buffer.This once cause memory issue in our production cluster. In this patch, i supply a new api which will slice with fixed size for buffer writing. ## How was this patch tested? Unit test and test in production. Author: zhoukang <zhoukang199191@gmail.com> Author: zhoukang <zhoukang@xiaomi.com> Closes #18730 from caneGuy/zhoukang/improve-chunkwrite.	2017-08-25 22:59:31 +08:00
Sanket Chintapalli	1662e93119	[SPARK-21501] Change CacheLoader to limit entries based on memory footprint Right now the spark shuffle service has a cache for index files. It is based on a # of files cached (spark.shuffle.service.index.cache.entries). This can cause issues if people have a lot of reducers because the size of each entry can fluctuate based on the # of reducers. We saw an issues with a job that had 170000 reducers and it caused NM with spark shuffle service to use 700-800MB or memory in NM by itself. We should change this cache to be memory based and only allow a certain memory size used. When I say memory based I mean the cache should have a limit of say 100MB. https://issues.apache.org/jira/browse/SPARK-21501 Manual Testing with 170000 reducers has been performed with cache loaded up to max 100MB default limit, with each shuffle index file of size 1.3MB. Eviction takes place as soon as the total cache size reaches the 100MB limit and the objects will be ready for garbage collection there by avoiding NM to crash. No notable difference in runtime has been observed. Author: Sanket Chintapalli <schintap@yahoo-inc.com> Closes #18940 from redsanket/SPARK-21501.	2017-08-23 11:51:11 -05:00
Jane Wang	d58a3507ed	[SPARK-19326] Speculated task attempts do not get launched in few scenarios ## What changes were proposed in this pull request? Add a new listener event when a speculative task is created and notify it to ExecutorAllocationManager for requesting more executor. ## How was this patch tested? - Added Unittests. - For the test snippet in the jira: val n = 100 val someRDD = sc.parallelize(1 to n, n) someRDD.mapPartitionsWithIndex( (index: Int, it: Iterator[Int]) => { if (index == 1) { Thread.sleep(Long.MaxValue) // fake long running task(s) } it.toList.map(x => index + ", " + x).iterator }).collect With this code change, spark indicates 101 jobs are running (99 succeeded, 2 running and 1 is speculative job) Author: Jane Wang <janewang@fb.com> Closes #18492 from janewangfb/speculated_task_not_launched.	2017-08-23 11:31:54 +08:00
jerryshao	3ed1ae1005	[SPARK-20641][CORE] Add missing kvstore module in Laucher and SparkSubmit code There're two code in Launcher and SparkSubmit will will explicitly list all the Spark submodules, newly added kvstore module is missing in this two parts, so submitting a minor PR to fix this. Author: jerryshao <sshao@hortonworks.com> Closes #19014 from jerryshao/missing-kvstore.	2017-08-22 10:14:45 -07:00
Sergey Serebryakov	77d046ec47	[SPARK-21782][CORE] Repartition creates skews when numPartitions is a power of 2 ## Problem When an RDD (particularly with a low item-per-partition ratio) is repartitioned to numPartitions = power of 2, the resulting partitions are very uneven-sized, due to using fixed seed to initialize PRNG, and using the PRNG only once. See details in https://issues.apache.org/jira/browse/SPARK-21782 ## What changes were proposed in this pull request? Instead of directly using `0, 1, 2,...` seeds to initialize `Random`, hash them with `scala.util.hashing.byteswap32()`. ## How was this patch tested? `build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite test` Author: Sergey Serebryakov <sserebryakov@tesla.com> Closes #18990 from megaserg/repartition-skew.	2017-08-21 08:21:25 +01:00
ArtRand	bfdc361ede	[SPARK-16742] Mesos Kerberos Support ## What changes were proposed in this pull request? Add Kerberos Support to Mesos. This includes kinit and --keytab support, but does not include delegation token renewal. ## How was this patch tested? Manually against a Secure DC/OS Apache HDFS cluster. Author: ArtRand <arand@soe.ucsc.edu> Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #18519 from mgummelt/SPARK-16742-kerberos.	2017-08-17 15:47:07 -07:00
Kent Yao	b83b502c41	[SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState ## What changes were proposed in this pull request? Set isolated to false while using builtin hive jars and `SessionState.get` returns a `CliSessionState` instance. ## How was this patch tested? 1 Unit Tests 2 Manually verified: `hive.exec.strachdir` was only created once because of reusing cliSessionState ```java ➜ spark git:(SPARK-21428) ✗ bin/spark-sql --conf spark.sql.hive.metastore.jars=builtin log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 17/07/16 23:59:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 17/07/16 23:59:27 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 17/07/16 23:59:27 INFO ObjectStore: ObjectStore, initialize called 17/07/16 23:59:28 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored 17/07/16 23:59:28 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored 17/07/16 23:59:29 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order" 17/07/16 23:59:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 17/07/16 23:59:30 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 17/07/16 23:59:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table. 17/07/16 23:59:31 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table. 17/07/16 23:59:31 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY 17/07/16 23:59:31 INFO ObjectStore: Initialized ObjectStore 17/07/16 23:59:31 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 17/07/16 23:59:31 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 17/07/16 23:59:32 INFO HiveMetaStore: Added admin role in metastore 17/07/16 23:59:32 INFO HiveMetaStore: Added public role in metastore 17/07/16 23:59:32 INFO HiveMetaStore: No user is added in admin role, since config is empty 17/07/16 23:59:32 INFO HiveMetaStore: 0: get_all_databases 17/07/16 23:59:32 INFO audit: ugi=Kent ip=unknown-ip-addr cmd=get_all_databases 17/07/16 23:59:32 INFO HiveMetaStore: 0: get_functions: db=default pat=* 17/07/16 23:59:32 INFO audit: ugi=Kent ip=unknown-ip-addr cmd=get_functions: db=default pat=* 17/07/16 23:59:32 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table. 17/07/16 23:59:32 INFO SessionState: Created local directory: /var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/beea7261-221a-4711-89e8-8b12a9d37370_resources 17/07/16 23:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/Kent/beea7261-221a-4711-89e8-8b12a9d37370 17/07/16 23:59:32 INFO SessionState: Created local directory: /var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/Kent/beea7261-221a-4711-89e8-8b12a9d37370 17/07/16 23:59:32 INFO SessionState: Created HDFS directory: /tmp/hive/Kent/beea7261-221a-4711-89e8-8b12a9d37370/_tmp_space.db 17/07/16 23:59:32 INFO SparkContext: Running Spark version 2.3.0-SNAPSHOT 17/07/16 23:59:32 INFO SparkContext: Submitted application: SparkSQL::10.0.0.8 17/07/16 23:59:32 INFO SecurityManager: Changing view acls to: Kent 17/07/16 23:59:32 INFO SecurityManager: Changing modify acls to: Kent 17/07/16 23:59:32 INFO SecurityManager: Changing view acls groups to: 17/07/16 23:59:32 INFO SecurityManager: Changing modify acls groups to: 17/07/16 23:59:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Kent); groups with view permissions: Set(); users with modify permissions: Set(Kent); groups with modify permissions: Set() 17/07/16 23:59:33 INFO Utils: Successfully started service 'sparkDriver' on port 51889. 17/07/16 23:59:33 INFO SparkEnv: Registering MapOutputTracker 17/07/16 23:59:33 INFO SparkEnv: Registering BlockManagerMaster 17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 17/07/16 23:59:33 INFO DiskBlockManager: Created local directory at /private/var/folders/k2/04p4k4ws73l6711h_mz2_tq00000gn/T/blockmgr-9cfae28a-01e9-4c73-a1f1-f76fa52fc7a5 17/07/16 23:59:33 INFO MemoryStore: MemoryStore started with capacity 366.3 MB 17/07/16 23:59:33 INFO SparkEnv: Registering OutputCommitCoordinator 17/07/16 23:59:33 INFO Utils: Successfully started service 'SparkUI' on port 4040. 17/07/16 23:59:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.0.8:4040 17/07/16 23:59:33 INFO Executor: Starting executor ID driver on host localhost 17/07/16 23:59:33 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51890. 17/07/16 23:59:33 INFO NettyBlockTransferService: Server created on 10.0.0.8:51890 17/07/16 23:59:33 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 17/07/16 23:59:33 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.0.8, 51890, None) 17/07/16 23:59:33 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.0.8:51890 with 366.3 MB RAM, BlockManagerId(driver, 10.0.0.8, 51890, None) 17/07/16 23:59:33 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.0.8, 51890, None) 17/07/16 23:59:33 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.0.8, 51890, None) 17/07/16 23:59:34 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/Kent/Documents/spark/spark-warehouse'). 17/07/16 23:59:34 INFO SharedState: Warehouse path is 'file:/Users/Kent/Documents/spark/spark-warehouse'. 17/07/16 23:59:34 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse 17/07/16 23:59:34 INFO HiveMetaStore: 0: get_database: default 17/07/16 23:59:34 INFO audit: ugi=Kent ip=unknown-ip-addr cmd=get_database: default 17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse 17/07/16 23:59:34 INFO HiveMetaStore: 0: get_database: global_temp 17/07/16 23:59:34 INFO audit: ugi=Kent ip=unknown-ip-addr cmd=get_database: global_temp 17/07/16 23:59:34 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException 17/07/16 23:59:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse 17/07/16 23:59:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint spark-sql> ``` cc cloud-fan gatorsmile Author: Kent Yao <yaooqinn@hotmail.com> Author: hzyaoqin <hzyaoqin@corp.netease.com> Closes #18648 from yaooqinn/SPARK-21428.	2017-08-18 00:24:45 +08:00
Hideaki Tanaka	d695a528be	[SPARK-21642][CORE] Use FQDN for DRIVER_HOST_ADDRESS instead of ip address ## What changes were proposed in this pull request? The patch lets spark web ui use FQDN as its hostname instead of ip address. In current implementation, ip address of a driver host is set to DRIVER_HOST_ADDRESS. This becomes a problem when we enable SSL using "spark.ssl.enabled", "spark.ssl.trustStore" and "spark.ssl.keyStore" properties. When we configure these properties, spark web ui is launched with SSL enabled and the HTTPS server is configured with the custom SSL certificate you configured in these properties. In this case, client gets javax.net.ssl.SSLPeerUnverifiedException exception when the client accesses the spark web ui because the client fails to verify the SSL certificate (Common Name of the SSL cert does not match with DRIVER_HOST_ADDRESS). To avoid the exception, we should use FQDN of the driver host for DRIVER_HOST_ADDRESS. Error message that client gets when the client accesses spark web ui: javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.102.138.239> doesn't match any of the subject alternative names: [] ## How was this patch tested? manual tests Author: Hideaki Tanaka <tanakah@amazon.com> Closes #18846 from thideeeee/SPARK-21642.	2017-08-17 22:02:13 +08:00
Eyal Farago	b8ffb51055	[SPARK-3151][BLOCK MANAGER] DiskStore.getBytes fails for files larger than 2GB ## What changes were proposed in this pull request? introduced `DiskBlockData`, a new implementation of `BlockData` representing a whole file. this is somehow related to [SPARK-6236](https://issues.apache.org/jira/browse/SPARK-6236) as well This class follows the implementation of `EncryptedBlockData` just without the encryption. hence: * `toInputStream` is implemented using a `FileInputStream` (todo: encrypted version actually uses `Channels.newInputStream`, not sure if it's the right choice for this) * `toNetty` is implemented in terms of `io.netty.channel.DefaultFileRegion` * `toByteBuffer` fails for files larger than 2GB (same behavior of the original code, just postponed a bit), it also respects the same configuration keys defined by the original code to choose between memory mapping and simple file read. ## How was this patch tested? added test to DiskStoreSuite and MemoryManagerSuite Author: Eyal Farago <eyal@nrgene.com> Closes #18855 from eyalfa/SPARK-3151.	2017-08-17 09:21:50 +08:00
John Lee	adf005dabe	[SPARK-21656][CORE] spark dynamic allocation should not idle timeout executors when tasks still to run ## What changes were proposed in this pull request? Right now spark lets go of executors when they are idle for the 60s (or configurable time). I have seen spark let them go when they are idle but they were really needed. I have seen this issue when the scheduler was waiting to get node locality but that takes longer than the default idle timeout. In these jobs the number of executors goes down really small (less than 10) but there are still like 80,000 tasks to run. We should consider not allowing executors to idle timeout if they are still needed according to the number of tasks to be run. ## How was this patch tested? Tested by manually adding executors to `executorsIdsToBeRemoved` list and seeing if those executors were removed when there are a lot of tasks and a high `numExecutorsTarget` value. Code used In `ExecutorAllocationManager.start()` ``` start_time = clock.getTimeMillis() ``` In `ExecutorAllocationManager.schedule()` ``` val executorIdsToBeRemoved = ArrayBuffer[String]() if ( now > start_time + 1000 * 60 * 2) { logInfo("--- REMOVING 1/2 of the EXECUTORS ---") start_time += 1000 * 60 * 100 var counter = 0 for (x <- executorIds) { counter += 1 if (counter == 2) { counter = 0 executorIdsToBeRemoved += x } } } Author: John Lee <jlee2@yahoo-inc.com> Closes #18874 from yoonlee95/SPARK-21656.	2017-08-16 09:44:09 -05:00
Marcelo Vanzin	3f958a9992	[SPARK-21731][BUILD] Upgrade scalastyle to 0.9. This version fixes a few issues in the import order checker; it provides better error messages, and detects more improper ordering (thus the need to change a lot of files in this patch). The main fix is that it correctly complains about the order of packages vs. classes. As part of the above, I moved some "SparkSession" import in ML examples inside the "$example on$" blocks; that didn't seem consistent across different source files to start with, and avoids having to add more on/off blocks around specific imports. The new scalastyle also seems to have a better header detector, so a few license headers had to be updated to match the expected indentation. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18943 from vanzin/SPARK-21731.	2017-08-15 13:59:00 -07:00
Marcelo Vanzin	cba826d001	[SPARK-17742][CORE] Handle child process exit in SparkLauncher. Currently the launcher handle does not monitor the child spark-submit process it launches; this means that if the child exits with an error, the handle's state will never change, and an application will not know that the application has failed. This change adds code to monitor the child process, and changes the handle state appropriately when the child process exits. Tested with added unit tests. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #18877 from vanzin/SPARK-17742.	2017-08-15 11:26:29 -07:00
Andrew Ash	6847e93cf4	[SPARK-21563][CORE] Fix race condition when serializing TaskDescriptions and adding jars ## What changes were proposed in this pull request? Fix the race condition when serializing TaskDescriptions and adding jars by keeping the set of jars and files for a TaskSet constant across the lifetime of the TaskSet. Otherwise TaskDescription serialization can produce an invalid serialization when new file/jars are added concurrently as the TaskDescription is serialized. ## How was this patch tested? Additional unit test ensures jars/files contained in the TaskDescription remain constant throughout the lifetime of the TaskSet. Author: Andrew Ash <andrew@andrewash.com> Closes #18913 from ash211/SPARK-21563.	2017-08-14 22:48:08 +08:00
Anderson Osagie	34d2134a9f	[SPARK-21176][WEB UI] Format worker page links to work with proxy ## What changes were proposed in this pull request? Several links on the worker page do not work correctly with the proxy because: 1) They don't acknowledge the proxy 2) They use relative paths (unlike the Application Page which uses full paths) This patch fixes that. It also fixes a mistake in the proxy's Location header parsing which caused it to incorrectly handle redirects. ## How was this patch tested? I checked the validity of every link with the proxy on and off. Author: Anderson Osagie <osagie@gmail.com> Closes #18915 from aosagie/fix/proxy-links.	2017-08-14 10:00:59 +01:00

1 2 3 4 5 ...

6360 commits