Commit graph

308 commits

Author SHA1 Message Date
Marcelo Vanzin 8f3f73abc1 [SPARK-19139][CORE] New auth mechanism for transport library.
This change introduces a new auth mechanism to the transport library,
to be used when users enable strong encryption. This auth mechanism
has better security than the currently used DIGEST-MD5.

The new protocol uses symmetric key encryption to mutually authenticate
the endpoints, and is very loosely based on ISO/IEC 9798.

The new protocol falls back to SASL when it thinks the remote end is old.
Because SASL does not support asking the server for multiple auth protocols,
which would mean we could re-use the existing SASL code by just adding a
new SASL provider, the protocol is implemented outside of the SASL API
to avoid the boilerplate of adding a new provider.

Details of the auth protocol are discussed in the included README.md
file.

This change partly undos the changes added in SPARK-13331; AES encryption
is now decoupled from SASL authentication. The encryption code itself,
though, has been re-used as part of this change.

## How was this patch tested?

- Unit tests
- Tested Spark 2.2 against Spark 1.6 shuffle service with SASL enabled
- Tested Spark 2.2 against Spark 2.2 shuffle service with SASL fallback disabled

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16521 from vanzin/SPARK-19139.
2017-01-24 10:44:04 -08:00
uncleGen 7c61c2a1c4
[DOCS] Fix typo in docs
## What changes were proposed in this pull request?

Fix typo in docs

## How was this patch tested?

Author: uncleGen <hustyugm@gmail.com>

Closes #16658 from uncleGen/typo-issue.
2017-01-24 11:32:11 +00:00
Yuming Wang c99492141b
[SPARK-19146][CORE] Drop more elements when stageData.taskData.size > retainedTasks
## What changes were proposed in this pull request?

Drop more elements when `stageData.taskData.size > retainedTasks` to reduce the number of times on call drop function.

## How was this patch tested?

Jenkins

Author: Yuming Wang <wgyumg@gmail.com>

Closes #16527 from wangyum/SPARK-19146.
2017-01-23 11:02:22 +00:00
Bryan Cutler 3bc2eff888 [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts
## What changes were proposed in this pull request?

Adding option in spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality.  This will allow all artifact resolution to go through a central managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.

This change restructures the creation of the IvySettings object in two distinct ways.  First, if the `spark.ivy.settings` option is not defined then `buildIvySettings` will create a default settings instance, as before, with defined repositories (Maven Central) included.  Second, if the option is defined, the ivy settings file will be loaded from the given path and only repositories defined within will be used for artifact resolution.
## How was this patch tested?

Existing tests for default behaviour, Manual tests that load a ivysettings.xml file with local and Nexus repositories defined.  Added new test to load a simple Ivy settings file with a local filesystem resolver.

Author: Bryan Cutler <cutlerb@gmail.com>
Author: Ian Hummel <ian@themodernlife.net>

Closes #15119 from BryanCutler/spark-custom-IvySettings.
2017-01-11 11:57:38 -08:00
Sean Owen 54138f6e89
[SPARK-19106][DOCS] Styling for the configuration docs is broken
## What changes were proposed in this pull request?

configuration.html section headings were not specified correctly in markdown and weren't rendering, being recognized correctly. Removed extra p tags and pulled level 4 titles up to level 3, since level 3 had been skipped. This improves the TOC.

## How was this patch tested?

Doc build, manual check.

Author: Sean Owen <sowen@cloudera.com>

Closes #16490 from srowen/SPARK-19106.
2017-01-07 19:15:51 +00:00
Yuexin Zhang 28ab0ec49f
[SPARK-19006][DOCS] mention spark.kryoserializer.buffer.max must be less than 2048m in doc
## What changes were proposed in this pull request?

On configuration doc page:https://spark.apache.org/docs/latest/configuration.html
We mentioned spark.kryoserializer.buffer.max : Maximum allowable size of Kryo serialization buffer. This must be larger than any object you attempt to serialize. Increase this if you get a "buffer limit exceeded" exception inside Kryo.
from source code, it has hard coded upper limit :
```
val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt
if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2))
{ throw new IllegalArgumentException("spark.kryoserializer.buffer.max must be less than " + s"2048 mb, got: + $maxBufferSizeMb mb.") }
```
We should mention "this value must be less than 2048 mb" on the configuration doc page as well.

## How was this patch tested?

None. Since it's minor doc change.

Author: Yuexin Zhang <yxzhang@cloudera.com>

Closes #16412 from cnZach/SPARK-19006.
2016-12-27 20:29:45 +00:00
Josh Rosen fa829ce21f [SPARK-18761][CORE] Introduce "task reaper" to oversee task killing in executors
## What changes were proposed in this pull request?

Spark's current task cancellation / task killing mechanism is "best effort" because some tasks may not be interruptible or may not respond to their "killed" flags being set. If a significant fraction of a cluster's task slots are occupied by tasks that have been marked as killed but remain running then this can lead to a situation where new jobs and tasks are starved of resources that are being used by these zombie tasks.

This patch aims to address this problem by adding a "task reaper" mechanism to executors. At a high-level, task killing now launches a new thread which attempts to kill the task and then watches the task and periodically checks whether it has been killed. The TaskReaper will periodically re-attempt to call `TaskRunner.kill()` and will log warnings if the task keeps running. I modified TaskRunner to rename its thread at the start of the task, allowing TaskReaper to take a thread dump and filter it in order to log stacktraces from the exact task thread that we are waiting to finish. If the task has not stopped after a configurable timeout then the TaskReaper will throw an exception to trigger executor JVM death, thereby forcibly freeing any resources consumed by the zombie tasks.

This feature is flagged off by default and is controlled by four new configurations under the `spark.task.reaper.*` namespace. See the updated `configuration.md` doc for details.

## How was this patch tested?

Tested via a new test case in `JobCancellationSuite`, plus manual testing.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #16189 from JoshRosen/cancellation.
2016-12-19 18:43:59 -08:00
gatorsmile c0c9e1d27a
[SPARK-18918][DOC] Missing </td> in Configuration page
### What changes were proposed in this pull request?
The configuration page looks messy now, as shown in the nightly build:
https://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/configuration.html

Starting from the following location:

![screenshot 2016-12-18 00 26 33](https://cloud.githubusercontent.com/assets/11567269/21292396/ace4719c-c4b8-11e6-8dfd-d9ab95be43d5.png)

### How was this patch tested?
Attached is the screenshot generated in my local computer after the fix.
[Configuration - Spark 2.2.0 Documentation.pdf](https://github.com/apache/spark/files/659315/Configuration.-.Spark.2.2.0.Documentation.pdf)

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16327 from gatorsmile/docFix.
2016-12-18 09:02:04 +00:00
Imran Rashid 93cdb8a7d0 [SPARK-8425][CORE] Application Level Blacklisting
## What changes were proposed in this pull request?

This builds upon the blacklisting introduced in SPARK-17675 to add blacklisting of executors and nodes for an entire Spark application.  Resources are blacklisted based on tasks that fail, in tasksets that eventually complete successfully; they are automatically returned to the pool of active resources based on a timeout.  Full details are available in a design doc attached to the jira.
## How was this patch tested?

Added unit tests, ran them via Jenkins, also ran a handful of them in a loop to check for flakiness.

The added tests include:
- verifying BlacklistTracker works correctly
- verifying TaskSchedulerImpl interacts with BlacklistTracker correctly (via a mock BlacklistTracker)
- an integration test for the entire scheduler with blacklisting in a few different scenarios

Author: Imran Rashid <irashid@cloudera.com>
Author: mwws <wei.mao@intel.com>

Closes #14079 from squito/blacklist-SPARK-8425.
2016-12-15 08:29:56 -06:00
Marcelo Vanzin bc59951bab [SPARK-18773][CORE] Make commons-crypto config translation consistent.
This change moves the logic that translates Spark configuration to
commons-crypto configuration to the network-common module. It also
extends TransportConf and ConfigProvider to provide the necessary
interfaces for the translation to work.

As part of the change, I removed SystemPropertyConfigProvider, which
was mostly used as an "empty config" in unit tests, and adjusted the
very few tests that required a specific config.

I also changed the config keys for AES encryption to live under the
"spark.network." namespace, which is more correct than their previous
names under "spark.authenticate.".

Tested via existing unit test.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16200 from vanzin/SPARK-18773.
2016-12-12 16:27:04 -08:00
Nicholas Chammas 18eaabb71e [SPARK-18719] Add spark.ui.showConsoleProgress to configuration docs
This PR adds `spark.ui.showConsoleProgress` to the configuration docs.

I tested this PR by building the docs locally and confirming that this change shows up as expected.

Relates to #3029.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #16151 from nchammas/ui-progressbar-doc.
2016-12-05 14:40:50 -08:00
Marcelo Vanzin 8b325b17ec [SPARK-18547][CORE] Propagate I/O encryption key when executors register.
This change modifies the method used to propagate encryption keys used during
shuffle. Instead of relying on YARN's UserGroupInformation credential propagation,
this change explicitly distributes the key using the messages exchanged between
driver and executor during registration. When RPC encryption is enabled, this means
key propagation is also secure.

This allows shuffle encryption to work in non-YARN mode, which means that it's
easier to write unit tests for areas of the code that are affected by the feature.

The key is stored in the SecurityManager; because there are many instances of
that class used in the code, the key is only guaranteed to exist in the instance
managed by the SparkEnv. This path was chosen to avoid storing the key in the
SparkConf, which would risk having the key being written to disk as part of the
configuration (as, for example, is done when starting YARN applications).

Tested by new and existing unit tests (which were moved from the YARN module to
core), and by running apps with shuffle encryption enabled.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #15981 from vanzin/SPARK-18547.
2016-11-28 21:10:57 -08:00
Mark Grover 237c3b9642 [SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI
## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that
allows users to specify a scala regex to decide which Spark configuration
properties and environment variables in driver and executor environments
contain sensitive information. When this regex matches the property or
environment variable name, its value is redacted from the environment UI and
various logs like YARN and event logs.

This change uses this property to redact information from event logs and YARN
logs. It also, updates the UI code to adhere to this property instead of
hardcoding the logic to decipher which properties are sensitive.

Here's an image of the UI post-redaction:
![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``

Here's the text in the event logs, post-redaction:
``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``

## How was this patch tested?
1. Unit tests are added to ensure that redaction works.
2. A YARN job reading data off of S3 with confidential information
(hadoop credential provider password) being provided in the environment
variables of driver and executor. And, afterwards, logs were grepped to make
sure that no mention of secret password was present. It was also ensure that
the job was able to read the data off of S3 correctly, thereby ensuring that
the sensitive information was being trickled down to the right places to read
the data.
3. The event logs were checked to make sure no mention of secret password was
present.
4. UI environment tab was checked to make sure there was no secret information
being displayed.

Author: Mark Grover <mark@apache.org>

Closes #15971 from markgrover/master_redaction.
2016-11-28 08:59:47 -08:00
Sean Owen 8b1e1088eb
[SPARK-18353][CORE] spark.rpc.askTimeout defalut value is not 120s
## What changes were proposed in this pull request?

Avoid hard-coding spark.rpc.askTimeout to non-default in Client; fix doc about spark.rpc.askTimeout default

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #15833 from srowen/SPARK-18353.
2016-11-19 11:28:25 +00:00
Weiqing Yang 241e04bc03
[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation
## What changes were proposed in this pull request?

Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation.

## How was this patch tested?
Manually.

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15886 from weiqingy/fixTypo.
2016-11-16 10:34:56 +00:00
Weiqing Yang 3af894511b [SPARK-16759][CORE] Add a configuration property to pass caller contexts of upstream applications into Spark
## What changes were proposed in this pull request?

Many applications take Spark as a computing engine and run on it. This PR adds a configuration property `spark.log.callerContext` that can be used by Spark's upstream applications (e.g. Oozie) to set up their caller contexts into Spark. In the end, Spark will combine its own caller context with the caller contexts of its upstream applications, and write them into Yarn RM log and HDFS audit log.

The audit log has a config to truncate the caller contexts passed in (default 128). The caller contexts will be sent over rpc, so it should be concise. The call context written into HDFS log and Yarn log consists of two parts: the information `A` specified by Spark itself and the value `B` of `spark.log.callerContext` property.  Currently `A` typically takes 64 to 74 characters,  so `B` can have up to 50 characters (mentioned in the doc `running-on-yarn.md`)
## How was this patch tested?

Manual tests. I have run some Spark applications with `spark.log.callerContext` configuration in Yarn client/cluster mode, and verified that the caller contexts were written into Yarn RM log and HDFS audit log correctly.

The ways to configure `spark.log.callerContext` property:
- In spark-defaults.conf:

```
spark.log.callerContext  infoSpecifiedByUpstreamApp
```
- In app's source code:

```
val spark = SparkSession
      .builder
      .appName("SparkKMeans")
      .config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")
      .getOrCreate()
```

When running on Spark Yarn cluster mode, the driver is unable to pass 'spark.log.callerContext' to Yarn client and AM since Yarn client and AM have already started before the driver performs `.config("spark.log.callerContext", "infoSpecifiedByUpstreamApp")`.

The following  example shows the command line used to submit a SparkKMeans application and the corresponding records in Yarn RM log and HDFS audit log.

Command:

```
./bin/spark-submit --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar hdfs://localhost:9000/lr_big.txt 2 5
```

Yarn RM log:

<img width="1440" alt="screen shot 2016-10-19 at 9 12 03 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547050/7d2f278c-9649-11e6-9df8-8d5ff12609f0.png">

HDFS audit log:

<img width="1400" alt="screen shot 2016-10-19 at 10 18 14 pm" src="https://cloud.githubusercontent.com/assets/8546874/19547102/096060ae-964a-11e6-981a-cb28efd5a058.png">

Author: Weiqing Yang <yangweiqing001@gmail.com>

Closes #15563 from weiqingy/SPARK-16759.
2016-11-11 18:36:23 -08:00
Junjie Chen 4f15d94cfe [SPARK-13331] AES support for over-the-wire encryption
## What changes were proposed in this pull request?

DIGEST-MD5 mechanism is used for SASL authentication and secure communication. DIGEST-MD5 mechanism supports 3DES, DES, and RC4 ciphers. However, 3DES, DES and RC4 are slow relatively.

AES provide better performance and security by design and is a replacement for 3DES according to NIST. Apache Common Crypto is a cryptographic library optimized with AES-NI, this patch employ Apache Common Crypto as enc/dec backend for SASL authentication and secure channel to improve spark RPC.
## How was this patch tested?

Unit tests and Integration test.

Author: Junjie Chen <junjie.j.chen@intel.com>

Closes #15172 from cjjnjust/shuffle_rpc_encrypt.
2016-11-11 10:37:58 -08:00
fidato 6f3697136a [SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles
## What changes were proposed in this pull request?

This Pull request comprises of the critical bug SPARK-16575 changes. This change rectifies the issue with BinaryFileRDD partition calculations as  upon creating an RDD with sc.binaryFiles, the resulting RDD always just consisted of two partitions only.
## How was this patch tested?

The original issue ie. getNumPartitions on binary Files RDD (always having two partitions) was first replicated and then tested upon the changes. Also the unit tests have been checked and passed.

This contribution is my original work and I licence the work to the project under the project's open source license

srowen hvanhovell rxin vanzin skyluc kmader zsxwing datafarmer Please have a look .

Author: fidato <fidato.july13@gmail.com>

Closes #15327 from fidato13/SPARK-16575.
2016-11-07 18:41:17 -08:00
Josh Rosen 6e6298154a [SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Server
In SPARK-4761 / #3621 (December 2014) we enabled Kryo serialization by default in the Spark Thrift Server. However, I don't think that the original rationale for doing this still holds now that most Spark SQL serialization is now performed via encoders and our UnsafeRow format.

In addition, the use of Kryo as the default serializer can introduce performance problems because the creation of new KryoSerializer instances is expensive and we haven't performed instance-reuse optimizations in several code paths (including DirectTaskResult deserialization).

Given all of this, I propose to revert back to using JavaSerializer as the default serializer in the Thrift Server.

/cc liancheng

Author: Josh Rosen <joshrosen@databricks.com>

Closes #14906 from JoshRosen/disable-kryo-in-thriftserver.
2016-11-01 16:23:47 -07:00
Hossein 2881a2d1d1 [SPARK-17919] Make timeout to RBackend configurable in SparkR
## What changes were proposed in this pull request?

This patch makes RBackend connection timeout configurable by user.

## How was this patch tested?
N/A

Author: Hossein <hossein@databricks.com>

Closes #15471 from falaki/SPARK-17919.
2016-10-30 16:17:23 -07:00
Alex Bozarth 5d0f81da49
[SPARK-4411][WEB UI] Add "kill" link for jobs in the UI
## What changes were proposed in this pull request?

Currently users can kill stages via the web ui but not jobs directly (jobs are killed if one of their stages is). I've added the ability to kill jobs via the web ui. This code change is based on #4823 by lianhuiwang and updated to work with the latest code matching how stages are currently killed. In general I've copied the kill stage code warning and note comments and all. I also updated applicable tests and documentation.

## How was this patch tested?

Manually tested and dev/run-tests

![screen shot 2016-10-11 at 4 49 43 pm](https://cloud.githubusercontent.com/assets/13952758/19292857/12f1b7c0-8fd4-11e6-8982-210249f7b697.png)

Author: Alex Bozarth <ajbozart@us.ibm.com>
Author: Lianhui Wang <lianhuiwang09@gmail.com>

Closes #15441 from ajbozarth/spark4411.
2016-10-26 14:26:54 +02:00
Sandeep Singh bc167a2a53 [SPARK-928][CORE] Add support for Unsafe-based serializer in Kryo
## What changes were proposed in this pull request?
Now since we have migrated to Kryo-3.0.0 in https://issues.apache.org/jira/browse/SPARK-11416, we can gives users option to use unsafe SerDer. It can turned by setting `spark.kryo.useUnsafe` to `true`

## How was this patch tested?
Ran existing tests

```
     Benchmark Kryo Unsafe vs safe Serialization: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
      ------------------------------------------------------------------------------------------------
      basicTypes: Int unsafe:true                    160 /  178         98.5          10.1       1.0X
      basicTypes: Long unsafe:true                   210 /  218         74.9          13.4       0.8X
      basicTypes: Float unsafe:true                  203 /  213         77.5          12.9       0.8X
      basicTypes: Double unsafe:true                 226 /  235         69.5          14.4       0.7X
      Array: Int unsafe:true                        1087 / 1101         14.5          69.1       0.1X
      Array: Long unsafe:true                       2758 / 2844          5.7         175.4       0.1X
      Array: Float unsafe:true                      1511 / 1552         10.4          96.1       0.1X
      Array: Double unsafe:true                     2942 / 2972          5.3         187.0       0.1X
      Map of string->Double unsafe:true             2645 / 2739          5.9         168.2       0.1X
      basicTypes: Int unsafe:false                   211 /  218         74.7          13.4       0.8X
      basicTypes: Long unsafe:false                  247 /  253         63.6          15.7       0.6X
      basicTypes: Float unsafe:false                 211 /  216         74.5          13.4       0.8X
      basicTypes: Double unsafe:false                227 /  233         69.2          14.4       0.7X
      Array: Int unsafe:false                       3012 / 3032          5.2         191.5       0.1X
      Array: Long unsafe:false                      4463 / 4515          3.5         283.8       0.0X
      Array: Float unsafe:false                     2788 / 2868          5.6         177.2       0.1X
      Array: Double unsafe:false                    3558 / 3752          4.4         226.2       0.0X
      Map of string->Double unsafe:false            2806 / 2933          5.6         178.4       0.1X
```

Author: Sandeep Singh <sandeep@techaddict.me>
Author: Sandeep Singh <sandeep@origamilogic.com>

Closes #12913 from techaddict/SPARK-928.
2016-10-22 12:03:37 -07:00
Yu Peng 231f39e3f6 [SPARK-17711] Compress rolled executor log
## What changes were proposed in this pull request?

This PR adds support for executor log compression.

## How was this patch tested?

Unit tests

cc: yhuai tdas mengxr

Author: Yu Peng <loneknightpy@gmail.com>

Closes #15285 from loneknightpy/compress-executor-log.
2016-10-18 13:23:31 -07:00
Reynold Xin 72a6e7a57a Revert "[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across executors"
This reverts commit ed14633414.

The patch merged had obvious quality and documentation issue. The idea is useful, and we should work towards improving its quality and merging it in again.
2016-10-15 22:31:37 -07:00
Zhan Zhang ed14633414 [SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across executors
## What changes were proposed in this pull request?

Restructure the code and implement two new task assigner.
PackedAssigner: try to allocate tasks to the executors with least available cores, so that spark can release reserved executors when dynamic allocation is enabled.

BalancedAssigner: try to allocate tasks to the executors with more available cores in order to balance the workload across all executors.

By default, the original round robin assigner is used.

We test a pipeline, and new PackedAssigner  save around 45% regarding the reserved cpu and memory with dynamic allocation enabled.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
Both unit test in TaskSchedulerImplSuite and manual tests in production pipeline.

Author: Zhan Zhang <zhanzhang@fb.com>

Closes #15218 from zhzhan/packed-scheduler.
2016-10-15 18:45:04 -07:00
Imran Rashid 9ce7d3e542 [SPARK-17675][CORE] Expand Blacklist for TaskSets
## What changes were proposed in this pull request?

This is a step along the way to SPARK-8425.

To enable incremental review, the first step proposed here is to expand the blacklisting within tasksets. In particular, this will enable blacklisting for
* (task, executor) pairs (this already exists via an undocumented config)
* (task, node)
* (taskset, executor)
* (taskset, node)

Adding (task, node) is critical to making spark fault-tolerant of one-bad disk in a cluster, without requiring careful tuning of "spark.task.maxFailures". The other additions are also important to avoid many misleading task failures and long scheduling delays when there is one bad node on a large cluster.

Note that some of the code changes here aren't really required for just this -- they put pieces in place for SPARK-8425 even though they are not used yet (eg. the `BlacklistTracker` helper is a little out of place, `TaskSetBlacklist` holds onto a little more info than it needs to for just this change, and `ExecutorFailuresInTaskSet` is more complex than it needs to be).

## How was this patch tested?

Added unit tests, run tests via jenkins.

Author: Imran Rashid <irashid@cloudera.com>
Author: mwws <wei.mao@intel.com>

Closes #15249 from squito/taskset_blacklist_only.
2016-10-12 16:43:03 -05:00
Marcelo Vanzin 2cd1bfa4f0 [SPARK-4563][CORE] Allow driver to advertise a different network address.
The goal of this feature is to allow the Spark driver to run in an
isolated environment, such as a docker container, and be able to use
the host's port forwarding mechanism to be able to accept connections
from the outside world.

The change is restricted to the driver: there is no support for achieving
the same thing on executors (or the YARN AM for that matter). Those still
need full access to the outside world so that, for example, connections
can be made to an executor's block manager.

The core of the change is simple: add a new configuration that tells what's
the address the driver should bind to, which can be different than the address
it advertises to executors (spark.driver.host). Everything else is plumbing
the new configuration where it's needed.

To use the feature, the host starting the container needs to set up the
driver's port range to fall into a range that is being forwarded; this
required the block manager port to need a special configuration just for
the driver, which falls back to the existing spark.blockManager.port when
not set. This way, users can modify the driver settings without affecting
the executors; it would theoretically be nice to also have different
retry counts for driver and executors, but given that docker (at least)
allows forwarding port ranges, we can probably live without that for now.

Because of the nature of the feature it's kinda hard to add unit tests;
I just added a simple one to make sure the configuration works.

This was tested with a docker image running spark-shell with the following
command:

 docker blah blah blah \
   -p 38000-38100:38000-38100 \
   [image] \
   spark-shell \
     --num-executors 3 \
     --conf spark.shuffle.service.enabled=false \
     --conf spark.dynamicAllocation.enabled=false \
     --conf spark.driver.host=[host's address] \
     --conf spark.driver.port=38000 \
     --conf spark.driver.blockManager.port=38020 \
     --conf spark.ui.port=38040

Running on YARN; verified the driver works, executors start up and listen
on ephemeral ports (instead of using the driver's config), and that caching
and shuffling (without the shuffle service) works. Clicked through the UI
to make sure all pages (including executor thread dumps) worked. Also tested
apps without docker, and ran unit tests.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #15120 from vanzin/SPARK-4563.
2016-09-21 14:42:41 -07:00
sandy bbe0b1d623
[SPARK-17575][DOCS] Remove extra table tags in configuration document
## What changes were proposed in this pull request?

Remove extra table tags in configurations document.

## How was this patch tested?

Run all test cases and generate document.

Before with extra tag its look like below
![config-wrong1](https://cloud.githubusercontent.com/assets/8075390/18608239/c602bb60-7d01-11e6-875e-f38558997dd3.png)

![config-wrong2](https://cloud.githubusercontent.com/assets/8075390/18608241/cf3b672c-7d01-11e6-935e-1e73f9e6e578.png)

After removing tags its looks like below

![config](https://cloud.githubusercontent.com/assets/8075390/18608245/e156eb8e-7d01-11e6-98aa-3be68d4d1961.png)

![config2](https://cloud.githubusercontent.com/assets/8075390/18608247/e84eecd4-7d01-11e6-9738-a3f7ff8fe834.png)

Author: sandy <phalodi@gmail.com>

Closes #15130 from phalodi/SPARK-17575.
2016-09-17 16:25:03 +01:00
Jagadeesan def7c265f5 [SPARK-17449][DOCUMENTATION] Relation between heartbeatInterval and…
## What changes were proposed in this pull request?

The relation between spark.network.timeout and spark.executor.heartbeatInterval should be mentioned in the document.

… network timeout]

Author: Jagadeesan <as2@us.ibm.com>

Closes #15042 from jagadeesanas2/SPARK-17449.
2016-09-14 09:03:16 +01:00
Gurvinder Singh 92ce8d4849 [SPARK-15487][WEB UI] Spark Master UI to reverse proxy Application and Workers UI
## What changes were proposed in this pull request?

This pull request adds the functionality to enable accessing worker and application UI through master UI itself. Thus helps in accessing SparkUI when running spark cluster in closed networks e.g. Kubernetes. Cluster admin needs to expose only spark master UI and rest of the UIs can be in the private network, master UI will reverse proxy the connection request to corresponding resource. It adds the path for workers/application UIs as

WorkerUI: <http/https>://master-publicIP:<port>/target/workerID/
ApplicationUI: <http/https>://master-publicIP:<port>/target/appID/

This makes it easy for users to easily protect the Spark master cluster access by putting some reverse proxy e.g. https://github.com/bitly/oauth2_proxy

## How was this patch tested?

The functionality has been tested manually and there is a unit test too for testing access to worker UI with reverse proxy address.

pwendell bomeng BryanCutler can you please review it, thanks.

Author: Gurvinder Singh <gurvinder.singh@uninett.no>

Closes #13950 from gurvindersingh/rproxy.
2016-09-08 17:20:20 -07:00
Jeff Zhang fa6347938f [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr shell command through --conf
## What changes were proposed in this pull request?

Allow user to set sparkr shell command through --conf spark.r.shell.command

## How was this patch tested?

Unit test is added and also verify it manually through
```
bin/sparkr --master yarn-client --conf spark.r.shell.command=/usr/local/bin/R
```

Author: Jeff Zhang <zjffdu@apache.org>

Closes #14744 from zjffdu/SPARK-17178.
2016-08-31 00:20:41 -07:00
Ferdinand Xu 4b4e329e49 [SPARK-5682][CORE] Add encrypted shuffle in spark
This patch is using Apache Commons Crypto library to enable shuffle encryption support.

Author: Ferdinand Xu <cheng.a.xu@intel.com>
Author: kellyzly <kellyzly@126.com>

Closes #8880 from winningsix/SPARK-10771.
2016-08-30 09:15:31 -07:00
Alex Bozarth 891ac2b914 [SPARK-15083][WEB UI] History Server can OOM due to unlimited TaskUIData
## What changes were proposed in this pull request?

Based on #12990 by tankkyo

Since the History Server currently loads all application's data it can OOM if too many applications have a significant task count. `spark.ui.trimTasks` (default: false) can be set to true to trim tasks by `spark.ui.retainedTasks` (default: 10000)

(This is a "quick fix" to help those running into the problem until a update of how the history server loads app data can be done)

## How was this patch tested?

Manual testing and dev/run-tests

![spark-15083](https://cloud.githubusercontent.com/assets/13952758/17713694/fe82d246-63b0-11e6-9697-b87ea75ff4ef.png)

Author: Alex Bozarth <ajbozart@us.ibm.com>

Closes #14673 from ajbozarth/spark15083.
2016-08-24 14:39:41 -05:00
wm624@hotmail.com e328f577e8 [SPARK-17002][CORE] Document that spark.ssl.protocol. is required for SSL
## What changes were proposed in this pull request?

`spark.ssl.enabled`=true, but failing to set `spark.ssl.protocol` will fail and throw meaningless exception. `spark.ssl.protocol` is required when `spark.ssl.enabled`.

Improvement: require `spark.ssl.protocol` when initializing SSLContext, otherwise throws an exception to indicate that.

Remove the OrElse("default").

Document this requirement in configure.md

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Manual tests:
Build document and check document

Configure `spark.ssl.enabled` only, it throws exception below:
6/08/16 16:04:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(mwang); groups with view permissions: Set(); users  with modify permissions: Set(mwang); groups with modify permissions: Set()
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: spark.ssl.protocol is required when enabling SSL connections.
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:285)
	at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1026)
	at org.apache.spark.deploy.master.Master$.main(Master.scala:1011)
	at org.apache.spark.deploy.master.Master.main(Master.scala)

Configure `spark.ssl.protocol`  and `spark.ssl.protocol`
It works fine.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #14674 from wangmiao1981/ssl.
2016-08-21 11:51:46 +01:00
WeichenXu 91f2735a18 [DOC] add config option spark.ui.enabled into document
## What changes were proposed in this pull request?

The configuration doc lost the config option `spark.ui.enabled` (default value is `true`)
I think this option is important because many cases we would like to turn it off.
so I add it.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14604 from WeichenXu123/add_doc_param_spark_ui_enabled.
2016-08-12 20:10:09 +01:00
Jeff Zhang 7a9e25c383 [SPARK-13081][PYSPARK][SPARK_SUBMIT] Allow set pythonExec of driver and executor through conf…
Before this PR, user have to export environment variable to specify the python of driver & executor which is not so convenient for users. This PR is trying to allow user to specify python through configuration "--pyspark-driver-python" & "--pyspark-executor-python"

Manually test in local & yarn mode for pyspark-shell and pyspark batch mode.

Author: Jeff Zhang <zjffdu@apache.org>

Closes #13146 from zjffdu/SPARK-13081.
2016-08-11 20:08:39 -07:00
Andrew Ash 8a6b7037bb Correct example value for spark.ssl.YYY.XXX settings
Docs adjustment to:
- link to other relevant section of docs
- correct statement about the only value when actually other values are supported

Author: Andrew Ash <andrew@andrewash.com>

Closes #14581 from ash211/patch-10.
2016-08-11 11:26:57 +01:00
Michael Gummelt 53d1c78779 Update docs to include SASL support for RPC
## What changes were proposed in this pull request?

Update docs to include SASL support for RPC

Evidence: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/netty/NettyRpcEnv.scala#L63

## How was this patch tested?

Docs change only

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #14549 from mgummelt/sasl.
2016-08-08 16:07:51 -07:00
Sital Kedia 9c15d079df [SPARK-15074][SHUFFLE] Cache shuffle index file to speedup shuffle fetch
## What changes were proposed in this pull request?

Shuffle fetch on large intermediate dataset is slow because the shuffle service open/close the index file for each shuffle fetch. This change introduces a cache for the index information so that we can avoid accessing the index files for each block fetch

## How was this patch tested?

Tested by running a job on the cluster and the shuffle read time was reduced by 50%.

Author: Sital Kedia <skedia@fb.com>

Closes #12944 from sitalkedia/shuffle_service.
2016-08-04 14:54:38 -07:00
Nicholas Brown ba0aade6d5 Fix description of spark.speculation.quantile
## What changes were proposed in this pull request?

Minor doc fix regarding the spark.speculation.quantile configuration parameter.  It incorrectly states it should be a percentage, when it should be a fraction.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

I tried building the documentation but got some unidoc errors.  I also got them when building off origin/master, so I don't think I caused that problem.  I did run the web app and saw the changes reflected as expected.

Author: Nicholas Brown <nbrown@adroitdigital.com>

Closes #14352 from nwbvt/master.
2016-07-25 19:18:27 -07:00
Tom Graves 6c56fff118 [SPARK-16650] Improve documentation of spark.task.maxFailures
Clarify documentation on spark.task.maxFailures

No tests run as its documentation

Author: Tom Graves <tgraves@yahoo-inc.com>

Closes #14287 from tgravescs/SPARK-16650.
2016-07-22 12:41:38 +01:00
WeichenXu b1310425b3 [DOC][SQL] update out-of-date code snippets using SQLContext in all documents.
## What changes were proposed in this pull request?

I search the whole documents directory using SQLContext, and update the following places:

- docs/configuration.md, sparkR code snippets.
- docs/streaming-programming-guide.md, several example code.

## How was this patch tested?

N/A

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14025 from WeichenXu123/WIP_SQLContext_update.
2016-07-06 10:41:48 -07:00
Ryan Blue 738f134bf4 [SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation.
## What changes were proposed in this pull request?

This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors.

This changes was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message.

## How was this patch tested?

This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors.

Author: Ryan Blue <blue@apache.org>

Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
2016-06-23 14:03:46 -05:00
Sean Owen 457126e420 [SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid overrunning old gen in JVM default config
## What changes were proposed in this pull request?

Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13618 from srowen/SPARK-15796.
2016-06-16 23:04:10 +02:00
Marcelo Vanzin 200f01c8fb [SPARK-15760][DOCS] Add documentation for package-related configs.
While there, also document spark.files and spark.jars. Text is the
same as the spark-submit help text with some minor adjustments.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #13502 from vanzin/SPARK-15760.
2016-06-07 09:28:39 -07:00
gatorsmile d207716451 [SPARK-15485][SQL][DOCS] Spark SQL Configuration
#### What changes were proposed in this pull request?
So far, the page Configuration in the official documentation does not have a section for Spark SQL.
http://spark.apache.org/docs/latest/configuration.html

For Spark users, the information and default values of these public configuration parameters are very useful. This PR is to add this missing section to the configuration.html.

rxin yhuai marmbrus

#### How was this patch tested?
Below is the generated webpage.
<img width="924" alt="screenshot 2016-05-23 11 35 57" src="https://cloud.githubusercontent.com/assets/11567269/15480492/b08fefc4-20da-11e6-9fa2-7cd5b699ed35.png">
<img width="914" alt="screenshot 2016-05-23 11 37 38" src="https://cloud.githubusercontent.com/assets/11567269/15480499/c5f9482e-20da-11e6-95ff-10821add1af4.png">
<img width="923" alt="screenshot 2016-05-23 11 36 11" src="https://cloud.githubusercontent.com/assets/11567269/15480506/cbd81644-20da-11e6-9d27-effb716b2fac.png">
<img width="920" alt="screenshot 2016-05-23 11 36 18" src="https://cloud.githubusercontent.com/assets/11567269/15480511/d013e332-20da-11e6-854a-cf8813c46f36.png">

Author: gatorsmile <gatorsmile@gmail.com>

Closes #13263 from gatorsmile/configurationSQL.
2016-05-23 21:07:14 -07:00
Philipp Hoffmann 65b4ab281e [SPARK-15223][DOCS] fix wrongly named config reference
## What changes were proposed in this pull request?

The configuration setting `spark.executor.logs.rolling.size.maxBytes` was changed to `spark.executor.logs.rolling.maxSize` in 1.4 or so.

This commit fixes a remaining reference to the old name in the documentation.

Also the description for `spark.executor.logs.rolling.maxSize` was edited to clearly state that the unit for the size is bytes.

## How was this patch tested?

no tests

Author: Philipp Hoffmann <mail@philipphoffmann.de>

Closes #13001 from philipphoffmann/patch-3.
2016-05-09 11:02:13 -07:00
Dhruve Ashar a45647746d [SPARK-4224][CORE][YARN] Support group acls
## What changes were proposed in this pull request?
Currently only a list of users can be specified for view and modify acls. This change enables a group of admins/devs/users to be provisioned for viewing and modifying Spark jobs.

**Changes Proposed in the fix**
Three new corresponding config entries have been added where the user can specify the groups to be given access.

```
spark.admin.acls.groups
spark.modify.acls.groups
spark.ui.view.acls.groups
```

New config entries were added because specifying the users and groups explicitly is a better and cleaner way compared to specifying them in the existing config entry using a delimiter.

A generic trait has been introduced to provide the user to group mapping which makes it pluggable to support a variety of mapping protocols - similar to the one used in hadoop. A default unix shell based implementation has been provided.
Custom user to group mapping protocol can be specified and configured by the entry ```spark.user.groups.mapping```

**How the patch was Tested**
We ran different spark jobs setting the config entries in combinations of admin, modify and ui acls. For modify acls we tried killing the job stages from the ui and using yarn commands. For view acls we tried accessing the UI tabs and the logs. Headless accounts were used to launch these jobs and different users tried to modify and view the jobs to ensure that the groups mapping applied correctly.

Additional Unit tests have been added without modifying the existing ones. These test for different ways of setting the acls through configuration and/or API and validate the expected behavior.

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #12760 from dhruve/impr/SPARK-4224.
2016-05-04 08:45:43 -05:00
Reynold Xin 5e92583d38 [SPARK-14667] Remove HashShuffleManager
## What changes were proposed in this pull request?
The sort shuffle manager has been the default since Spark 1.2. It is time to remove the old hash shuffle manager.

## How was this patch tested?
Removed some tests related to the old manager.

Author: Reynold Xin <rxin@databricks.com>

Closes #12423 from rxin/SPARK-14667.
2016-04-18 19:30:00 -07:00
Dhruve Ashar f83ba454a5 [SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions
## What changes were proposed in this pull request?
The configuration docs are updated to reflect the changes introduced with [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This allows the user to specify initial heap memory settings through the extraJavaOptions for executor, driver and am.

## How was this patch tested?
The changes are tested in [SPARK-12384](https://issues.apache.org/jira/browse/SPARK-12384). This is just documenting the changes made.

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #12333 from dhruve/doc/SPARK-14572.
2016-04-14 10:29:14 -05:00