Commit graph

24 commits

Author SHA1 Message Date
NICHOLAS T. MARION b512233a45 [SPARK-20393][WEBU UI] Strengthen Spark to prevent XSS vulnerabilities
## What changes were proposed in this pull request?

Add stripXSS and stripXSSMap to Spark Core's UIUtils. Calling these functions at any point that getParameter is called against a HttpServletRequest.

## How was this patch tested?

Unit tests, IBM Security AppScan Standard no longer showing vulnerabilities, manual verification of WebUI pages.

Author: NICHOLAS T. MARION <nmarion@us.ibm.com>

Closes #17686 from n-marion/xss-fix.
2017-05-10 10:59:57 +01:00
jerryshao 829cd7b8b7 [SPARK-20605][CORE][YARN][MESOS] Deprecate not used AM and executor port configuration
## What changes were proposed in this pull request?

After SPARK-10997, client mode Netty RpcEnv doesn't require to start server, so port configurations are not used any more, here propose to remove these two configurations: "spark.executor.port" and "spark.am.port".

## How was this patch tested?

Existing UTs.

Author: jerryshao <sshao@hortonworks.com>

Closes #17866 from jerryshao/SPARK-20605.
2017-05-08 14:27:56 -07:00
liuxian 0f820e2b6c [SPARK-20519][SQL][CORE] Modify to prevent some possible runtime exceptions
Signed-off-by: liuxian <liu.xian3zte.com.cn>

## What changes were proposed in this pull request?

When the input parameter is null, may be a runtime exception occurs

## How was this patch tested?
Existing unit tests

Author: liuxian <liu.xian3@zte.com.cn>

Closes #17796 from 10110346/wip_lx_0428.
2017-05-08 10:00:58 +01:00
Davis Shepherd 039e32ca19
[SPARK-20483][MINOR] Test for Mesos Coarse mode may starve other Mesos frameworks
## What changes were proposed in this pull request?

Add test case for scenarios where executor.cores is set as a
(non)divisor of spark.cores.max
This tests the change in
#17786

## How was this patch tested?

Ran the existing test suite with the new tests

dbtsai

Author: Davis Shepherd <dshepherd@netflix.com>

Closes #17788 from dgshep/add_mesos_test.
2017-04-27 20:25:52 +00:00
Davis Shepherd 7633933e54
[SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks
## What changes were proposed in this pull request?

Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores

## How was this patch tested?

This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.

dbtsai mgummelt

Author: Davis Shepherd <dshepherd@netflix.com>

Closes #17786 from dgshep/fix_mesos_max_cores.
2017-04-27 18:06:12 +00:00
Josh Rosen f44c8a843c [SPARK-20453] Bump master branch version to 2.3.0-SNAPSHOT
This patch bumps the master branch version to `2.3.0-SNAPSHOT`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #17753 from JoshRosen/SPARK-20453.
2017-04-24 21:48:04 -07:00
郭小龙 10207633 2eaf4f3fe3 [SPARK-20385][WEB-UI] Submitted Time' field, the date format needs to be formatted, in running Drivers table or Completed Drivers table in master web ui.
## What changes were proposed in this pull request?
Submitted Time' field, the date format **needs to be formatted**, in running Drivers table or Completed Drivers table in master web ui.
Before fix this problem  e.g.

Completed Drivers
Submission ID	             **Submitted Time**  	             Worker	                            State	   Cores	   Memory	       Main Class
driver-20170419145755-0005	 **Wed Apr 19 14:57:55 CST 2017**	 worker-20170419145250-zdh120-40412	FAILED	   1	       1024.0 MB	   cn.zte.HdfsTest

please see the  attachment:https://issues.apache.org/jira/secure/attachment/12863977/before_fix.png

After fix this problem e.g.

Completed Drivers
Submission ID	             **Submitted Time**  	             Worker	                            State	   Cores	   Memory	       Main Class
driver-20170419145755-0006	 **2017/04/19 16:01:25**	 worker-20170419145250-zdh120-40412	         FAILED	   1	       1024.0 MB	   cn.zte.HdfsTest

please see the  attachment:https://issues.apache.org/jira/secure/attachment/12863976/after_fix.png

'Submitted Time' field, the date format **has been formatted**, in running Applications table or Completed Applicationstable in master web ui, **it is correct.**
e.g.
Running Applications
Application ID	                Name	                Cores	Memory per Executor	   **Submitted Time**	      User	   State	        Duration
app-20170419160910-0000 (kill)	SparkSQL::10.43.183.120	1	    5.0 GB	               **2017/04/19 16:09:10**	  root	   RUNNING	    53 s

**Format after the time easier to observe, and consistent with the applications table,so I think it's worth fixing.**

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: 郭小龙 10207633 <guo.xiaolong1@zte.com.cn>
Author: guoxiaolong <guo.xiaolong1@zte.com.cn>
Author: guoxiaolongzte <guo.xiaolong1@zte.com.cn>

Closes #17682 from guoxiaolongzte/SPARK-20385.
2017-04-23 13:33:14 +01:00
Ji Yan a888fed309 [SPARK-19740][MESOS] Add support in Spark to pass arbitrary parameters into docker when running on mesos with docker containerizer
## What changes were proposed in this pull request?

Allow passing in arbitrary parameters into docker when launching spark executors on mesos with docker containerizer tnachen

## How was this patch tested?

Manually built and tested with passed in parameter

Author: Ji Yan <jiyan@Jis-MacBook-Air.local>

Closes #17109 from yanji84/ji/allow_set_docker_user.
2017-04-16 14:34:12 +01:00
hyukjinkwon ceaf77ae43 [SPARK-18692][BUILD][DOCS] Test Java 8 unidoc build on Jenkins
## What changes were proposed in this pull request?

This PR proposes to run Spark unidoc to test Javadoc 8 build as Javadoc 8 is easily re-breakable.

There are several problems with it:

- It introduces little extra bit of time to run the tests. In my case, it took 1.5 mins more (`Elapsed :[94.8746569157]`). How it was tested is described in "How was this patch tested?".

- > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.

  (see  joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))

To complete this automated build, It also suggests to fix existing Javadoc breaks / ones introduced by test codes as described above.

There fixes are similar instances that previously fixed. Please refer https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013

Note that this only fixes **errors** not **warnings**. Please see my observation https://github.com/apache/spark/pull/17389#issuecomment-288438704 for spurious errors by warnings.

## How was this patch tested?

Manually via `jekyll build` for building tests. Also, tested via running `./dev/run-tests`.

This was tested via manually adding `time.time()` as below:

```diff
     profiles_and_goals = build_profiles + sbt_goals

     print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
           " ".join(profiles_and_goals))

+    import time
+    st = time.time()
     exec_sbt(profiles_and_goals)
+    print("Elapsed :[%s]" % str(time.time() - st))
```

produces

```
...
========================================================================
Building Unidoc API Documentation
========================================================================
...
[info] Main Java API documentation successful.
...
Elapsed :[94.8746569157]
...

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17477 from HyukjinKwon/SPARK-18692.
2017-04-12 12:38:48 +01:00
Kalvin Chau c8fc1f3bad [SPARK-20085][MESOS] Configurable mesos labels for executors
## What changes were proposed in this pull request?

Add spark.mesos.task.labels configuration option to add mesos key:value labels to the executor.

 "k1:v1,k2:v2" as the format, colons separating key-value and commas to list out more than one.

Discussion of labels with mgummelt at #17404

## How was this patch tested?

Added unit tests to verify labels were added correctly, with incorrect labels being ignored and added a test to test the name of the executor.

Tested with: `./build/sbt -Pmesos mesos/test`

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Kalvin Chau <kalvin.chau@viasat.com>

Closes #17413 from kalvinnchau/mesos-labels.
2017-04-06 09:14:31 +01:00
Kalvin Chau e8ddb91c7e [SPARK-20078][MESOS] Mesos executor configurability for task name and labels
## What changes were proposed in this pull request?

Adding configurable mesos executor names and labels using `spark.mesos.task.name` and `spark.mesos.task.labels`.

Labels were defined as `k1:v1,k2:v2`.

mgummelt

## How was this patch tested?

Added unit tests to verify labels were added correctly, with incorrect labels being ignored and added a test to test the name of the executor.

Tested with: `./build/sbt -Pmesos mesos/test`

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Kalvin Chau <kalvin.chau@viasat.com>

Closes #17404 from kalvinnchau/mesos-config.
2017-03-25 10:42:15 +00:00
Eric Liang 8e558041aa [SPARK-19820][CORE] Add interface to kill tasks w/ a reason
This commit adds a killTaskAttempt method to SparkContext, to allow users to
kill tasks so that they can be re-scheduled elsewhere.

This also refactors the task kill path to allow specifying a reason for the task kill. The reason is propagated opaquely through events, and will show up in the UI automatically as `(N killed: $reason)` and `TaskKilled: $reason`. Without this change, there is no way to provide the user feedback through the UI.

Currently used reasons are "stage cancelled", "another attempt succeeded", and "killed via SparkContext.killTask". The user can also specify a custom reason through `SparkContext.killTask`.

cc rxin

In the stage overview UI the reasons are summarized:
![1](https://cloud.githubusercontent.com/assets/14922/23929209/a83b2862-08e1-11e7-8b3e-ae1967bbe2e5.png)

Within the stage UI you can see individual task kill reasons:
![2](https://cloud.githubusercontent.com/assets/14922/23929200/9a798692-08e1-11e7-8697-72b27ad8a287.png)

Existing tests, tried killing some stages in the UI and verified the messages are as expected.

Author: Eric Liang <ekl@databricks.com>
Author: Eric Liang <ekl@google.com>

Closes #17166 from ericl/kill-reason.
2017-03-23 23:30:44 -07:00
Ye Yin b0ae6a38a3 Typo fixup in comment
## What changes were proposed in this pull request?

Fixup typo in comment.

## How was this patch tested?

Don't need.

Author: Ye Yin <eyniy@qq.com>

Closes #17396 from hustcat/fix.
2017-03-23 13:30:50 +01:00
Yong Tang 8f0490e22b [SPARK-17979][SPARK-14453] Remove deprecated SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS
This fix removes deprecated support for config `SPARK_YARN_USER_ENV`, as is mentioned in SPARK-17979.
This fix also removes deprecated support for the following:
```
SPARK_YARN_USER_ENV
SPARK_JAVA_OPTS
SPARK_CLASSPATH
SPARK_WORKER_INSTANCES
```

Related JIRA:
[SPARK-14453]: https://issues.apache.org/jira/browse/SPARK-14453
[SPARK-12344]: https://issues.apache.org/jira/browse/SPARK-12344
[SPARK-15781]: https://issues.apache.org/jira/browse/SPARK-15781

Existing tests should pass.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #17212 from yongtang/SPARK-17979.
2017-03-10 13:34:01 -08:00
Michael Gummelt 2e30c0b9bc [SPARK-19702][MESOS] Increase default refuse_seconds timeout in the Mesos Spark Dispatcher
## What changes were proposed in this pull request?

Increase default refuse_seconds timeout, and make it configurable.  See JIRA for details on how this reduces the risk of starvation.

## How was this patch tested?

Unit tests, Manual testing, and Mesos/Spark integration test suite

cc susanxhuynh skonto jmlvanre

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #17031 from mgummelt/SPARK-19702-suppress-revive.
2017-03-07 21:29:08 +00:00
Michael Gummelt ca3864d6e0 [SPARK-19373][MESOS] Base spark.scheduler.minRegisteredResourceRatio on registered cores rather than accepted cores
## What changes were proposed in this pull request?

See JIRA

## How was this patch tested?

Unit tests, Mesos/Spark integration tests

cc skonto susanxhuynh

Author: Michael Gummelt <mgummelt@mesosphere.io>

Closes #17045 from mgummelt/SPARK-19373-registered-resources.
2017-03-01 00:10:55 +01:00
Devaraj K 410392ed75 [SPARK-15288][MESOS] Mesos dispatcher should handle gracefully when any thread gets UncaughtException
## What changes were proposed in this pull request?

Adding the default UncaughtExceptionHandler to the MesosClusterDispatcher.
## How was this patch tested?

I verified it manually, when any of the dispatcher thread gets uncaught exceptions then the default UncaughtExceptionHandler will handle those exceptions.

Author: Devaraj K <devaraj@apache.org>

Closes #13072 from devaraj-kavali/SPARK-15288.
2017-02-25 21:48:41 +00:00
jinxing ba8912e5f3
[SPARK-19450] Replace askWithRetry with askSync.
## What changes were proposed in this pull request?

`askSync` is already added in `RpcEndpointRef` (see SPARK-19347 and https://github.com/apache/spark/pull/16690#issuecomment-276850068) and `askWithRetry` is marked as deprecated.
As mentioned SPARK-18113(https://github.com/apache/spark/pull/16503#event-927953218):

>askWithRetry is basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it.

Since `askWithRetry` is just used inside spark and not in user logic. It might make sense to replace all of them with `askSync`.

## How was this patch tested?
This PR doesn't change code logic, existing unit test can cover.

Author: jinxing <jinxing@meituan.com>

Closes #16790 from jinxing64/SPARK-19450.
2017-02-19 04:34:07 -08:00
Devaraj K 8640dc0823
[SPARK-10748][MESOS] Log error instead of crashing Spark Mesos dispatcher when a job is misconfigured
## What changes were proposed in this pull request?

Now handling the spark exception which gets thrown for invalid job configuration, marking that job as failed and continuing to launch the other drivers instead of throwing the exception.
## How was this patch tested?

I verified manually, now the misconfigured jobs move to Finished Drivers section in UI and continue to launch the other jobs.

Author: Devaraj K <devaraj@apache.org>

Closes #13077 from devaraj-kavali/SPARK-10748.
2017-02-10 14:11:56 +00:00
Dongjoon Hyun 0077bfcb93
[SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatibilitySuite failure due to test dependency on avro
## What changes were proposed in this pull request?

After using Apache Parquet 1.8.2, `ParquetAvroCompatibilitySuite` fails on **Maven** test. It is because `org.apache.parquet.avro.AvroParquetWriter` in the test code used new `avro 1.8.0` specific class, `LogicalType`. This PR aims to fix the test dependency of `sql/core` module to use avro 1.8.0.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/2530/consoleFull

```
ParquetAvroCompatibilitySuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
  at org.apache.parquet.avro.AvroParquetWriter.writeSupport(AvroParquetWriter.java:144)
```

## How was this patch tested?

Pass the existing test with **Maven**.

```
$ build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver test
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:07 h
[INFO] Finished at: 2017-02-04T05:41:43+00:00
[INFO] Final Memory: 77M/987M
[INFO] ------------------------------------------------------------------------
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16795 from dongjoon-hyun/SPARK-19409-2.
2017-02-08 12:21:49 +00:00
Marcelo Vanzin 8f3f73abc1 [SPARK-19139][CORE] New auth mechanism for transport library.
This change introduces a new auth mechanism to the transport library,
to be used when users enable strong encryption. This auth mechanism
has better security than the currently used DIGEST-MD5.

The new protocol uses symmetric key encryption to mutually authenticate
the endpoints, and is very loosely based on ISO/IEC 9798.

The new protocol falls back to SASL when it thinks the remote end is old.
Because SASL does not support asking the server for multiple auth protocols,
which would mean we could re-use the existing SASL code by just adding a
new SASL provider, the protocol is implemented outside of the SASL API
to avoid the boilerplate of adding a new provider.

Details of the auth protocol are discussed in the included README.md
file.

This change partly undos the changes added in SPARK-13331; AES encryption
is now decoupled from SASL authentication. The encryption code itself,
though, has been re-used as part of this change.

## How was this patch tested?

- Unit tests
- Tested Spark 2.2 against Spark 1.6 shuffle service with SASL enabled
- Tested Spark 2.2 against Spark 2.2 shuffle service with SASL fallback disabled

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #16521 from vanzin/SPARK-19139.
2017-01-24 10:44:04 -08:00
Kay Ousterhout 2e139eed31 [SPARK-17931] Eliminate unnecessary task (de) serialization
In the existing code, there are three layers of serialization
    involved in sending a task from the scheduler to an executor:
        - A Task object is serialized
        - The Task object is copied to a byte buffer that also
          contains serialized information about any additional JARs,
          files, and Properties needed for the task to execute. This
          byte buffer is stored as the member variable serializedTask
          in the TaskDescription class.
        - The TaskDescription is serialized (in addition to the serialized
          task + JARs, the TaskDescription class contains the task ID and
          other metadata) and sent in a LaunchTask message.

While it *is* necessary to have two layers of serialization, so that
the JAR, file, and Property info can be deserialized prior to
deserializing the Task object, the third layer of deserialization is
unnecessary.  This commit eliminates a layer of serialization by moving
the JARs, files, and Properties into the TaskDescription class.

This commit also serializes the Properties manually (by traversing the map),
as is done with the JARs and files, which reduces the final serialized size.

Unit tests

This is a simpler alternative to the approach proposed in #15505.

shivaram and I did some benchmarking of this and #15505 on a 20-machine m2.4xlarge EC2 machines (160 cores). We ran ~30 trials of code [1] (a very simple job with 10K tasks per stage) and measured the average time per stage:

Before this change: 2490ms
With this change: 2345 ms (so ~6% improvement over the baseline)
With witgo's approach in #15505: 2046 ms (~18% improvement over baseline)

The reason that #15505 has a more significant improvement is that it also moves the serialization from the TaskSchedulerImpl thread to the CoarseGrainedSchedulerBackend thread. I added that functionality on top of this change, and got almost the same improvement [1] as #15505 (average of 2103ms). I think we should decouple these two changes, both so we have some record of the improvement form each individual improvement, and because this change is more about simplifying the code base (the improvement is negligible) while the other is about performance improvement.  The plan, currently, is to merge this PR and then merge the remaining part of #15505 that moves serialization.

[1] The reason the improvement wasn't quite as good as with #15505 when we ran the benchmarks is almost certainly because, at the point when we ran the benchmarks, I hadn't updated the code to manually serialize the Properties (instead the code was using Java's default serialization for the Properties object, whereas #15505 manually serialized the Properties).  This PR has since been updated to manually serialize the Properties, just like the other maps.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #16053 from kayousterhout/SPARK-17931.
2017-01-06 10:48:08 -06:00
Devaraj K 89bf370e4f [SPARK-15555][MESOS] Driver with --supervise option cannot be killed in Mesos mode
## What changes were proposed in this pull request?

Not adding the Killed applications for retry.
## How was this patch tested?

I have verified manually in the Mesos cluster, with the changes the killed applications move to Finished Drivers section and will not retry.

Author: Devaraj K <devaraj@apache.org>

Closes #13323 from devaraj-kavali/SPARK-15555.
2017-01-03 11:02:42 -08:00
Anirudh 81e5619ca1 [SPARK-18662] Move resource managers to separate directory
## What changes were proposed in this pull request?

* Moves yarn and mesos scheduler backends to resource-managers/ sub-directory (in preparation for https://issues.apache.org/jira/browse/SPARK-18278)
* Corresponding change in top-level pom.xml.

Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340

## How was this patch tested?

* Manual tests

/cc rxin

Author: Anirudh <ramanathana@google.com>

Closes #16092 from foxish/fix-scheduler-structure-2.
2016-12-06 16:23:27 -08:00