## What changes were proposed in this pull request?
This change resolves a number of accumulated build warnings ahead of 2.x. It does not address a large number of deprecation warnings, especially those related to the Accumulator API. That will happen separately.
## How was this patch tested?
Jenkins
Author: Sean Owen <sowen@cloudera.com>
Closes #13377 from srowen/BuildWarnings.
## What changes were proposed in this pull request?
This patch reduces the verbosity of aggregate expressions in explain (but does not actually remove any information). As an example, for the following command:
```
spark.range(10).selectExpr("sum(id) + 1", "count(distinct id)").explain(true)
```
Output before this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[(sum(id#0L),mode=Final,isDistinct=false),(count(id#0L),mode=Final,isDistinct=true)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
+- *TungstenAggregate(key=[], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false),(count(id#0L),mode=Partial,isDistinct=true)], output=[sum#18L,count#21L])
+- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=PartialMerge,isDistinct=false)], output=[id#0L,sum#18L])
+- Exchange hashpartitioning(id#0L, 5), None
+- *TungstenAggregate(key=[id#0L], functions=[(sum(id#0L),mode=Partial,isDistinct=false)], output=[id#0L,sum#18L])
+- *Range (0, 10, splits=2)
```
Output after this patch:
```
== Physical Plan ==
*TungstenAggregate(key=[], functions=[sum(id#0L),count(distinct id#0L)], output=[(sum(id) + 1)#3L,count(DISTINCT id)#16L])
+- Exchange SinglePartition, None
+- *TungstenAggregate(key=[], functions=[merge_sum(id#0L),partial_count(distinct id#0L)], output=[sum#18L,count#21L])
+- *TungstenAggregate(key=[id#0L], functions=[merge_sum(id#0L)], output=[id#0L,sum#18L])
+- Exchange hashpartitioning(id#0L, 5), None
+- *TungstenAggregate(key=[id#0L], functions=[partial_sum(id#0L)], output=[id#0L,sum#18L])
+- *Range (0, 10, splits=2)
```
Note the change from `(sum(id#0L),mode=PartialMerge,isDistinct=false)` to `merge_sum(id#0L)`.
In general aggregate explain is still very verbose, but further work will be done as follow-up pull requests.
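The compact naming can be sketched as a small formatting function. This is only an illustration in Python, not the code of the patch: the function name `format_aggregate` and the `MODE_PREFIX` table are hypothetical, and the prefixes are read off the example output above (only the `Partial`, `PartialMerge` and `Final` modes appear there).

```python
# Hypothetical sketch: prefixes are inferred from the example output above.
MODE_PREFIX = {
    "Partial": "partial_",
    "PartialMerge": "merge_",
    "Final": "",
}


def format_aggregate(func, arg, mode, is_distinct):
    """Render one aggregate expression in the compact style."""
    prefix = MODE_PREFIX[mode]  # raises KeyError for modes not shown in the example
    distinct = "distinct " if is_distinct else ""
    return f"{prefix}{func}({distinct}{arg})"


print(format_aggregate("sum", "id#0L", "PartialMerge", False))
print(format_aggregate("count", "id#0L", "Partial", True))
```

The mode and the distinct flag are still present in the rendered string, which is why no information is lost.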
## How was this patch tested?
Tested manually.
Author: Reynold Xin <rxin@databricks.com>
Closes #13367 from rxin/SPARK-15636.
## What changes were proposed in this pull request?
Change version check in R tests
## How was this patch tested?
R tests
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #13369 from felixcheung/rversioncheck.
## What changes were proposed in this pull request?
I create a bucketed table `bucketed_table` with bucket column `i`,
```scala
case class Data(i: Int, j: Int, k: Int)
sc.makeRDD(Array((1, 2, 3))).map(x => Data(x._1, x._2, x._3)).toDF.write.bucketBy(2, "i").saveAsTable("bucketed_table")
```
and then run the following SQL queries:
```sql
SELECT j FROM bucketed_table;
Error in query: bucket column i not found in existing columns (j);
SELECT j, MAX(k) FROM bucketed_table GROUP BY j;
Error in query: bucket column i not found in existing columns (j, k);
```
I think we should add a check so that bucketing is enabled only when all of the conditions below are satisfied:
1. the conf is enabled
2. the relation is bucketed
3. the output contains all bucketing columns
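The three conditions above can be sketched as a single predicate. This is a minimal Python illustration, not the Scala code of the patch; the function name and parameters are hypothetical.

```python
def should_use_bucketing(conf_enabled, bucket_columns, output_columns):
    """Return True only when all three conditions hold.

    bucket_columns is None or empty when the relation is not bucketed.
    """
    if not conf_enabled:          # 1. the conf is enabled
        return False
    if not bucket_columns:        # 2. the relation is bucketed
        return False
    # 3. the output contains all bucketing columns
    return set(bucket_columns).issubset(output_columns)


# The failing queries select only j (or j and k), so bucketing is skipped.
print(should_use_bucketing(True, ["i"], ["j"]))
print(should_use_bucketing(True, ["i"], ["i", "j"]))
```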
## How was this patch tested?
Updated test cases to reflect the changes.
Author: Yadong Qi <qiyadong2010@gmail.com>
Closes #13321 from watermen/SPARK-15549.
## What changes were proposed in this pull request?
Let `Dataset.createTempView` and `Dataset.createOrReplaceTempView` use `CreateViewCommand` rather than calling `SparkSession.createTempView`. This patch also removes `SparkSession.createTempView`.
## How was this patch tested?
Existing tests.
Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Closes #13327 from viirya/dataset-createtempview.
## What changes were proposed in this pull request?
This is a simple patch that makes package names for Java 8 test suites consistent. I moved everything to `test.org.apache.spark` so we can test package-private APIs properly. I also added "java8" as the package name so we can easily run all the tests related to Java 8.
## How was this patch tested?
This is a test only change.
Author: Reynold Xin <rxin@databricks.com>
Closes #13364 from rxin/SPARK-15633.
## What changes were proposed in this pull request?
Fix the wrong bound of `k` in `PCA`:
`require(k <= sources.first().size, ...)` -> `require(k < sources.first().size, ...)`
Also remove an unused import in `ml.ElementwiseProduct`.
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #13356 from zhengruifeng/fix_pca.
## What changes were proposed in this pull request?
The temp directory used to save records in DataFrameExample is not deleted after the program exits. Although the code calls `deleteOnExit`, that does not work because the directory is not empty. A similar problem occurs in ContextCleanerSuite. This change updates the code to make sure the temp directory is deleted after program exit.
## How was this patch tested?
unit tests and local build.
Author: dding3 <ding.ding@intel.com>
Closes #13328 from dding3/master.
## What changes were proposed in this pull request?
In the MLlib naive Bayes example, the Scala and Python examples don't use libsvm data, but the Java example does.
I changed the Scala and Python examples to use the same libsvm data as the Java example.
## How was this patch tested?
Manual tests
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #13301 from wangmiao1981/example.
## What changes were proposed in this pull request?
These commands ignore the partition spec and change the storage properties of the table itself:
```
ALTER TABLE table_name PARTITION (a=1, b=2) SET SERDE 'my_serde'
ALTER TABLE table_name PARTITION (a=1, b=2) SET SERDEPROPERTIES ('key1'='val1')
```
Now they change the storage properties of the specified partition.
## How was this patch tested?
DDLSuite
Author: Andrew Or <andrew@databricks.com>
Closes #13343 from andrewor14/alter-table-serdeproperties.
## What changes were proposed in this pull request?
This includes minimal changes to get Spark using the current release of Parquet, 1.8.1.
## How was this patch tested?
This uses the existing Parquet tests.
Author: Ryan Blue <blue@apache.org>
Closes #13280 from rdblue/SPARK-9876-update-parquet.
## What changes were proposed in this pull request?
This PR reworks the CliSuite test cases for the `LIST FILES/JARS` commands.
CC yhuai Thanks!
Author: Xin Wu <xinwu@us.ibm.com>
Closes #13361 from xwu0226/SPARK-15431-clisuite-new.
## What changes were proposed in this pull request?
We're now using `asML` to convert an mllib vector/matrix to an ml vector/matrix. Using `as` is more correct, given that this conversion shares the same underlying data structure. Accordingly, in this PR `toBreeze` is changed to `asBreeze`. This is a private API, so it will not affect any user's application.
## How was this patch tested?
unit tests
Author: DB Tsai <dbt@netflix.com>
Closes #13198 from dbtsai/minor.
## What changes were proposed in this pull request?
1. Add `_transfer_param_map_to/from_java` for OneVsRest.
2. Add `_compare_params` in ml/tests.py to help compare params.
3. Add `test_onevsrest` as the integration test for OneVsRest.
## How was this patch tested?
Python unit test.
Author: yinxusen <yinxusen@gmail.com>
Closes #12875 from yinxusen/SPARK-15008.
## What changes were proposed in this pull request?
* Document `WeightedLeastSquares` (normal equation) and `IterativelyReweightedLeastSquares`.
* Copy the `L-BFGS` documentation from `spark.mllib` to `spark.ml`.
Since the section `Optimization of linear methods` is aimed at developers, I think we should provide a brief introduction to each optimization method, the necessary references, and how it is implemented in Spark. It's not necessary to paste all the mathematical formulas and derivations here. If developers or users want to learn more, they can follow the references.
## How was this patch tested?
Document update, no tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #13262 from yanboliang/spark-15484.
## What changes were proposed in this pull request?
This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754).
## How was this patch tested?
Documentation only, no tests required.
## Approach
In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #13139 from sethah/SPARK-15186.
## What changes were proposed in this pull request?
- See the JIRA for the problem: https://issues.apache.org/jira/browse/SPARK-14400
- The fix is to check in `hasNext()` whether the process has exited with a non-zero exit code. I have moved this check, together with the check for a writer thread exception, to a separate method.
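The idea of checking the exit code once the output is exhausted can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the Scala code of the patch: the class `ScriptOutputIterator` and its method names are hypothetical, and the writer-thread check is left out.

```python
import subprocess
import sys


class ScriptOutputIterator:
    """Iterates over the stdout lines of a child process (illustrative only)."""

    def __init__(self, command):
        self._proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
        self._next_line = None
        self._finished = False

    def _check_failure(self):
        # Separate method: a non-zero exit code means the script failed.
        exit_code = self._proc.wait()
        if exit_code != 0:
            raise RuntimeError(f"Subprocess exited with status {exit_code}")

    def has_next(self):
        if self._next_line is not None:
            return True
        if self._finished:
            return False
        line = self._proc.stdout.readline()
        if line == "":
            # End of output: fail loudly instead of silently returning no rows.
            self._finished = True
            self._proc.stdout.close()
            self._check_failure()
            return False
        self._next_line = line.rstrip("\n")
        return True

    def next(self):
        if not self.has_next():
            raise StopIteration
        line, self._next_line = self._next_line, None
        return line


rows = ScriptOutputIterator([sys.executable, "-c", "print('a'); print('b')"])
while rows.has_next():
    print(rows.next())
```

Without the check in `has_next()`, a script that exits with an error would look the same as a script that produced no output.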
## How was this patch tested?
- Ran a job with an incorrect transform script command and confirmed that the job fails
- Ran the existing unit tests in `ScriptTransformationSuite` and added a new unit test
Author: Tejas Patil <tejasp@fb.com>
Closes #12194 from tejasapatil/script_transform.
## What changes were proposed in this pull request?
Remove several obsolete env variables that are no longer supported for Spark on YARN, and update the docs to include several changes in 2.0.
## How was this patch tested?
N/A
CC vanzin tgravescs
Author: jerryshao <sshao@hortonworks.com>
Closes #13296 from jerryshao/yarn-doc.
## What changes were proposed in this pull request?
Explicitly limit launcher JVM memory to a modest 128m.
## How was this patch tested?
Jenkins tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #13360 from srowen/SPARK-15531.
## What changes were proposed in this pull request?
While profiling a Spark job that spills a large amount of intermediate data, we found that a significant portion of time is spent in the `DiskObjectWriter.updateBytesWritten` function. Looking at the code, we see that the function is called too frequently to update the number of bytes written to disk. We should reduce the frequency to avoid this.
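A minimal sketch of the general technique, updating a metric every N records instead of on every write, is shown below. It is a Python illustration only: `CountingWriter`, `update_interval` and its default value are hypothetical, and the interval actually chosen in the patch is not stated here.

```python
import io


class CountingWriter:
    """Writes records and refreshes a bytes-written metric only periodically."""

    def __init__(self, stream, update_interval=1024):  # interval is an arbitrary assumption
        self._stream = stream
        self._update_interval = update_interval
        self._records_written = 0
        self.bytes_written = 0    # reported metric, may lag behind the stream
        self.metric_updates = 0   # counts how often the costly update ran

    def _update_bytes_written(self):
        # Stands in for the expensive position query on the underlying file.
        self.bytes_written = self._stream.tell()
        self.metric_updates += 1

    def write_record(self, data):
        self._stream.write(data)
        self._records_written += 1
        if self._records_written % self._update_interval == 0:
            self._update_bytes_written()

    def close(self):
        # Final update so the metric is exact once writing is done.
        self._update_bytes_written()


writer = CountingWriter(io.BytesIO(), update_interval=100)
for _ in range(250):
    writer.write_record(b"abcd")
writer.close()
print(writer.bytes_written, writer.metric_updates)
```

The trade-off is that the metric can be slightly stale between updates, which is why the sketch refreshes it once more on close.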
## How was this patch tested?
Tested by running the job on a cluster; this change gave a 20% CPU gain.
Author: Sital Kedia <skedia@fb.com>
Closes #13332 from sitalkedia/DiskObjectWriter.
## What changes were proposed in this pull request?
Minor typo fixes in Dataset scaladoc
* Corrected context type as SparkSession, not SQLContext.
liancheng rxin andrewor14
## How was this patch tested?
Compiled locally
Author: Xinh Huynh <xinh_huynh@yahoo.com>
Closes #13330 from xinhhuynh/fix-dataset-typos.
## What changes were proposed in this pull request?
This patch adds a new function emptyDataset to SparkSession, for creating an empty dataset.
## How was this patch tested?
Added a test case.
Author: Reynold Xin <rxin@databricks.com>
Closes #13344 from rxin/SPARK-15597.
## What changes were proposed in this pull request?
Adds API docs and usage examples for the 3 `createDataset` calls in `SparkSession`
## How was this patch tested?
N/A
Author: Sameer Agarwal <sameer@databricks.com>
Closes #13345 from sameeragarwal/dataset-doc.
## What changes were proposed in this pull request?
This PR replaces `spark.sql.sources.` strings with `CreateDataSourceTableUtils.*` constant variables.
## How was this patch tested?
Pass the existing Jenkins tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13349 from dongjoon-hyun/SPARK-15584.
## What changes were proposed in this pull request?
This PR replaces all deprecated `SQLContext` occurrences with `SparkSession` in `ML/MLLib` module except the following two classes. These two classes use `SQLContext` in their function signatures.
- ReadWrite.scala
- TreeModels.scala
## How was this patch tested?
Pass the existing Jenkins tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #13352 from dongjoon-hyun/SPARK-15603.
#### What changes were proposed in this pull request?
The default value of `spark.sql.warehouse.dir` is `System.getProperty("user.dir")/spark-warehouse`. Since `System.getProperty("user.dir")` is a local dir, we should explicitly set the scheme to local filesystem.
cc yhuai
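To illustrate what an explicit local-filesystem scheme looks like, here is a small Python sketch that builds a `file:` URI for a `spark-warehouse` directory under a working directory. The helper name `default_warehouse_uri` is hypothetical, and the exact string Spark produces is not claimed here.

```python
from pathlib import Path


def default_warehouse_uri(working_dir):
    """Build a file: URI for <working_dir>/spark-warehouse."""
    # as_uri() requires an absolute path, so resolve the directory first.
    return (Path(working_dir).resolve() / "spark-warehouse").as_uri()


print(default_warehouse_uri("."))
```

With the scheme spelled out, the path cannot be reinterpreted against a different default filesystem.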
#### How was this patch tested?
Added two test cases
Author: gatorsmile <gatorsmile@gmail.com>
Closes #13348 from gatorsmile/addSchemeToDefaultWarehousePath.
#### What changes were proposed in this pull request?
This PR uses the new entry point `SparkSession` to replace the existing `SQLContext` and `HiveContext` in SQL test suites.
No change is made in the following suites:
- `ListTablesSuite` is to test the APIs of `SQLContext`.
- `SQLContextSuite` is to test `SQLContext`
- `HiveContextCompatibilitySuite` is to test `HiveContext`
**Update**: Move tests in `ListTablesSuite` to `SQLContextSuite`
#### How was this patch tested?
N/A
Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
Closes #13337 from gatorsmile/sparkSessionTest.
## What changes were proposed in this pull request?
`a` -> `an`
I used a regex to find potentially incorrect lines:
`grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml/*/*scala`
and reviewed them line by line.
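The same search can be sketched in Python for readers without `grep`. This mirrors the pattern and the `-i`/`-n` flags of the command above; the function name `find_candidates` is hypothetical.

```python
import re

# Same pattern as the grep command, case-insensitive like -i.
CANDIDATE = re.compile(r" a [aeiou]", re.IGNORECASE)


def find_candidates(lines):
    """Return (line_number, text) pairs, numbered from 1 like grep -n."""
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if CANDIDATE.search(text)
    ]


sample = ["returns a estimator", "returns a model", "requires a unique id"]
for number, text in find_candidates(sample):
    print(number, text)
```

The pattern also flags correct phrases such as "a unique id", which is why each hit still needs a manual review.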
## How was this patch tested?
local build
`lint-java` checking
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #13317 from zhengruifeng/a_an.
## What changes were proposed in this pull request?
The Scala doc used the outdated `+=`. Replaced it with `add`.
## How was this patch tested?
N/A
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #13346 from jkbradley/accum-doc.
## What changes were proposed in this pull request?
Follow-up to the earlier PR: this one fixes up the roxygen2 doc examples.
It also adds to the migration section of the programming guide.
## How was this patch tested?
SparkR tests
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #13340 from felixcheung/sqlcontextdoc.
## What changes were proposed in this pull request?
This PR corrects SparkR to use `shell()` instead of `system2()` on Windows.
Using `system2(...)` on Windows does not handle the Windows file separator `\`. `shell(translate = TRUE, ...)` handles this, so the function is now chosen according to the OS.
Existing tests failed on Windows because of this problem, for example:
```
8. Failure: sparkJars tag in SparkContext (test_includeJAR.R#34)
9. Failure: sparkJars tag in SparkContext (test_includeJAR.R#36)
```
The cases above were due to the use of `system2`.
In addition, this PR fixes some other tests that failed on Windows.
```
5. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#128)
6. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#131)
7. Failure: sparkJars sparkPackages as comma-separated strings (test_context.R#134)
```
The cases above were due to a weird behaviour of `normalizePath()`. On Linux, if the path does not exist, it just returns the input unchanged, but on Windows it returns the input prefixed with the current path.
```r
# On Linux
path <- normalizePath("aa")
print(path)
[1] "aa"
# On Windows
path <- normalizePath("aa")
print(path)
[1] "C:\\Users\\aa"
```
## How was this patch tested?
Jenkins tests, and manually tested on a Windows machine as below:
Here is the [stdout](https://gist.github.com/HyukjinKwon/4bf35184f3a30f3bce987a58ec2bbbab) of testing.
Closes #7025
Author: hyukjinkwon <gurwls223@gmail.com>
Author: Hyukjin Kwon <gurwls223@gmail.com>
Author: Prakash PC <prakash.chinnu@gmail.com>
Closes #13165 from HyukjinKwon/pr/7025.
## What changes were proposed in this pull request?
Certain table properties (and SerDe properties) are in the protected namespace `spark.sql.sources.`, which we use internally for datasource tables. The user should not be allowed to
(1) Create a Hive table setting these properties
(2) Alter these properties in an existing table
Previously, we threw an exception if the user tried to alter the properties of an existing datasource table. However, this is overly restrictive for datasource tables and does not do anything for Hive tables.
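A check of this kind can be sketched as a prefix test on property keys. This is a hedged Python illustration, not the Scala code of the patch: the function name and the error wording are hypothetical, and only the prefix `spark.sql.sources.` comes from the description above.

```python
PROTECTED_PREFIX = "spark.sql.sources."


def check_reserved_properties(properties):
    """Reject user-supplied table or SerDe properties in the protected namespace."""
    reserved = sorted(key for key in properties if key.startswith(PROTECTED_PREFIX))
    if reserved:
        raise ValueError(
            "Properties in the protected namespace cannot be set by users: "
            + ", ".join(reserved)
        )


# An ordinary property passes; one under the protected prefix would raise ValueError.
check_reserved_properties({"comment": "sales data"})
```

The same function could be called on both the create path and the alter path, which covers cases (1) and (2).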
## How was this patch tested?
DDLSuite
Author: Andrew Or <andrew@databricks.com>
Closes #13341 from andrewor14/alter-table-props.
https://issues.apache.org/jira/browse/SPARK-15542
## What changes were proposed in this pull request?
When running `./R/install-dev.sh` in a **Mac OS El Capitan** environment, I got
```
mbp185-xr:spark xin$ ./R/install-dev.sh
usage: dirname path
```
This message was very confusing to me. I then found that R was not properly configured on my Mac, and this script uses `$(which R)` to get the R home.
I tried a similar situation on CentOS with R missing, and it gave a very clear error message, while Mac OS did not.
On CentOS:
```
[rootip-xxx-31-9-xx spark]# which R
/usr/bin/which: no R in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin:/root/bin)
```
But on Mac, if R is not found, nothing is returned, and this causes the confusing message when the R build fails while running `R/install-dev.sh`:
```
mbp185-xr:spark xin$ which R
mbp185-xr:spark xin$
```
Here I just added a clear message for this R misconfiguration when running `R/install-dev.sh`.
```
mbp185-xr:spark xin$ ./R/install-dev.sh
Cannot find R home by running 'which R', please make sure R is properly installed.
```
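The check itself lives in a shell script; the logic can be sketched in Python as follows, purely for illustration. The function name `find_r_executable` and its `lookup` parameter are hypothetical; `shutil.which` plays the role of `which R`, and the message is the one shown above.

```python
import shutil


def find_r_executable(lookup=shutil.which):
    """Return the path to R, or fail with an explicit message if it is missing."""
    path = lookup("R")
    if not path:
        # An empty lookup result is reported directly instead of being passed
        # on to later commands, where it would produce a confusing usage error.
        raise RuntimeError(
            "Cannot find R home by running 'which R', "
            "please make sure R is properly installed."
        )
    return path


# Example with a stubbed lookup, so the outcome does not depend on this machine.
print(find_r_executable(lookup=lambda name: "/usr/bin/R"))
```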
## How was this patch tested?
Manually tested on local machine.
Author: Xin Ren <iamshrek@126.com>
Closes #13308 from keypointt/SPARK-15542.
## What changes were proposed in this pull request?
Two more changes:
(1) Fix truncate table for data source tables (only for cases without `PARTITION`)
(2) Disallow truncating external tables or views
## How was this patch tested?
`DDLSuite`
Author: Andrew Or <andrew@databricks.com>
Closes #13315 from andrewor14/truncate-table.
## What changes were proposed in this pull request?
This PR changes the public constructors of `SQLContext`/`HiveContext` to use `SparkSession.builder.getOrCreate` and removes `isRootContext` from `SQLContext`.
## How was this patch tested?
Existing tests.
Author: Yin Huai <yhuai@databricks.com>
Closes #13310 from yhuai/SPARK-15532.
## What changes were proposed in this pull request?
This PR addresses two related issues:
1. `Dataset.showString()` should show case classes/Java beans at all levels as rows, while master code only handles top level ones.
2. `Dataset.showString()` should show the full contents produced by the underlying query plan.
Dataset is only a view of the underlying query plan. Columns not referred to by the encoder are still reachable using methods like `Dataset.col`. So it probably makes more sense to show the full contents of the query plan.
## How was this patch tested?
Two new test cases are added in `DatasetSuite` to check `.showString()` output.
Author: Cheng Lian <lian@databricks.com>
Closes #13331 from liancheng/spark-15550-ds-show.
## What changes were proposed in this pull request?
This patch fixes a few integer overflows in `UnsafeSortDataFormat.copyRange()` and `ShuffleSortDataFormat.copyRange()` that seem to be the most likely cause of a number of `TimSort` contract violation errors seen in Spark 2.0 and Spark 1.6 while sorting large datasets.
## How was this patch tested?
Added a test in `ExternalSorterSuite` that instantiates a large array of the form of [150000000, 150000001, 150000002, ...., 300000000, 0, 1, 2, ..., 149999999] that triggers a `copyRange` in `TimSort.mergeLo` or `TimSort.mergeHi`. Note that the input dataset should contain at least 268.43 million rows with a certain data distribution for an overflow to occur.
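The 268.43 million figure is consistent with a 32-bit offset computation wrapping around. The Python sketch below simulates that arithmetic under one loudly stated assumption: that a byte offset is computed as a position times 8 bytes per entry using 32-bit integers. The exact expressions in `copyRange()` are not given above, and the helper names are hypothetical.

```python
def to_int32(value):
    """Wrap an integer to signed 32-bit two's complement, like JVM int arithmetic."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


BYTES_PER_ENTRY = 8  # assumption: one 8-byte entry per record


def offset_int32(pos):
    # Overflow-prone version: the multiplication happens in 32 bits.
    return to_int32(pos * BYTES_PER_ENTRY)


def offset_int64(pos):
    # Fixed version: widen before multiplying (Python ints stand in for a long).
    return pos * BYTES_PER_ENTRY


for pos in (2**28 - 1, 2**28):
    print(pos, offset_int32(pos), offset_int64(pos))
```

At position 2^28 = 268,435,456 the 32-bit product wraps to a negative offset, while the widened product stays correct.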
Author: Sameer Agarwal <sameer@databricks.com>
Closes #13336 from sameeragarwal/timsort-bug.
## What changes were proposed in this pull request?
Add a more verbose error message for when the ORDER BY clause is missing from a window function.
## How was this patch tested?
Unit test.
Author: Sean Zhong <seanzhong@databricks.com>
Closes #13333 from clockfly/spark-13445.
## What changes were proposed in this pull request?
Several classes and methods have been deprecated and are creating lots of build warnings in branch-2.0. This issue is to identify and fix those items:
* WithSGD classes: Change to make class not deprecated, object deprecated, and public class constructor deprecated. Any public use will require a deprecated API. We need to keep a non-deprecated private API since we cannot eliminate certain uses: Python API, streaming algs, and examples.
* Use in PythonMLlibAPI: Change to using private constructors
* Streaming algs: No warnings after we un-deprecate the classes
* Examples: Deprecate or change ones which use deprecated APIs
* MulticlassMetrics fields (precision, etc.)
* LinearRegressionSummary.model field
## How was this patch tested?
Existing tests. Checked for warnings manually.
Author: Sean Owen <sowen@cloudera.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #13314 from jkbradley/warning-cleanups.
## What changes were proposed in this pull request?
SparkSession has a list of unnecessary private[sql] methods. These methods cause some trouble because private[sql] doesn't apply in Java. In the cases that they are easy to remove, we can simply remove them. This patch does that.
As part of this pull request, I also replaced a bunch of protected[sql] with private[sql], to tighten up visibility.
## How was this patch tested?
Updated test cases to reflect the changes.
Author: Reynold Xin <rxin@databricks.com>
Closes #13319 from rxin/SPARK-15552.
## What changes were proposed in this pull request?
Also sets confs in the underlying sc when using SparkSession.builder.getOrCreate(). This is a bug-fix from a post-merge comment in https://github.com/apache/spark/pull/13289
## How was this patch tested?
Python doc-tests.
Author: Eric Liang <ekl@databricks.com>
Closes #13309 from ericl/spark-15520-1.
## What changes were proposed in this pull request?
Same as #13302, but for DROP TABLE.
## How was this patch tested?
`DDLSuite`
Author: Andrew Or <andrew@databricks.com>
Closes #13307 from andrewor14/drop-table.
This patch provides detail on what to do for keytabless Oozie launches of Spark apps, and adds some debug-level diagnostics of what credentials have been submitted.
Author: Steve Loughran <stevel@hortonworks.com>
Author: Steve Loughran <stevel@apache.org>
Closes #11033 from steveloughran/stevel/feature/SPARK-13148-oozie.
Eliminate the need to pass sqlContext to methods, since it is a singleton and we don't want to support multiple contexts in an R session.
Changes are made in a backward-compatible way, with a deprecation warning added. Method signatures for S3 methods are added in a concise, clean way so that in the next release the deprecated signatures can be removed easily (just delete a few lines per method).
Custom method dispatch is implemented to allow for multiple JVM reference types that are all 'jobj' in R and to avoid having to add 30 new exports.
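The backward-compatible pattern can be sketched as follows. The actual change is in R; this Python version only illustrates the idea, and every name in it (`Context`, `table_names`, `_table_names_impl`) is hypothetical.

```python
import warnings


class Context:
    """Placeholder for the singleton context object (hypothetical)."""


_default_context = Context()


def _table_names_impl(context, database="default"):
    # Real work would use the context; here we just return a description.
    return f"tables in {database}"


def table_names(*args):
    """New style: table_names(database). Old style: table_names(context, database)."""
    # Deprecated branch: these few lines can be deleted in the next release.
    if args and isinstance(args[0], Context):
        warnings.warn(
            "Passing the context explicitly is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        args = args[1:]
    return _table_names_impl(_default_context, *args)


print(table_names("sales"))
```

Old call sites keep working but see a warning, and removing support later means deleting only the marked branch.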
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9192 from felixcheung/rsqlcontext.
## What changes were proposed in this pull request?
See https://issues.apache.org/jira/browse/SPARK-15523
This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes.
## How was this patch tested?
1. Executed `mvn clean package` in `mllib` directory
2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory.
Author: Villu Ruusmann <villu.ruusmann@gmail.com>
Closes #13297 from vruusmann/update-jpmml.
## What changes were proposed in this pull request?
The Binarization Scala example contains `val dataFrame: DataFrame = spark.createDataFrame(data).toDF("label", "feature")`, which can't be pasted into the spark-shell because `DataFrame` is not imported. Other examples do not use this explicit type, so it is not required.
So I removed the `DataFrame` type annotation from the code.
## How was this patch tested?
Manually tested
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #13266 from wangmiao1981/unit.
## What changes were proposed in this pull request?
Some test cases, e.g. `OrcSourceSuite`, create temp folders and temp files inside them, but the folders are not removed after the tests finish. If we keep running the test cases, this leaves many temp files behind and wastes disk space.
The reason is that `dir.delete()` won't work if `dir` is not empty. We need to recursively delete the contents before deleting the folder.
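The difference between a plain delete and a recursive delete can be shown with a short Python sketch. It is an analogy rather than the Scala code of the patch: Python's `os.rmdir` raises an error on a non-empty directory, whereas Java's `File.delete()` returns `false`, but the cause is the same.

```python
import os
import shutil
import tempfile

# Create a temp directory with one file in it, as a test suite might.
temp_dir = tempfile.mkdtemp()
with open(os.path.join(temp_dir, "part-00000"), "w") as f:
    f.write("data")

try:
    os.rmdir(temp_dir)  # non-recursive delete fails on a non-empty directory
    removed_directly = True
except OSError:
    removed_directly = False

# Recursive delete: removes the contents first, then the directory itself.
shutil.rmtree(temp_dir)

print(removed_directly, os.path.exists(temp_dir))
```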
## How was this patch tested?
Manually checked the temp folder to make sure the temp files were deleted.
Author: Bo Meng <mengbo@hotmail.com>
Closes #13304 from bomeng/SPARK-15537.
## What changes were proposed in this pull request?
This patch renames various DefaultSources to make their names more self-describing. The choice of "DefaultSource" was from the days when we did not have a good way to specify short names.
They are now named:
- LibSVMFileFormat
- CSVFileFormat
- JdbcRelationProvider
- JsonFileFormat
- ParquetFileFormat
- TextFileFormat
Backward compatibility is maintained through aliasing.
## How was this patch tested?
Updated relevant test cases too.
Author: Reynold Xin <rxin@databricks.com>
Closes #13311 from rxin/SPARK-15543.