spark-instrumented-optimizer

Dongjoon Hyun 142df4834b [SPARK-16429][SQL] Include `StringType` columns in `describe()`

## What changes were proposed in this pull request?

Currently, Spark `describe` supports `StringType` columns when they are named explicitly. However, `describe()` without arguments returns a dataset for numeric columns only. This PR includes `StringType` columns in the no-argument `describe()` as well.

Background
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe("age", "name").show()
+-------+------------------+-------+
|summary|               age|   name|
+-------+------------------+-------+
|  count|                 2|      3|
|   mean|              24.5|   null|
| stddev|7.7781745930520225|   null|
|    min|                19|   Andy|
|    max|                30|Michael|
+-------+------------------+-------+
```

Before
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
+-------+------------------+
|summary|               age|
+-------+------------------+
|  count|                 2|
|   mean|              24.5|
| stddev|7.7781745930520225|
|    min|                19|
|    max|                30|
+-------+------------------+
```

After
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
+-------+------------------+-------+
|summary|               age|   name|
+-------+------------------+-------+
|  count|                 2|      3|
|   mean|              24.5|   null|
| stddev|7.7781745930520225|   null|
|    min|                19|   Andy|
|    max|                30|Michael|
+-------+------------------+-------+
```

## How was this patch tested?

Pass the Jenkins tests with an updated testcase.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #14095 from dongjoon-hyun/SPARK-16429.

2016-07-08 14:36:50 -07:00
..
jarTest.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
packageInAJarTest.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_binary_function.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_binaryFile.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_broadcast.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_client.R	[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader	2016-05-09 09:58:36 -07:00
test_context.R	[SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup	2016-06-23 09:45:01 -07:00
test_includeJAR.R	[SPARK-8603][SPARKR] Use shell() instead of system2() for SparkR on Windows	2016-05-26 20:55:06 -07:00
test_includePackage.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_mllib.R	[SPARK-15177][.1][R] make SparkR model params and default values consistent with MLlib	2016-06-21 08:31:15 -07:00
test_parallelize_collect.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_rdd.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_Serde.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_shuffle.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_sparkSQL.R	[SPARK-16429][SQL] Include `StringType` columns in `describe()`	2016-07-08 14:36:50 -07:00
test_take.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_textFile.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_utils.R	[SPARK-15159][SPARKR] SparkR SparkSession API	2016-06-17 21:36:01 -07:00
test_Windows.R	[SPARK-8603][SPARKR] Use shell() instead of system2() for SparkR on Windows	2016-05-26 20:55:06 -07:00