142df4834b
## What changes were proposed in this pull request? Currently, Spark `describe` supports `StringType`. However, `describe()` returns a dataset for only all numeric columns. This PR aims to include `StringType` columns in `describe()`, `describe` without argument. **Background** ```scala scala> spark.read.json("examples/src/main/resources/people.json").describe("age", "name").show() +-------+------------------+-------+ |summary| age| name| +-------+------------------+-------+ | count| 2| 3| | mean| 24.5| null| | stddev|7.7781745930520225| null| | min| 19| Andy| | max| 30|Michael| +-------+------------------+-------+ ``` **Before** ```scala scala> spark.read.json("examples/src/main/resources/people.json").describe().show() +-------+------------------+ |summary| age| +-------+------------------+ | count| 2| | mean| 24.5| | stddev|7.7781745930520225| | min| 19| | max| 30| +-------+------------------+ ``` **After** ```scala scala> spark.read.json("examples/src/main/resources/people.json").describe().show() +-------+------------------+-------+ |summary| age| name| +-------+------------------+-------+ | count| 2| 3| | mean| 24.5| null| | stddev|7.7781745930520225| null| | min| 19| Andy| | max| 30|Michael| +-------+------------------+-------+ ``` ## How was this patch tested? Pass the Jenkins with a update testcase. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #14095 from dongjoon-hyun/SPARK-16429. |
||
---|---|---|
.. | ||
ml | ||
mllib | ||
sql | ||
streaming | ||
__init__.py | ||
accumulators.py | ||
broadcast.py | ||
cloudpickle.py | ||
conf.py | ||
context.py | ||
daemon.py | ||
files.py | ||
heapq3.py | ||
java_gateway.py | ||
join.py | ||
profiler.py | ||
rdd.py | ||
rddsampler.py | ||
resultiterable.py | ||
serializers.py | ||
shell.py | ||
shuffle.py | ||
statcounter.py | ||
status.py | ||
storagelevel.py | ||
tests.py | ||
traceback_utils.py | ||
worker.py |