spark-instrumented-optimizer

Reynold Xin 784fcd5327 [SPARK-6117] [SQL] Improvements to DataFrame.describe()	2015-03-26 12:26:13 -07:00

1. Slight modifications to the code to make it more readable.
2. Added Python implementation.
3. Updated the documentation to state that we don't guarantee the output schema for this function and it should only be used for exploratory data analysis.

Author: Reynold Xin <rxin@databricks.com>

Closes #5201 from rxin/df-describe and squashes the following commits:

25a7834 [Reynold Xin] Reset run-tests.
6abdfee [Reynold Xin] [SPARK-6117] [SQL] Improvements to DataFrame.describe()

ml	[Docs] Replace references to SchemaRDD with DataFrame	2015-03-09 13:29:19 -07:00
mllib	[SPARK-6256] [MLlib] MLlib Python API parity check for regression	2015-03-25 13:38:33 -07:00
sql	[SPARK-6117] [SQL] Improvements to DataFrame.describe()	2015-03-26 12:26:13 -07:00
streaming	[Streaming][Minor]Fix some error docs in streaming examples	2015-03-02 08:49:19 +00:00
__init__.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
accumulators.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00
broadcast.py	[SPARK-4548] [SPARK-4517] improve performance of python broadcast	2014-11-24 17:17:03 -08:00
cloudpickle.py	[SPARK-3679] [PySpark] pickle the exact globals of functions	2014-09-24 13:00:05 -07:00
conf.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
context.py	[SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()	2015-03-09 16:24:06 -07:00
daemon.py	[SPARK-6294] fix hang when call take() in JVM on PythonRDD	2015-03-12 01:34:38 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	2014-08-26 16:57:40 -07:00
java_gateway.py	[SPARK-6327] [PySpark] fix launch spark-submit from python	2015-03-16 16:26:55 -07:00
join.py	[SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark	2015-02-17 16:54:57 -08:00
profiler.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00
rdd.py	[SPARK-6370][core] Documentation: Improve all 3 docs for RDD.sample	2015-03-20 18:33:53 +00:00
rddsampler.py	[SPARK-4477] [PySpark] remove numpy from RDDSampler	2014-11-20 16:40:25 -08:00
resultiterable.py	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
serializers.py	[SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python	2015-02-02 19:16:27 -08:00
shell.py	[SPARK-5872] [SQL] create a sqlCtx in pyspark shell	2015-02-17 15:44:37 -08:00
shuffle.py	[SPARK-4384] [PySpark] improve sort spilling	2014-11-19 15:45:37 -08:00
statcounter.py	StatCounter on NumPy arrays [PYSPARK][SPARK-2012]	2014-08-01 22:33:25 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-6294] fix hang when call take() in JVM on PythonRDD	2015-03-12 01:34:38 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	Revert "[SPARK-5363] [PySpark] check ending mark in non-block way"	2015-02-17 07:49:02 -08:00