spark-instrumented-optimizer

History

Davies Liu 08488c175f [SPARK-5469] restructure pyspark.sql into multiple files	2015-02-09 20:49:22 -08:00

All the DataTypes moved into pyspark.sql.types

The changes can be tracked by `--find-copies-harder -M25`

```
davies@localhost:~/work/spark/python$ git diff --find-copies-harder -M25 --numstat master..
2       5       python/docs/pyspark.ml.rst
0       3       python/docs/pyspark.mllib.rst
10      2       python/docs/pyspark.sql.rst
1       1       python/pyspark/mllib/linalg.py
21      14      python/pyspark/{mllib => sql}/__init__.py
14      2108    python/pyspark/{sql.py => sql/context.py}
10      1772    python/pyspark/{sql.py => sql/dataframe.py}
7       6       python/pyspark/{sql_tests.py => sql/tests.py}
8       1465    python/pyspark/{sql.py => sql/types.py}
4       2       python/run-tests
1       1       sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
```

Also `git blame -C -C python/pyspark/sql/context.py` to track the history.

Author: Davies Liu <davies@databricks.com>

Closes #4479 from davies/sql and squashes the following commits:

1b5f0a5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into sql
2b2b983 [Davies Liu] restructure pyspark.sql
..
ml	[SPARK-4586][MLLIB] Python API for ML pipeline and parameters	2015-01-28 17:14:23 -08:00
mllib	[SPARK-5469] restructure pyspark.sql into multiple files	2015-02-09 20:49:22 -08:00
sql	[SPARK-5469] restructure pyspark.sql into multiple files	2015-02-09 20:49:22 -08:00
streaming	[SPARK-5379][Streaming] Add awaitTerminationOrTimeout	2015-02-04 00:40:28 -08:00
__init__.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00
accumulators.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00
broadcast.py	[SPARK-4548] [SPARK-4517] improve performance of python broadcast	2014-11-24 17:17:03 -08:00
cloudpickle.py	[SPARK-3679] [PySpark] pickle the exact globals of functions	2014-09-24 13:00:05 -07:00
conf.py	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	2014-10-07 18:09:27 -07:00
context.py	Make sure only owner can read / write to directories created for the job.	2015-02-02 14:01:32 -08:00
daemon.py	[SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM	2014-10-25 01:20:39 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	2014-08-26 16:57:40 -07:00
java_gateway.py	[SPARK-5097][SQL] DataFrame	2015-01-27 16:08:24 -08:00
join.py	[SPARK-546] Add full outer join to RDD and DStream.	2014-09-24 20:39:09 -07:00
profiler.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00
rdd.py	SPARK-5633 pyspark saveAsTextFile support for compression codec	2015-02-06 13:55:02 -08:00
rddsampler.py	[SPARK-4477] [PySpark] remove numpy from RDDSampler	2014-11-20 16:40:25 -08:00
resultiterable.py	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
serializers.py	[SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python	2015-02-02 19:16:27 -08:00
shell.py	[SPARK-3273][SPARK-3301] We should read the version information from the same place	2014-09-06 15:08:43 -07:00
shuffle.py	[SPARK-4384] [PySpark] improve sort spilling	2014-11-19 15:45:37 -08:00
statcounter.py	StatCounter on NumPy arrays [PYSPARK][SPARK-2012]	2014-08-01 22:33:25 -07:00
storagelevel.py	[SPARK-3417] Use new-style classes in PySpark	2014-09-08 15:45:36 -07:00
tests.py	[SPARK-5554] [SQL] [PySpark] add more tests for DataFrame Python API	2015-02-03 16:01:56 -08:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-4387][PySpark] Refactoring python profiling code to make it extensible	2015-01-28 13:48:06 -08:00