spark-instrumented-optimizer/python/pyspark
Reynold Xin e98dfe627c [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
- The old implicit would convert RDDs directly to DataFrames, and that added too many methods.
- toDataFrame -> toDF
- Dsl -> functions
- implicits moved into SQLContext.implicits
- addColumn -> withColumn
- renameColumn -> withColumnRenamed

Python changes:
- toDataFrame -> toDF
- Dsl -> functions package
- addColumn -> withColumn
- renameColumn -> withColumnRenamed
- add toDF functions to RDD on SQLContext init
- add flatMap to DataFrame

Author: Reynold Xin <rxin@databricks.com>
Author: Davies Liu <davies@databricks.com>

Closes #4556 from rxin/SPARK-5752 and squashes the following commits:

5ef9910 [Reynold Xin] More fix
61d3fca [Reynold Xin] Merge branch 'df5' of github.com:davies/spark into SPARK-5752
ff5832c [Reynold Xin] Fix python
749c675 [Reynold Xin] count(*) fixes.
5806df0 [Reynold Xin] Fix build break again.
d941f3d [Reynold Xin] Fixed explode compilation break.
fe1267a [Davies Liu] flatMap
c4afb8e [Reynold Xin] style
d9de47f [Davies Liu] add comment
b783994 [Davies Liu] add comment for toDF
e2154e5 [Davies Liu] schema() -> schema
3a1004f [Davies Liu] Dsl -> functions, toDF()
fb256af [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
0dd74eb [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
97dd47c [Davies Liu] fix mistake
6168f74 [Davies Liu] fix test
1fc0199 [Davies Liu] fix test
a075cd5 [Davies Liu] clean up, toPandas
663d314 [Davies Liu] add test for agg('*')
9e214d5 [Reynold Xin] count(*) fixes.
1ed7136 [Reynold Xin] Fix build break again.
921b2e3 [Reynold Xin] Fixed explode compilation break.
14698d4 [Davies Liu] flatMap
ba3e12d [Reynold Xin] style
d08c92d [Davies Liu] add comment
5c8b524 [Davies Liu] add comment for toDF
a4e5e66 [Davies Liu] schema() -> schema
d377fc9 [Davies Liu] Dsl -> functions, toDF()
6b3086c [Reynold Xin] - toDataFrame -> toDF - Dsl -> functions - implicits moved into SQLContext.implicits - addColumn -> withColumn - renameColumn -> withColumnRenamed
807e8b1 [Reynold Xin] [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames
2015-02-13 23:03:22 -08:00
..
ml [SPARK-4586][MLLIB] Python API for ML pipeline and parameters 2015-01-28 17:14:23 -08:00
mllib [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames 2015-02-13 23:03:22 -08:00
sql [SPARK-5752][SQL] Don't implicitly convert RDDs directly to DataFrames 2015-02-13 23:03:22 -08:00
streaming [SPARK-5379][Streaming] Add awaitTerminationOrTimeout 2015-02-04 00:40:28 -08:00
__init__.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00
accumulators.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00
broadcast.py [SPARK-4548] []SPARK-4517] improve performance of python broadcast 2014-11-24 17:17:03 -08:00
cloudpickle.py [SPARK-3679] [PySpark] pickle the exact globals of functions 2014-09-24 13:00:05 -07:00
conf.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
context.py Make sure only owner can read / write to directories created for the job. 2015-02-02 14:01:32 -08:00
daemon.py [SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM 2014-10-25 01:20:39 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-5097][SQL] DataFrame 2015-01-27 16:08:24 -08:00
join.py [SPARK-546] Add full outer join to RDD and DStream. 2014-09-24 20:39:09 -07:00
profiler.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00
rdd.py SPARK-5633 pyspark saveAsTextFile support for compression codec 2015-02-06 13:55:02 -08:00
rddsampler.py [SPARK-4477] [PySpark] remove numpy from RDDSampler 2014-11-20 16:40:25 -08:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python 2015-02-02 19:16:27 -08:00
shell.py [SPARK-3273][SPARK-3301]We should read the version information from the same place 2014-09-06 15:08:43 -07:00
shuffle.py [SPARK-4384] [PySpark] improve sort spilling 2014-11-19 15:45:37 -08:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-3417] Use new-style classes in PySpark 2014-09-08 15:45:36 -07:00
tests.py [SPARK-5554] [SQL] [PySpark] add more tests for DataFrame Python API 2015-02-03 16:01:56 -08:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00