spark-instrumented-optimizer/python/pyspark
Joseph K. Bradley 4a17eedb16 [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.

For SPARK-5892:
* Fix Python docs
* Various other cleanups

BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch, which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]

CC: mengxr (ML), davies (Python docs)

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:

f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example.  Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
2015-02-20 02:31:32 -08:00
ml [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release 2015-02-20 02:31:32 -08:00
mllib [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release 2015-02-20 02:31:32 -08:00
sql [SPARK-5909][SQL] Add a clearCache command to Spark SQL's cache manager 2015-02-20 16:20:02 +08:00
streaming [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark 2015-02-17 16:54:57 -08:00
__init__.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
accumulators.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00
broadcast.py [SPARK-4548] [SPARK-4517] improve performance of python broadcast 2014-11-24 17:17:03 -08:00
cloudpickle.py [SPARK-3679] [PySpark] pickle the exact globals of functions 2014-09-24 13:00:05 -07:00
conf.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
context.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
daemon.py [SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM 2014-10-25 01:20:39 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-2313] Use socket to communicate GatewayServer port back to Python driver 2015-02-16 15:25:11 -08:00
join.py [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark 2015-02-17 16:54:57 -08:00
profiler.py [SPARK-4387][PySpark] Refactoring python profiling code to make it extensible 2015-01-28 13:48:06 -08:00
rdd.py [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark 2015-02-17 16:54:57 -08:00
rddsampler.py [SPARK-4477] [PySpark] remove numpy from RDDSampler 2014-11-20 16:40:25 -08:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-5154] [PySpark] [Streaming] Kafka streaming support in Python 2015-02-02 19:16:27 -08:00
shell.py [SPARK-5872] [SQL] create a sqlCtx in pyspark shell 2015-02-17 15:44:37 -08:00
shuffle.py [SPARK-4384] [PySpark] improve sort spilling 2014-11-19 15:45:37 -08:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-3417] Use new-style classes in PySpark 2014-09-08 15:45:36 -07:00
tests.py [SPARK-5811] Added documentation for maven coordinates and added Spark Packages support 2015-02-17 17:23:22 -08:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py Revert "[SPARK-5363] [PySpark] check ending mark in non-block way" 2015-02-17 07:49:02 -08:00