spark-instrumented-optimizer/python/pyspark/mllib
Yong Tang bc748b7b8f [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib
## What changes were proposed in this pull request?

This fix adds a binary toggle Param to PySpark's HashingTF in ML and MLlib. When the toggle is set, all non-zero term counts are set to 1.

Note: This fix (SPARK-14238) extends SPARK-13963, which added the Scala implementation.
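
The effect of the toggle described above can be illustrated with a minimal pure-Python sketch. This is not the PySpark implementation: the helper name `hashing_tf`, the `num_features` default, and the use of `zlib.crc32` as the hash function are assumptions made only so the example is deterministic and self-contained.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features=16, binary=False):
    """Hypothetical helper: hash terms into a sparse {index: value} mapping.

    With binary=False the values are term counts; with binary=True every
    non-zero count is replaced by 1.0.
    """
    counts = Counter()
    for term in terms:
        # crc32 is an assumption chosen for determinism, not Spark's hash.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    if binary:
        return {index: 1.0 for index in counts}
    return {index: float(count) for index, count in counts.items()}


doc = ["a", "a", "b"]
plain = hashing_tf(doc)                 # values are raw counts
toggled = hashing_tf(doc, binary=True)  # same indices, every value is 1.0
```

Both calls produce the same set of indices; only the stored values differ, which is the behavior the new Param is meant to control.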

## How was this patch tested?

This fix adds two tests to cover the code changes: one for HashingTF in PySpark's ML and one for HashingTF in PySpark's MLlib.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #12079 from yongtang/SPARK-14238.
2016-04-14 21:53:32 +02:00
linalg [SPARK-13594][SQL] remove typed operations(e.g. map, flatMap) from python DataFrame 2016-03-02 15:26:34 -08:00
stat [SPARK-8996] [MLLIB] [PYSPARK] Python API for Kolmogorov-Smirnov Test 2015-07-20 09:00:01 -07:00
__init__.py [SPARK-8032] [PYSPARK] Make version checking for NumPy in MLlib more robust 2015-06-02 23:24:47 -07:00
classification.py [SPARK-12633][PYSPARK] [DOC] PySpark regression parameter desc to consistent format 2016-02-29 15:52:41 +02:00
clustering.py [SPARK-13672][ML] Add python examples of BisectingKMeans in ML and MLLIB 2016-03-11 09:21:12 +02:00
common.py [SPARK-13244][SQL] Migrates DataFrame to Dataset 2016-03-10 17:00:17 -08:00
evaluation.py [SPARK-12380] [PYSPARK] use SQLContext.getOrCreate in mllib 2015-12-16 15:48:11 -08:00
feature.py [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib 2016-04-14 21:53:32 +02:00
fpm.py [MINOR] Fix typos in comments and testcase name of code 2016-03-03 22:42:12 +00:00
random.py [PYSPARK] [MLLIB] [DOCS] Replaced addversion with versionadded in mllib.random 2015-09-15 12:23:20 -07:00
recommendation.py [SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format 2016-02-22 12:48:37 +02:00
regression.py [SPARK-12633][PYSPARK] [DOC] PySpark regression parameter desc to consistent format 2016-02-29 15:52:41 +02:00
tests.py [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib 2016-04-14 21:53:32 +02:00
tree.py [SPARK-12634][PYSPARK][DOC] PySpark tree parameter desc to consistent format 2016-02-26 08:30:32 -08:00
util.py [SPARK-10279] [MLLIB] [PYSPARK] [DOCS] Add @since annotation to pyspark.mllib.util 2015-09-17 08:50:00 -07:00