spark-instrumented-optimizer/python/pyspark/ml
Huaxin Gao 660423d717 [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation
## What changes were proposed in this pull request?

Update HashingTF to use the new implementation of MurmurHash3.
Make HashingTF use the old MurmurHash3 when a model saved by a pre-3.0 version is loaded.
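
To illustrate why a loaded pre-3.0 model has to keep the old hash, here is a minimal sketch of the hashing trick that a term-frequency hasher relies on. It is not Spark's code: `hashing_tf`, `old_hash`, and `new_hash` are hypothetical names, the feature count is chosen for illustration, and `zlib.adler32` / `zlib.crc32` are only stand-ins for two different hash implementations (neither is MurmurHash3).

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features, hash_fn):
    """Map a list of terms to a sparse term-frequency dict {index: count}.

    hash_fn takes bytes and returns a non-negative int; the feature index
    is that hash reduced modulo num_features.
    """
    counts = Counter()
    for term in terms:
        index = hash_fn(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


# Assumption: stand-ins for an "old" and a "new" hash implementation.
old_hash = zlib.adler32
new_hash = zlib.crc32

doc = ["spark", "ml", "hashing", "spark"]
num_features = 1 << 18  # illustrative feature count

old_vec = hashing_tf(doc, num_features, old_hash)
new_vec = hashing_tf(doc, num_features, new_hash)

# The same document generally lands on different indices under each hash,
# so features produced with one hash do not line up with the other.
print(sorted(old_vec.items()))
print(sorted(new_vec.items()))
```

Because the term-to-index mapping depends entirely on the hash function, anything trained downstream on the old indices would see unrelated features if the hash were switched silently. That is the motivation for keeping the old hash when an older model is loaded.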

## How was this patch tested?

Changed existing unit tests. Also added one unit test to make sure HashingTF uses the old MurmurHash3 when a pre-3.0 model is loaded.

Closes #25303 from huaxingao/spark-23469.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-02 10:53:36 -05:00
| Name | Last commit | Date |
| --- | --- | --- |
| linalg | [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation | 2019-07-05 10:08:22 -07:00 |
| param | [SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams | 2019-07-20 10:44:33 -05:00 |
| tests | [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation | 2019-08-02 10:53:36 -05:00 |
| __init__.py | [SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init__.py and add ImageSchema into __all__ | 2018-06-08 09:32:11 -07:00 |
| base.py | [SPARK-22922][ML][PYSPARK] Pyspark portion of the fit-multiple API | 2017-12-29 16:31:25 -08:00 |
| classification.py | [SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams | 2019-07-20 10:44:33 -05:00 |
| clustering.py | [SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter | 2019-03-23 11:26:09 -05:00 |
| common.py | [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch | 2016-10-03 14:12:03 -07:00 |
| evaluation.py | [SPARK-28045][ML][PYTHON] add missing RankingEvaluator | 2019-06-25 06:44:06 -05:00 |
| feature.py | [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation | 2019-08-02 10:53:36 -05:00 |
| fpm.py | [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis | 2019-01-17 19:40:39 -06:00 |
| image.py | [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 | 2019-07-31 14:26:18 +09:00 |
| pipeline.py | [SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages | 2017-08-11 23:57:08 -07:00 |
| recommendation.py | [SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter | 2019-03-23 11:26:09 -05:00 |
| regression.py | [SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams | 2019-07-20 10:44:33 -05:00 |
| stat.py | [MINOR] Fix typos and misspellings | 2018-11-05 17:34:23 -06:00 |
| tuning.py | [SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter | 2019-03-23 11:26:09 -05:00 |
| util.py | [SPARK-28507][ML][PYSPARK] Remove deprecated API context(self, sqlContext) from pyspark/ml/util.py | 2019-07-26 12:12:11 -05:00 |
| wrapper.py | [SPARK-22798][PYTHON][ML] Add multiple column support to PySpark StringIndexer | 2019-02-20 08:52:46 -06:00 |