spark-instrumented-optimizer


Huaxin Gao 660423d717 [SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation
## What changes were proposed in this pull request?
Update HashingTF to use new implementation of MurmurHash3
Make HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded
## How was this patch tested?
Change existing unit tests. Also add one unit test to make sure HashingTF use the old MurmurHash3 when a model from pre 3.0 is loaded
Closes #25303 from huaxingao/spark-23469.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-08-02 10:53:36 -05:00
linalg	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
param	[SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams	2019-07-20 10:44:33 -05:00
tests	[SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation	2019-08-02 10:53:36 -05:00
__init__.py	[SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init__.py and add ImageSchema into __all__	2018-06-08 09:32:11 -07:00
base.py	[SPARK-22922][ML][PYSPARK] Pyspark portion of the fit-multiple API	2017-12-29 16:31:25 -08:00
classification.py	[SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams	2019-07-20 10:44:33 -05:00
clustering.py	[SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter	2019-03-23 11:26:09 -05:00
common.py	[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch	2016-10-03 14:12:03 -07:00
evaluation.py	[SPARK-28045][ML][PYTHON] add missing RankingEvaluator	2019-06-25 06:44:06 -05:00
feature.py	[SPARK-23469][ML] HashingTF should use corrected MurmurHash3 implementation	2019-08-02 10:53:36 -05:00
fpm.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
image.py	[SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0	2019-07-31 14:26:18 +09:00
pipeline.py	[SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages	2017-08-11 23:57:08 -07:00
recommendation.py	[SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter	2019-03-23 11:26:09 -05:00
regression.py	[SPARK-28243][PYSPARK][ML] Remove setFeatureSubsetStrategy and setSubsamplingRate from Python TreeEnsembleParams	2019-07-20 10:44:33 -05:00
stat.py	[MINOR] Fix typos and misspellings	2018-11-05 17:34:23 -06:00
tuning.py	[SPARK-23643][CORE][SQL][ML] Shrinking the buffer in hashSeed up to size of the seed parameter	2019-03-23 11:26:09 -05:00
util.py	[SPARK-28507][ML][PYSPARK] Remove deprecated API context(self, sqlContext) from pyspark/ml/util.py	2019-07-26 12:12:11 -05:00
wrapper.py	[SPARK-22798][PYTHON][ML] Add multiple column support to PySpark StringIndexer	2019-02-20 08:52:46 -06:00