spark-instrumented-optimizer/python/pyspark/ml
zhengruifeng 073ce12543 [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors
### What changes were proposed in this pull request?
1, use blocks instead of vectors
2, use Level-2 BLAS for binary, use Level-3 BLAS for multinomial

### Why are the changes needed?
1, less RAM to persist training data; (save ~40%)
2, faster than existing impl; (40% ~ 92%)

### Does this PR introduce any user-facing change?
add a new expert param `blockSize`

### How was this patch tested?
updated testsuites

Closes #27374 from zhengruifeng/blockify_lor.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-30 10:52:07 -06:00
..
linalg [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
param [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors 2020-01-30 10:52:07 -06:00
tests [MINOR][ML] Change DecisionTreeClassifier to FMClassifier in OneVsRest setWeightCol test 2020-01-17 10:04:41 +08:00
__init__.py [SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init__.py and add ImageSchema into __all__ 2018-06-08 09:32:11 -07:00
base.py [SPARK-29093][PYTHON][ML] Remove automatically generated param setters in _shared_params_code_gen.py 2019-10-28 11:36:10 +08:00
classification.py [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors 2020-01-30 10:52:07 -06:00
clustering.py [SPARK-30498][ML][PYSPARK] Fix some ml parity issues between python and scala 2020-01-14 17:24:17 +08:00
common.py [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch 2016-10-03 14:12:03 -07:00
evaluation.py [SPARK-29960][ML][PYSPARK] MulticlassClassificationEvaluator support hammingLoss 2019-11-21 18:32:28 +08:00
feature.py [SPARK-29565][FOLLOWUP] add setInputCol/setOutputCol in OHEModel 2020-01-16 19:23:10 +08:00
fpm.py [SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models 2019-11-15 21:44:39 -08:00
functions.py [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays 2020-01-06 16:18:51 -08:00
image.py [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 2019-07-31 14:26:18 +09:00
pipeline.py [SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages 2017-08-11 23:57:08 -07:00
recommendation.py [SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models 2019-11-15 21:44:39 -08:00
regression.py [SPARK-30543][ML][PYSPARK][R] RandomForest add Param bootstrap to control sampling method 2020-01-23 16:44:13 +08:00
stat.py [SPARK-30247][PYSPARK][FOLLOWUP] Add Python class MultivariateGaussian 2019-12-27 13:30:18 +08:00
tree.py [SPARK-30543][ML][PYSPARK][R] RandomForest add Param bootstrap to control sampling method 2020-01-23 16:44:13 +08:00
tuning.py [SPARK-30498][ML][PYSPARK] Fix some ml parity issues between python and scala 2020-01-14 17:24:17 +08:00
util.py [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON 2019-09-19 08:17:25 -05:00
wrapper.py [SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models 2019-11-15 21:44:39 -08:00