spark-instrumented-optimizer/python/pyspark/ml
zhengruifeng d0c3e9f1f7 [SPARK-30660][ML][PYSPARK] LinearRegression blockify input vectors
### What changes were proposed in this pull request?
1. Stack input vectors into blocks instead of processing them one vector at a time, for better performance
2. Use Level-2 BLAS
3. Move standardization of input vectors out of the gradient computation
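The three changes above can be illustrated with a minimal pure-Python sketch. This is not Spark's implementation, only an assumption-laden illustration: it uses a plain least-squares loss L(w) = (1/2n) Σ (x_i·w − y_i)² with no intercept and no regularization, scales each feature by 1/std once before training, stacks rows into blocks of `block_size`, and computes the gradient per block with matrix-vector products (the operation a Level-2 BLAS `gemv` routine performs). The helper names `standardize`, `blockify`, `gemv`, `gemv_t`, and `loss_and_gradient` are hypothetical, and plain Python loops stand in for real BLAS calls.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def standardize(rows: Sequence[Vector]) -> List[Vector]:
    """Hypothetical helper: scale each feature by 1/std once, before any gradient pass."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    # Zero-variance features are left unscaled to avoid division by zero.
    return [[r[j] / stds[j] if stds[j] > 0 else r[j] for j in range(d)] for r in rows]


def blockify(rows: Sequence[Vector], labels: Sequence[float],
             block_size: int) -> List[Tuple[Matrix, Vector]]:
    """Hypothetical helper: group consecutive rows (and labels) into blocks."""
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    return [(list(rows[i:i + block_size]), list(labels[i:i + block_size]))
            for i in range(0, len(rows), block_size)]


def gemv(a: Matrix, x: Vector) -> Vector:
    """y = A x: a pure-Python stand-in for a Level-2 BLAS gemv call."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemv_t(a: Matrix, y: Vector) -> Vector:
    """z = A^T y: the transposed matrix-vector product, also Level-2 BLAS."""
    out = [0.0] * len(a[0])
    for row, yi in zip(a, y):
        for j, aij in enumerate(row):
            out[j] += aij * yi
    return out


def loss_and_gradient(blocks: List[Tuple[Matrix, Vector]], w: Vector,
                      n: int) -> Tuple[float, Vector]:
    """Least-squares loss and gradient, accumulated one block at a time."""
    grad = [0.0] * len(w)
    loss = 0.0
    for x_block, y_block in blocks:
        # One matrix-vector product per block instead of one dot product per row.
        residual = [p - y for p, y in zip(gemv(x_block, w), y_block)]
        loss += 0.5 * sum(r * r for r in residual)
        for j, g in enumerate(gemv_t(x_block, residual)):
            grad[j] += g
    return loss / n, [g / n for g in grad]


if __name__ == "__main__":
    rows = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0], [5.0, 5.0]]
    labels = [1.0, 2.0, 3.0, 4.0, 5.0]
    z = standardize(rows)  # done once, outside the gradient computation
    print(loss_and_gradient(blockify(z, labels, block_size=2), [0.5, -0.25], len(z)))
```

The blocked gradient is mathematically identical to the row-by-row one; in a real implementation the gain would come from handing each block to an optimized BLAS routine and from not re-applying standardization inside every gradient evaluation.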

### Why are the changes needed?
1. Less RAM is needed to persist the training data (saves ~40%)
2. Faster than the existing implementation (30% to 102% faster)

### Does this PR introduce any user-facing change?
Yes. Adds a new expert param `blockSize`.

### How was this patch tested?
Updated test suites.

Closes #27396 from zhengruifeng/blockify_lireg.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2020-01-31 21:04:26 -06:00
linalg [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
param [SPARK-30659][ML][PYSPARK] LogisticRegression blockify input vectors 2020-01-30 10:52:07 -06:00
tests [MINOR][ML] Change DecisionTreeClassifier to FMClassifier in OneVsRest setWeightCol test 2020-01-17 10:04:41 +08:00
__init__.py [SPARK-24477][SPARK-24454][ML][PYTHON] Imports submodule in ml/__init__.py and add ImageSchema into __all__ 2018-06-08 09:32:11 -07:00
base.py [SPARK-29093][PYTHON][ML] Remove automatically generated param setters in _shared_params_code_gen.py 2019-10-28 11:36:10 +08:00
classification.py [SPARK-30662][ML][PYSPARK] ALS/MLP extend HasBlockSize 2020-01-30 13:13:10 -06:00
clustering.py [SPARK-30498][ML][PYSPARK] Fix some ml parity issues between python and scala 2020-01-14 17:24:17 +08:00
common.py [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch 2016-10-03 14:12:03 -07:00
evaluation.py [SPARK-29960][ML][PYSPARK] MulticlassClassificationEvaluator support hammingLoss 2019-11-21 18:32:28 +08:00
feature.py [SPARK-29093][ML][PYSPARK][FOLLOW-UP] Remove duplicate setter 2020-01-30 23:36:39 -08:00
fpm.py [SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models 2019-11-15 21:44:39 -08:00
functions.py [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays 2020-01-06 16:18:51 -08:00
image.py [SPARK-25382][SQL][PYSPARK] Remove ImageSchema.readImages in 3.0 2019-07-31 14:26:18 +09:00
pipeline.py [SPARK-17025][ML][PYTHON] Persistence for Pipelines with Python-only Stages 2017-08-11 23:57:08 -07:00
recommendation.py [SPARK-30662][ML][PYSPARK] ALS/MLP extend HasBlockSize 2020-01-30 13:13:10 -06:00
regression.py [SPARK-30660][ML][PYSPARK] LinearRegression blockify input vectors 2020-01-31 21:04:26 -06:00
stat.py [SPARK-30247][PYSPARK][FOLLOWUP] Add Python class MultivariateGaussian 2019-12-27 13:30:18 +08:00
tree.py [SPARK-30543][ML][PYSPARK][R] RandomForest add Param bootstrap to control sampling method 2020-01-23 16:44:13 +08:00
tuning.py [SPARK-30498][ML][PYSPARK] Fix some ml parity issues between python and scala 2020-01-14 17:24:17 +08:00
util.py [SPARK-28985][PYTHON][ML] Add common classes (JavaPredictor/JavaClassificationModel/JavaProbabilisticClassifier) in PYTHON 2019-09-19 08:17:25 -05:00
wrapper.py [SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models 2019-11-15 21:44:39 -08:00