spark-instrumented-optimizer


sethah 46b2550bcd [SPARK-18060][ML] Avoid unnecessary computation for MLOR	2016-11-12 01:38:26 +00:00
src	[SPARK-18060][ML] Avoid unnecessary computation for MLOR	2016-11-12 01:38:26 +00:00
pom.xml	[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant definition and inherited from the parent	2016-07-19 11:59:46 +01:00

[SPARK-18060][ML] Avoid unnecessary computation for MLOR

## What changes were proposed in this pull request?

Before this patch, the gradient updates for multinomial logistic regression were computed with an outer loop over the classes and an inner loop over the features. Inside the inner loop, we standardized the feature value (`value / featuresStd(index)`), so that division was performed `numFeatures * numClasses` times even though it only needs to be performed `numFeatures` times. Swapping the inner and outer loops would avoid the redundant work, but with row-major coefficients it would give up sequential memory access. This patch instead lays out the coefficients in column-major order during training, which avoids the extra computation and retains sequential memory access. The coefficients are converted back to row-major order when the model is created.
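The loop reordering described above can be illustrated with a minimal sketch. This is not the Spark implementation (which is written in Scala); the function names, the `multipliers` argument, and the flat-list layout are assumptions chosen for illustration, and feature standard deviations are assumed to be non-zero.

```python
def grad_update_row_major(features, features_std, multipliers, grad):
    """Old scheme (illustrative): outer loop over classes, inner over features.

    grad is a flat list in row-major order: index = k * num_features + j.
    Returns the number of standardization divisions performed.
    """
    num_features = len(features)
    divisions = 0
    for k, multiplier in enumerate(multipliers):
        for j, value in enumerate(features):
            std_value = value / features_std[j]  # repeated for every class
            divisions += 1
            grad[k * num_features + j] += multiplier * std_value
    return divisions


def grad_update_col_major(features, features_std, multipliers, grad):
    """New scheme (illustrative): outer loop over features, inner over classes.

    grad is a flat list in column-major order: index = j * num_classes + k,
    so the inner loop still walks memory sequentially.
    """
    num_classes = len(multipliers)
    divisions = 0
    for j, value in enumerate(features):
        std_value = value / features_std[j]  # computed once per feature
        divisions += 1
        base = j * num_classes
        for k, multiplier in enumerate(multipliers):
            grad[base + k] += multiplier * std_value
    return divisions


def to_row_major(col_major, num_classes, num_features):
    """Convert a flat column-major array back to row-major order."""
    return [col_major[j * num_classes + k]
            for k in range(num_classes)
            for j in range(num_features)]
```

With 2 features and 3 classes, the row-major version performs 6 divisions and the column-major version performs 2, and converting the column-major result back with `to_row_major` yields the same gradient.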

## How was this patch tested?

This is an implementation detail only, so the original behavior should be maintained. All tests pass. I ran some performance tests to verify the speedups. The results below show per-iteration speedups of 43% to 80%.

## Performance Tests

**Setup**

3-node bare-metal cluster
120 cores total
384 GB RAM total

**Results**

NOTE: `currentMasterTime` and `thisPatchTime` are the times, in seconds, for a single iteration of L-BFGS or OWL-QN.

|    |   numPoints |   numFeatures |   numClasses |   regParam |   elasticNetParam |   currentMasterTime (sec) |   thisPatchTime (sec) |   pctSpeedup |
|----|-------------|---------------|--------------|------------|-------------------|---------------------------|-----------------------|--------------|
|  0 |       1e+07 |           100 |          500 |       0.5  |                 0 |                        90 |                    18 |           80 |
|  1 |       1e+08 |           100 |           50 |       0.5  |                 0 |                        90 |                    19 |           78 |
|  2 |       1e+08 |           100 |           50 |       0.05 |                 1 |                        72 |                    19 |           73 |
|  3 |       1e+06 |           100 |         5000 |       0.5  |                 0 |                        93 |                    53 |           43 |
|  4 |       1e+07 |           100 |         5000 |       0.5  |                 0 |                       900 |                   390 |           56 |
|  5 |       1e+08 |           100 |          500 |       0.5  |                 0 |                       840 |                   174 |           79 |
|  6 |       1e+08 |           100 |          200 |       0.5  |                 0 |                       360 |                    72 |           80 |
|  7 |       1e+08 |          1000 |            5 |       0.5  |                 0 |                         9 |                     3 |           66 |
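The message does not say how `pctSpeedup` was computed. The sketch below assumes it is the percentage reduction in iteration time, truncated to an integer, which reproduces every value in the table; the function name `pct_speedup` is hypothetical.

```python
def pct_speedup(current_master_time, this_patch_time):
    """Percentage reduction in time, truncated to an integer (assumed formula)."""
    saved = current_master_time - this_patch_time
    return 100 * saved // current_master_time


# (currentMasterTime, thisPatchTime) pairs copied from rows 0-7 of the table.
timings = [(90, 18), (90, 19), (72, 19), (93, 53),
           (900, 390), (840, 174), (360, 72), (9, 3)]
speedups = [pct_speedup(master, patch) for master, patch in timings]
```

Truncation rather than rounding is what matches the table: row 1, for example, is 78.9% before truncation and is reported as 78.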

Author: sethah <seth.hendrickson16@gmail.com>

Closes #15593 from sethah/MLOR_PERF_COL_MAJOR_COEF.

2016-11-12 01:38:26 +00:00
