spark-instrumented-optimizer

Nathan Howell 1bc435ae3a [SPARK-10064] [ML] Parallelize decision tree bin split calculations	2015-10-07 17:46:16 -07:00

Reimplement `DecisionTree.findSplitsBins` via `RDD` to parallelize bin calculation.

With large feature spaces the current implementation is very slow. This change limits the features that are distributed (or collected) to just the continuous features, and performs the split calculations in parallel. It completes on a real multi terabyte dataset in less than a minute instead of multiple hours.

Author: Nathan Howell <nhowell@godaddy.com>

Closes #8246 from NathanHowell/SPARK-10064.

src	[SPARK-10064] [ML] Parallelize decision tree bin split calculations	2015-10-07 17:46:16 -07:00
pom.xml	[SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py.	2015-10-07 14:11:21 -07:00
