spark-instrumented-optimizer


Bryan Cutler 16c4c03c71 [SPARK-19357][ML] Adding parallel model evaluation in ML tuning	2017-09-06 14:12:27 +02:00

## What changes were proposed in this pull request?
Modified `CrossValidator` and `TrainValidationSplit` to be able to evaluate models in parallel for a given parameter grid. The level of parallelism is controlled by a parameter `numParallelEval` used to schedule a number of models to be trained/evaluated so that the jobs can be run concurrently. This is a naive approach that does not check the cluster for needed resources, so care must be taken by the user to tune the parameter appropriately. The default value is `1` which will train/evaluate in serial.

## How was this patch tested?
Added unit tests for CrossValidator and TrainValidationSplit to verify that model selection is the same when run in serial vs parallel. Manual testing to verify tasks run in parallel when param is > 1. Added parameter usage to relevant examples.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #16774 from BryanCutler/parallel-model-eval-SPARK-19357.
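The scheduling idea described in this commit can be sketched outside Spark. This is a minimal Python sketch under stated assumptions, not the `CrossValidator` or `TrainValidationSplit` implementation: `evaluate_grid`, `select_best`, and `toy_score` are hypothetical names, a thread pool stands in for submitting concurrent Spark jobs, and the `parallelism` argument plays the role of the commit's `numParallelEval`. Because metrics are collected in grid order rather than completion order, the selected parameters are the same whether the grid is evaluated serially or in parallel, which is the property the commit's unit tests check.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a parameter map: param name -> value.
Params = Dict[str, float]


def evaluate_grid(
    param_grid: Sequence[Params],
    fit_and_score: Callable[[Params], float],
    parallelism: int = 1,
) -> List[float]:
    """Score every parameter set, running at most `parallelism` at once.

    parallelism=1 evaluates serially, matching the default in the commit.
    Like the change it illustrates, this does not check available
    resources; the caller is responsible for choosing the level.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    if parallelism == 1:
        return [fit_and_score(params) for params in param_grid]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map yields results in input order, not completion order,
        # so metrics[i] always belongs to param_grid[i].
        return list(pool.map(fit_and_score, param_grid))


def select_best(
    param_grid: Sequence[Params],
    fit_and_score: Callable[[Params], float],
    parallelism: int = 1,
) -> Tuple[Params, float]:
    """Return the parameter set with the highest metric (first wins ties)."""
    metrics = evaluate_grid(param_grid, fit_and_score, parallelism)
    best_index = max(range(len(metrics)), key=lambda i: metrics[i])
    return param_grid[best_index], metrics[best_index]


def toy_score(params: Params) -> float:
    # Hypothetical "fit a model, then evaluate it" step; peaks at regParam 0.1.
    return 1.0 - abs(params["regParam"] - 0.1)


if __name__ == "__main__":
    grid = [{"regParam": r} for r in (0.01, 0.1, 0.5, 1.0)]
    serial = select_best(grid, toy_score, parallelism=1)
    parallel = select_best(grid, toy_score, parallelism=3)
    assert serial == parallel
    print(serial)  # ({'regParam': 0.1}, 1.0)
```

Indexing results by grid position is what keeps model selection deterministic; the cap on concurrent work limits how many evaluations run at once but, as the commit notes for its own parameter, says nothing about whether the cluster can actually serve that many jobs.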
src/main	[SPARK-19357][ML] Adding parallel model evaluation in ML tuning	2017-09-06 14:12:27 +02:00
pom.xml	[SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project	2017-05-25 10:49:14 -07:00
