397b843890
### What changes were proposed in this pull request?

Code in the PR generates random parameters for hyperparameter tuning.

A discussion with Sean Owen can be found on the dev mailing list here:

http://apache-spark-developers-list.1001551.n3.nabble.com/Hyperparameter-Optimization-via-Randomization-td30629.html

All code is entirely my own work and I license the work to the project under the project’s open source license.

### Why are the changes needed?

Randomization can be a more effective technique than a grid search, since minima and maxima can fall between the grid points and never be found. Randomization is not restricted in this way, although the probability of finding a minimum or maximum depends on the number of attempts.

Alice Zheng has an accessible description of how this technique works at https://www.oreilly.com/library/view/evaluating-machine-learning/9781492048756/ch04.html

Although there are Python libraries with more sophisticated techniques, not every Spark developer is using Python.

### Does this PR introduce _any_ user-facing change?

A new class (`ParamRandomBuilder.scala`) and its tests have been created, but there is no change to existing code. This class offers an alternative to `ParamGridBuilder` and can be dropped into the code wherever `ParamGridBuilder` appears. Indeed, it extends `ParamGridBuilder` and is completely compatible with its interface. It merely adds one method that provides a range over which a hyperparameter will be randomly defined.

### How was this patch tested?

Tests `ParamRandomBuilderSuite.scala` and `RandomRangesSuite.scala` were added.

`ParamRandomBuilderSuite` is the analogue of the already existing `ParamGridBuilderSuite`, which tests the user-facing interface.

`RandomRangesSuite` uses ScalaCheck to test the random ranges over which hyperparameters are distributed.

Closes #31535 from PhillHenry/ParamRandomBuilder.

Authored-by: Phillip Henry <PhillHenry@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
||
---|---|---|
linalg | ||
param | ||
tests | ||
__init__.py | ||
_typing.pyi | ||
base.py | ||
base.pyi | ||
classification.py | ||
classification.pyi | ||
clustering.py | ||
clustering.pyi | ||
common.py | ||
common.pyi | ||
evaluation.py | ||
evaluation.pyi | ||
feature.py | ||
feature.pyi | ||
fpm.py | ||
fpm.pyi | ||
functions.py | ||
functions.pyi | ||
image.py | ||
image.pyi | ||
pipeline.py | ||
pipeline.pyi | ||
recommendation.py | ||
recommendation.pyi | ||
regression.py | ||
regression.pyi | ||
stat.py | ||
stat.pyi | ||
tree.py | ||
tree.pyi | ||
tuning.py | ||
tuning.pyi | ||
util.py | ||
util.pyi | ||
wrapper.py | ||
wrapper.pyi |
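
The commit message above argues that a grid search can miss an optimum that lies between grid points, while random sampling is not restricted to the grid. The following is a minimal, self-contained sketch of that idea in plain Python. It does not use Spark or the `ParamRandomBuilder` API; the objective `loss` and the helpers `grid_candidates`, `random_candidates`, and `best` are hypothetical names introduced only for illustration.

```python
import random


def loss(x):
    # Hypothetical 1-D objective whose minimum (x = 0.37) lies between grid points.
    return (x - 0.37) ** 2


def grid_candidates(low, high, n):
    # n evenly spaced values from low to high, inclusive (requires n >= 2).
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def random_candidates(low, high, n, seed=0):
    # n values drawn uniformly at random from [low, high]; seeded for repeatability.
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def best(candidates):
    # The candidate with the lowest loss.
    return min(candidates, key=loss)


# A 5-point grid over [0, 1] can only ever try 0.0, 0.25, 0.5, 0.75 and 1.0.
grid_best = best(grid_candidates(0.0, 1.0, 5))

# 50 random draws over the same range are free to land anywhere in between.
random_best = best(random_candidates(0.0, 1.0, 50))

print("grid best:  ", grid_best, "loss:", loss(grid_best))
print("random best:", random_best, "loss:", loss(random_best))
```

The grid can never get closer to the minimum than its nearest grid point, however the objective is shaped between points. The random draws have no such floor, but, as the commit message notes, how close they get depends on the number of attempts.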