spark-instrumented-optimizer/project
VinceShieh 4a9034b173 [SPARK-17498][ML] StringIndexer enhancement for handling unseen labels
## What changes were proposed in this pull request?
This PR enhances the ML StringIndexer.
Before this PR, StringIndexer supported only the "skip" and "error" options for handling unseen labels.
Unseen records can still be useful, however, and in some use cases users want to keep them.
This PR lets StringIndexer keep unseen labels by assigning them the index numLabels.

Before:

    StringIndexer().setHandleInvalid("skip")
    StringIndexer().setHandleInvalid("error")

After (adds a third option, "keep"):

    StringIndexer().setHandleInvalid("keep")
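
The three policies can be modeled in plain Python, without Spark. This is a minimal sketch, not Spark's implementation: `fit_labels` and `index_labels` are hypothetical helpers, and the alphabetical tie-break is an assumption of the sketch. It orders labels by descending frequency, as StringIndexer does by default, and under "keep" maps any unseen label to the index numLabels.

```python
from collections import Counter


def fit_labels(values):
    """Return distinct labels ordered by descending frequency.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def index_labels(labels, values, handle_invalid="error"):
    """Map each value to a label index under the chosen policy.

    "error": raise on an unseen label
    "skip":  drop rows with an unseen label
    "keep":  map every unseen label to the extra index len(labels)
    """
    if handle_invalid not in ("error", "skip", "keep"):
        raise ValueError(f"Unknown handle_invalid: {handle_invalid!r}")

    lookup = {label: i for i, label in enumerate(labels)}
    num_labels = len(labels)
    indexed = []
    for value in values:
        if value in lookup:
            indexed.append(float(lookup[value]))
        elif handle_invalid == "keep":
            indexed.append(float(num_labels))  # shared bucket for unseen labels
        elif handle_invalid == "skip":
            continue  # drop the row
        else:
            raise ValueError(f"Unseen label: {value!r}")
    return indexed


# Fit on training data, then index data containing the unseen label "d".
labels = fit_labels(["a", "b", "a", "c", "a", "b"])
kept = index_labels(labels, ["a", "d", "c"], handle_invalid="keep")
skipped = index_labels(labels, ["a", "d", "c"], handle_invalid="skip")
```

With "keep", every unseen label shares the single index numLabels, so downstream stages see one extra category rather than losing the row.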

## How was this patch tested?
A test was added to StringIndexerSuite.

Signed-off-by: VinceShieh <vincent.xie@intel.com>

Author: VinceShieh <vincent.xie@intel.com>

Closes #16883 from VinceShieh/spark-17498.
2017-03-07 11:24:20 -08:00
..
build.properties [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugins 2016-12-03 10:36:19 +00:00
MimaBuild.scala [SPARK-18638][BUILD] Upgrade sbt, Zinc, and Maven plugins 2016-12-03 10:36:19 +00:00
MimaExcludes.scala [SPARK-17498][ML] StringIndexer enhancement for handling unseen labels 2017-03-07 11:24:20 -08:00
plugins.sbt [SPARK-18697][BUILD] Upgrade sbt plugins 2016-12-09 14:13:01 +08:00
SparkBuild.scala [SPARK-19550][BUILD][WIP] Addendum: select Java 1.7 for scalac 2.10, still 2017-02-19 04:24:11 -08:00