[SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols
### What changes were proposed in this pull request?
1. Update the comment: `Note, the relevant columns must also be set in inputCols` -> `Note, the relevant columns should also be set in inputCols`.
2. Add a check: if any `categoricalCols` are not set in `inputCols`, log a warning that lists them.

### Why are the changes needed?
There is no check to make sure that all `categoricalCols` are set in `inputCols`. To keep the existing behavior, the comment is updated.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
repl

Closes #29868 from zhengruifeng/feature_hash_cat_doc.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
parent c65b64552f
commit bc77e5b840
@@ -91,8 +91,8 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
   /**
    * Numeric columns to treat as categorical features. By default only string and boolean
    * columns are treated as categorical, so this param can be used to explicitly specify the
-   * numerical columns to treat as categorical. Note, the relevant columns must also be set in
-   * `inputCols`.
+   * numerical columns to treat as categorical. Note, the relevant columns should also be set in
+   * `inputCols`, categorical columns not set in `inputCols` will be listed in a warning.
    * @group param
    */
   @Since("2.3.0")
@@ -195,7 +195,14 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
 
   @Since("2.3.0")
   override def transformSchema(schema: StructType): StructType = {
-    val fields = schema($(inputCols).toSet)
+    val localInputCols = $(inputCols).toSet
+    if (isSet(categoricalCols)) {
+      val set = $(categoricalCols).filterNot(c => localInputCols.contains(c))
+      if (set.nonEmpty) {
+        log.warn(s"categoricalCols ${set.mkString("[", ",", "]")} do not exist in inputCols")
+      }
+    }
+    val fields = schema(localInputCols)
     fields.foreach { fieldSchema =>
       val dataType = fieldSchema.dataType
       val fieldName = fieldSchema.name
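
For readers who want to try the new check outside of Spark, the logic added to `transformSchema` can be sketched in plain Scala. This is only an illustration under stated assumptions: the object `CategoricalColsCheck` and its methods `missingFromInput` and `warningMessage` are hypothetical names that do not exist in Spark, and the sketch returns the message text instead of calling `log.warn`.

```scala
// Hypothetical standalone helper; not part of Spark's FeatureHasher.
object CategoricalColsCheck {

  /** Categorical columns that are not present in inputCols, in their original order. */
  def missingFromInput(inputCols: Seq[String], categoricalCols: Seq[String]): Seq[String] = {
    val localInputCols = inputCols.toSet
    categoricalCols.filterNot(c => localInputCols.contains(c))
  }

  /** Warning text in the same shape as the log message in the patch, or None if nothing is missing. */
  def warningMessage(missing: Seq[String]): Option[String] = {
    if (missing.nonEmpty) {
      val listed = missing.mkString("[", ",", "]")
      Some(s"categoricalCols $listed do not exist in inputCols")
    } else {
      None
    }
  }
}
```

For example, with input columns `real` and `bool` and categorical columns `real` and `stringNum`, `missingFromInput` returns only `stringNum`, and `warningMessage` produces a warning that names it. As in the patch, the mismatch is only reported; nothing is rejected.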