[SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct
Author: Octavian Geagla <ogeagla@gmail.com>
Closes #6501 from ogeagla/ml-guide-elemwiseprod and squashes the following commits:
4ad93d5 [Octavian Geagla] [SPARK-7576] [MLLIB] Incorporate code review feedback.
f7be7ad [Octavian Geagla] [SPARK-7576] [MLLIB] Add spark.ml user guide doc/example for ElementwiseProduct.
(cherry picked from commit da2112aef2)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@@ -876,5 +876,93 @@ bucketedData = bucketizer.transform(dataFrame)
</div>
</div>

## ElementwiseProduct

ElementwiseProduct multiplies each input vector by a provided "weight" vector, using element-wise multiplication. In other words, it scales each column of the dataset by a scalar multiplier. This represents the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_%28matrices%29) between the input vector, `v`, and the transforming vector, `w`, to yield a result vector.

`\[ \begin{pmatrix}
v_1 \\
\vdots \\
v_N
\end{pmatrix} \circ \begin{pmatrix}
w_1 \\
\vdots \\
w_N
\end{pmatrix}
= \begin{pmatrix}
v_1 w_1 \\
\vdots \\
v_N w_N
\end{pmatrix}
\]`
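
For example, using the weight vector from the code examples below, `w = (0.0, 1.0, 2.0)`, an input vector `v = (1.0, 2.0, 3.0)` is transformed to `(0.0, 2.0, 6.0)`:

`\[ \begin{pmatrix}
1.0 \\
2.0 \\
3.0
\end{pmatrix} \circ \begin{pmatrix}
0.0 \\
1.0 \\
2.0
\end{pmatrix}
= \begin{pmatrix}
0.0 \\
2.0 \\
6.0
\end{pmatrix}
\]`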

[`ElementwiseProduct`](api/scala/index.html#org.apache.spark.ml.feature.ElementwiseProduct) takes the following parameter:

* `scalingVec`: the transforming vector.

The example below demonstrates how to transform vectors using a transforming vector value.

<div class="codetabs">
<div data-lang="scala">
{% highlight scala %}
import org.apache.spark.ml.feature.ElementwiseProduct
import org.apache.spark.mllib.linalg.Vectors

// Create some vector data; also works for sparse vectors
val dataFrame = sqlContext.createDataFrame(Seq(
  ("a", Vectors.dense(1.0, 2.0, 3.0)),
  ("b", Vectors.dense(4.0, 5.0, 6.0)))).toDF("id", "vector")

val transformingVector = Vectors.dense(0.0, 1.0, 2.0)
val transformer = new ElementwiseProduct()
  .setScalingVec(transformingVector)
  .setInputCol("vector")
  .setOutputCol("transformedVector")

// Batch transform the vectors to create new column:
val transformedData = transformer.transform(dataFrame)
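// For the rows above, "transformedVector" holds (0.0, 2.0, 6.0) and
// (0.0, 5.0, 12.0); optionally inspect the result:
transformedData.show()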
{% endhighlight %}
</div>

<div data-lang="java">
{% highlight java %}
import java.util.ArrayList;
import java.util.List;

import com.google.common.collect.Lists;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.ml.feature.ElementwiseProduct;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Create some vector data; also works for sparse vectors
JavaRDD<Row> jrdd = jsc.parallelize(Lists.newArrayList(
  RowFactory.create("a", Vectors.dense(1.0, 2.0, 3.0)),
  RowFactory.create("b", Vectors.dense(4.0, 5.0, 6.0))
));
List<StructField> fields = new ArrayList<StructField>(2);
fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
// The "vector" column holds ML Vector values, so its field uses VectorUDT.
fields.add(DataTypes.createStructField("vector", new VectorUDT(), false));
StructType schema = DataTypes.createStructType(fields);
DataFrame dataFrame = sqlContext.createDataFrame(jrdd, schema);
Vector transformingVector = Vectors.dense(0.0, 1.0, 2.0);
ElementwiseProduct transformer = new ElementwiseProduct()
  .setScalingVec(transformingVector)
  .setInputCol("vector")
  .setOutputCol("transformedVector");
// Batch transform the vectors to create new column:
DataFrame transformedData = transformer.transform(dataFrame);
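// As in the Scala example, "transformedVector" holds (0.0, 2.0, 6.0) and
// (0.0, 5.0, 12.0); optionally inspect the result:
transformedData.show();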

{% endhighlight %}
</div>
</div>
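
`ElementwiseProduct` is a standard `Transformer`, so it can also be composed with other stages in an ML `Pipeline`. The sketch below is illustrative rather than part of the example above: it reuses the `dataFrame` and `transformer` values from the Scala example and assumes a follow-up `Normalizer` stage with a hypothetical output column name.

{% highlight scala %}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.Normalizer

// Normalize each scaled vector produced by the ElementwiseProduct stage.
val normalizer = new Normalizer()
  .setInputCol("transformedVector")
  .setOutputCol("normalizedVector")

// Chain the two transformers; fit() simply collects the transformer stages
// into a reusable PipelineModel.
val pipeline = new Pipeline().setStages(Array(transformer, normalizer))
val output = pipeline.fit(dataFrame).transform(dataFrame)
{% endhighlight %}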

# Feature Selectors