---
layout: global
title: PMML model export - spark.mllib
displayTitle: PMML model export - spark.mllib
---
* Table of contents
{:toc}

## `spark.mllib` supported models

`spark.mllib` supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).

The table below outlines the `spark.mllib` models that can be exported to PMML and their equivalent PMML model.
<table class="table">
  <thead>
    <tr><th>`spark.mllib` model</th><th>PMML model</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>KMeansModel</td><td>ClusteringModel</td>
    </tr>
    <tr>
      <td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
    </tr>
    <tr>
      <td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
    </tr>
  </tbody>
</table>

## Examples
<div class="codetabs">

<div data-lang="scala" markdown="1">
To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.

Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API.

Here is a complete example of building a `KMeansModel` and printing it out in PMML format:

{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}
In addition to returning the PMML model as a String (`model.toPMML`, as in the example above), you can write it to other destinations:

{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}
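
The same call works for the other supported model types listed in the table above. The following is a minimal sketch, not taken from the original example: it builds a `LinearRegressionModel` directly from made-up coefficients (the weights and intercept are illustrative assumptions, not fitted values) and exports it to a String. Because the String export does not take a `SparkContext`, this sketch does not use `sc`.

{% highlight scala %}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Hypothetical coefficients chosen only for illustration (not fitted to any data)
val weights = Vectors.dense(0.5, -1.2)
val intercept = 0.1
val linearModel = new LinearRegressionModel(weights, intercept)

// Export the model to a String in PMML format
val pmml: String = linearModel.toPMML()
println(pmml)
{% endhighlight %}

According to the table above, the resulting document should contain a PMML `RegressionModel` element with `functionName="regression"`.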
For unsupported models, either the model has no `toPMML` method or calling it throws an `IllegalArgumentException`.
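
One way to guard against the exception case is to wrap the export in `scala.util.Try`. The sketch below is an illustration under stated assumptions: it constructs a hypothetical three-class `LogisticRegressionModel` with made-up weights, on the assumption (suggested by the "Binary" qualifier in the table above) that only binary logistic regression models can be exported.

{% highlight scala %}
import scala.util.{Failure, Success, Try}

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical 3-class model with 2 features; the weights are illustrative only
val multiclassModel = new LogisticRegressionModel(
  Vectors.dense(0.1, 0.2, 0.3, 0.4), // (numClasses - 1) * numFeatures weights
  0.0,                               // intercept
  2,                                 // numFeatures
  3)                                 // numClasses

// Attempt the export and turn an IllegalArgumentException into a message
val result: String = Try(multiclassModel.toPMML()) match {
  case Success(pmml) => pmml
  case Failure(e: IllegalArgumentException) =>
    s"PMML export not supported: ${e.getMessage}"
  case Failure(other) => throw other // rethrow anything unexpected
}
println(result)
{% endhighlight %}

Only `IllegalArgumentException` is handled here; any other failure is rethrown so that unrelated errors are not hidden.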
</div>

</div>