spark-instrumented-optimizer/docs/mllib-pmml-model-export.md
Vincenzo Selvaggio 814b3dabdf [SPARK-7272] [MLLIB] User guide for PMML model export
https://issues.apache.org/jira/browse/SPARK-7272

Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>

Closes #6219 from selvinsource/mllib_pmml_model_export_SPARK-7272 and squashes the following commits:

c866fb8 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
1beda98 [Vincenzo Selvaggio] [SPARK-7272] Initial user guide for pmml export
d670662 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2731375 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
680dc33 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2e298b5 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
a932f51 [Vincenzo Selvaggio] Create mllib-pmml-model-export.md
2015-05-18 08:46:33 -07:00


---
layout: global
title: PMML model export - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
---

* Table of contents
{:toc}

MLlib supported models

MLlib supports model export to Predictive Model Markup Language (PMML).

The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.

| MLlib model | PMML model |
|---|---|
| `KMeansModel` | `ClusteringModel` |
| `LinearRegressionModel` | `RegressionModel` (functionName="regression") |
| `RidgeRegressionModel` | `RegressionModel` (functionName="regression") |
| `LassoModel` | `RegressionModel` (functionName="regression") |
| `SVMModel` | `RegressionModel` (functionName="classification" normalizationMethod="none") |
| Binary `LogisticRegressionModel` | `RegressionModel` (functionName="classification" normalizationMethod="logit") |

Examples

To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
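As a rough illustration of the mapping in the table, the sketch below builds a few models by hand and reports which PMML model element each export contains. The weights, intercept and cluster centers are made-up values, and `pmmlElement` is a hypothetical helper introduced only for this example. It assumes the exported XML uses unprefixed element names.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionModel, SVMModel}
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Hand-built models with made-up parameters (illustrative values only).
val weights = Vectors.dense(0.5, -1.5)
val intercept = 0.1
val linear = new LinearRegressionModel(weights, intercept)
val svm = new SVMModel(weights, intercept)
val logistic = new LogisticRegressionModel(weights, intercept) // binary model
val kmeans = new KMeansModel(Array(Vectors.dense(0.0, 0.0), Vectors.dense(9.0, 9.0)))

// Hypothetical helper: name the PMML model element found in an exported string.
// Assumes unprefixed element names in the XML.
def pmmlElement(pmml: String): String =
  if (pmml.contains("<ClusteringModel")) "ClusteringModel"
  else if (pmml.contains("<RegressionModel")) "RegressionModel"
  else "unknown"

Seq(
  "KMeansModel" -> kmeans.toPMML(),
  "LinearRegressionModel" -> linear.toPMML(),
  "SVMModel" -> svm.toPMML(),
  "LogisticRegressionModel" -> logistic.toPMML()
).foreach { case (name, pmml) => println(s"$name -> ${pmmlElement(pmml)}") }
```

None of these calls needs a `SparkContext`, because exporting to a String only reads the parameters already stored in the model.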

Here is a complete example of building a `KMeansModel` and printing it out in PMML format:

{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}
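Because the exported PMML is plain XML, a quick sanity check is to parse it and count elements. The sketch below uses only the JDK's XML parser. `countElements` is a hypothetical helper, not part of MLlib, and it assumes the element names in the document carry no namespace prefix.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: parse an XML string and count the elements
// with the given tag name anywhere in the document.
def countElements(xml: String, tagName: String): Int = {
  val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  val bytes = xml.getBytes(StandardCharsets.UTF_8)
  val doc = builder.parse(new ByteArrayInputStream(bytes))
  doc.getElementsByTagName(tagName).getLength
}
```

With the model from the example above, `countElements(clusters.toPMML(), "Cluster")` would be expected to equal `numClusters`, assuming the export writes one PMML `Cluster` element per cluster center.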

Besides exporting the PMML model to a String (`model.toPMML`, as in the example above), you can write it to other destinations:

{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}
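The `OutputStream` variant accepts any stream, not only `System.out`. As a minimal sketch, the snippet below captures the export in an in-memory buffer. The model is built by hand with made-up cluster centers so that no `SparkContext` or input file is needed, and decoding the bytes as UTF-8 is an assumption about the encoding used by the export.

```scala
import java.io.ByteArrayOutputStream
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

// Hand-built model with made-up centers (illustrative values only).
val model = new KMeansModel(Array(Vectors.dense(0.0, 0.0), Vectors.dense(9.0, 9.0)))

// Write the PMML document into an in-memory buffer instead of System.out.
val buffer = new ByteArrayOutputStream()
model.toPMML(buffer)

// Assumption: the export is UTF-8 encoded.
val pmmlFromStream = buffer.toString("UTF-8")
println(s"Captured ${buffer.size()} bytes of PMML")
```

The same pattern applies to other streams, such as a `FileOutputStream` or a stream wrapped in a `java.util.zip.GZIPOutputStream`. In those cases the caller is responsible for closing the stream.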

For unsupported models, either the model has no `.toPMML` method or calling it throws an `IllegalArgumentException`.
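One way to handle the second case without aborting a job is to wrap the call. The sketch below assumes the model mixes in the `org.apache.spark.mllib.pmml.PMMLExportable` trait. `safeToPMML` is a hypothetical helper name, not an MLlib API.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.mllib.pmml.PMMLExportable

// Hypothetical helper: return the PMML string on success, or a short
// explanation when the export is rejected with IllegalArgumentException.
def safeToPMML(model: PMMLExportable): Either[String, String] =
  Try(model.toPMML()) match {
    case Success(pmml) => Right(pmml)
    case Failure(e: IllegalArgumentException) =>
      Left(s"PMML export not supported: ${e.getMessage}")
    case Failure(other) => throw other // unrelated failures are not swallowed
  }
```

Models without a `toPMML` method cannot be passed to this helper at all, so that case is caught at compile time rather than at run time.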