[SPARK-7272] [MLLIB] User guide for PMML model export
https://issues.apache.org/jira/browse/SPARK-7272

Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>

Closes #6219 from selvinsource/mllib_pmml_model_export_SPARK-7272 and squashes the following commits:

c866fb8 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
1beda98 [Vincenzo Selvaggio] [SPARK-7272] Initial user guide for pmml export
d670662 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2731375 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
680dc33 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
2e298b5 [Vincenzo Selvaggio] Update mllib-pmml-model-export.md
a932f51 [Vincenzo Selvaggio] Create mllib-pmml-model-export.md
parent 1ecfac6e38
commit 814b3dabdf
@@ -39,6 +39,7 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
* [Optimization (developer)](mllib-optimization.html)
  * stochastic gradient descent
  * limited-memory BFGS (L-BFGS)
* [PMML model export](mllib-pmml-model-export.html)

MLlib is under active development.
The APIs marked `Experimental`/`DeveloperApi` may change in future releases,

86
docs/mllib-pmml-model-export.md
Normal file
@@ -0,0 +1,86 @@
---
layout: global
title: PMML model export - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
---

* Table of contents
{:toc}

## MLlib supported models

MLlib supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).

The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.

<table class="table">
  <thead>
    <tr><th>MLlib model</th><th>PMML model</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>KMeansModel</td><td>ClusteringModel</td>
    </tr>
    <tr>
      <td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
    </tr>
    <tr>
      <td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
    </tr>
    <tr>
      <td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
    </tr>
  </tbody>
</table>

## Examples
<div class="codetabs">

<div data-lang="scala" markdown="1">
To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.

Here is a complete example of building a KMeansModel and printing it out in PMML format:
{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}

In addition to returning the PMML model as a String (`model.toPMML`, as in the example above), you can write it to other destinations:

{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}
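
The same calls should work for the other models in the table above. The following is a minimal sketch, not part of the original guide, that builds a `LinearRegressionModel` directly from hypothetical weights and an intercept (values chosen only for illustration) rather than training one, and exports it to a String. It assumes `spark-mllib` is on the classpath, as it is in `spark-shell`; no `SparkContext` is needed for the String export.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Hypothetical weights and intercept, chosen only for illustration
val linearModel = new LinearRegressionModel(Vectors.dense(0.5, -1.2), 0.1)

// Export to a String; per the table above, the output is expected to
// describe a PMML RegressionModel with functionName="regression"
val linearPmml: String = linearModel.toPMML()
println("PMML Model:\n" + linearPmml)
```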

For unsupported models, either the model has no `toPMML` method or calling it throws an `IllegalArgumentException`.

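If a model's type is only known at runtime, the export call can be guarded. The sketch below is an illustration under assumptions, not part of the original guide: it builds a hypothetical 3-class `LogisticRegressionModel` (only the binary case appears in the table above) and wraps `toPMML` in `scala.util.Try`. Whether this particular model is rejected depends on the Spark version, so the code handles both outcomes instead of assuming one.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical 3-class model over 2 features: (3 - 1) * 2 = 4 weights,
// values chosen only for illustration
val multinomialModel =
  new LogisticRegressionModel(Vectors.dense(0.1, 0.2, 0.3, 0.4), 0.0, 2, 3)

// Capture the outcome of the export instead of letting an exception propagate
val exportResult: Try[String] = Try(multinomialModel.toPMML())

exportResult match {
  case Success(pmml) =>
    println("PMML Model:\n" + pmml)
  case Failure(e: IllegalArgumentException) =>
    // The case described above: the model is not supported for export
    println("PMML export not supported: " + e.getMessage)
  case Failure(other) =>
    // Anything else is unexpected, so rethrow it
    throw other
}
```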
</div>

</div>