Replaces a number of occurrences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in Spark). It also removes some files that I forgot to delete with #10207. Author: Timothy Hunter <timhunter@databricks.com>. Closes #10234 from thunterdb/12212.
layout | title | displayTitle |
---|---|---|
global | PMML model export - spark.mllib | PMML model export - spark.mllib |
- Table of contents {:toc}
`spark.mllib` supported models
`spark.mllib` supports model export to Predictive Model Markup Language (PMML).
The table below outlines the `spark.mllib` models that can be exported to PMML and their equivalent PMML model.
`spark.mllib` model | PMML model |
---|---|
KMeansModel | ClusteringModel |
LinearRegressionModel | RegressionModel (functionName="regression") |
RidgeRegressionModel | RegressionModel (functionName="regression") |
LassoModel | RegressionModel (functionName="regression") |
SVMModel | RegressionModel (functionName="classification" normalizationMethod="none") |
Binary LogisticRegressionModel | RegressionModel (functionName="classification" normalizationMethod="logit") |
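
To make one row of the table concrete, here is a minimal sketch that exports a `LinearRegressionModel`. The weights and intercept are hypothetical values chosen only for illustration; in practice the model would come from training. According to the table, the resulting document should contain a PMML `RegressionModel` element.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Hypothetical coefficients, not the result of any real training run
val weights = Vectors.dense(0.5, -1.25, 2.0)
val intercept = 0.1
val model = new LinearRegressionModel(weights, intercept)

// Export the model to a String in PMML format
val pmml: String = model.toPMML()
println(pmml)
```

Building the model directly from coefficients keeps the sketch independent of a `SparkContext`, since the String export itself runs locally.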
Examples
Refer to the `KMeans` Scala docs and `Vectors` Scala docs for details on the API.
Here is a complete example of building a KMeansModel and printing it out in PMML format:

{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}
As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other destinations:
{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}
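
The `OutputStream` variant is not limited to `System.out`. As a sketch, the PMML document can be captured in memory with a `ByteArrayOutputStream`. The cluster centers below are hypothetical, and decoding the bytes as UTF-8 is an assumption about the encoding of the written XML.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical cluster centers, used only to have a model to export
val model = new KMeansModel(Array(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(9.0, 9.0)
))

// Write the PMML document into an in-memory buffer
val buffer = new ByteArrayOutputStream()
model.toPMML(buffer)

// Assumption: the exported XML is UTF-8 encoded
val xml = new String(buffer.toByteArray, StandardCharsets.UTF_8)
println(xml)
```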
For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.
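
When the model type is only known at runtime, one way to cope with the second case is to wrap the export call. The helper below, `exportOrNone`, is a hypothetical name and not part of the Spark API. It is a minimal sketch that assumes the model mixes in the `PMMLExportable` trait from `org.apache.spark.mllib.pmml`.

```scala
import scala.util.{Failure, Success, Try}

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.pmml.PMMLExportable
import org.apache.spark.mllib.regression.LinearRegressionModel

// Hypothetical helper: returns the PMML String, or None if export is unsupported
def exportOrNone(model: PMMLExportable): Option[String] =
  Try(model.toPMML()) match {
    case Success(xml) => Some(xml)
    case Failure(e: IllegalArgumentException) =>
      // Raised for models that cannot be exported to PMML
      println(s"PMML export not supported: ${e.getMessage}")
      None
    case Failure(other) =>
      // Anything else is unexpected, so rethrow it
      throw other
  }

// Usage with a supported model built from hypothetical coefficients
val supported = new LinearRegressionModel(Vectors.dense(1.0, 2.0), 0.0)
val result = exportOrNone(supported)
```

Returning an `Option` keeps the unsupported case explicit for the caller. The first case, a missing `.toPMML` method, is caught at compile time and needs no runtime handling.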