1 change: 1 addition & 0 deletions docs/mllib-guide.md
@@ -39,6 +39,7 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
* [Optimization (developer)](mllib-optimization.html)
  * stochastic gradient descent
  * limited-memory BFGS (L-BFGS)
* [PMML model export](mllib-pmml-model-export.html)

MLlib is under active development.
The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
86 changes: 86 additions & 0 deletions docs/mllib-pmml-model-export.md
@@ -0,0 +1,86 @@
---
layout: global
title: PMML model export - MLlib
displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
---

* Table of contents
{:toc}

## MLlib supported models

MLlib supports model export to the Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).

The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.

<table class="table">
<thead>
<tr><th>MLlib model</th><th>PMML model</th></tr>
</thead>
<tbody>
<tr>
<td>KMeansModel</td><td>ClusteringModel</td>
</tr>
<tr>
<td>LinearRegressionModel</td><td>RegressionModel (functionName="regression")</td>
</tr>
<tr>
<td>RidgeRegressionModel</td><td>RegressionModel (functionName="regression")</td>
</tr>
<tr>
<td>LassoModel</td><td>RegressionModel (functionName="regression")</td>
</tr>
<tr>
<td>SVMModel</td><td>RegressionModel (functionName="classification" normalizationMethod="none")</td>
</tr>
<tr>
<td>Binary LogisticRegressionModel</td><td>RegressionModel (functionName="classification" normalizationMethod="logit")</td>
</tr>
</tbody>
</table>

## Examples
<div class="codetabs">

<div data-lang="scala" markdown="1">
To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.

Here is a complete example of building a KMeansModel and printing it in PMML format:
{% highlight scala %}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

// Cluster the data into two classes using KMeans
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Export to PMML
println("PMML Model:\n" + clusters.toPMML)
{% endhighlight %}
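
The same call applies to the other models in the table above. The following is a minimal sketch, assuming a `LinearRegressionModel` built directly from hand-picked weights rather than trained on data; the weights, the intercept, and the names `regression` and `regressionPmml` are illustrative and not part of the original example. It needs the MLlib classes on the classpath but no `SparkContext`.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LinearRegressionModel

// Illustrative coefficients (assumed values, not fitted to any data set)
val weights = Vectors.dense(0.5, -1.25)
val intercept = 2.0
val regression = new LinearRegressionModel(weights, intercept)

// Export to a String; per the table above, the result should describe
// a PMML RegressionModel with functionName="regression"
val regressionPmml: String = regression.toPMML()
println(regressionPmml)
```
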

In addition to returning the PMML model as a String (`model.toPMML`, as in the example above), you can write it to a local file, a directory on a distributed file system, or an `OutputStream`:

{% highlight scala %}
// Export the model to a String in PMML format
clusters.toPMML

// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")

// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")

// Export the model to the OutputStream in PMML format
clusters.toPMML(System.out)
{% endhighlight %}

Unsupported models either have no `toPMML` method or throw an `IllegalArgumentException` when it is called.
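
When the export may hit the `IllegalArgumentException` case, one option is to wrap the call so that the failure becomes a value instead of an exception. The helper below is a hypothetical sketch and not part of MLlib: `exportOrReason` is an assumed name, and it takes the export expression as a by-name parameter so it works with any model that has a `toPMML` method.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not an MLlib API): evaluates a PMML export expression
// and turns the IllegalArgumentException for unsupported models into a Left
// that carries the exception message. Any other exception is rethrown.
def exportOrReason(pmml: => String): Either[String, String] =
  Try(pmml) match {
    case Success(xml) => Right(xml)
    case Failure(e: IllegalArgumentException) => Left(e.getMessage)
    case Failure(other) => throw other
  }

// Usage with the KMeans example above (assumes `clusters` is in scope):
//   exportOrReason(clusters.toPMML()) match {
//     case Right(xml)   => println(xml)
//     case Left(reason) => println("PMML export not available: " + reason)
//   }
```

This only covers the run-time case. A model type with no `toPMML` method at all is rejected by the compiler, so it needs no wrapper.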

</div>

</div>