Commit 1beda98

[SPARK-7272] Initial user guide for pmml export
1 parent d670662 commit 1beda98

2 files changed: +46 -2 lines changed

docs/mllib-guide.md

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ filtering, dimensionality reduction, as well as underlying optimization primitiv
 * [Optimization (developer)](mllib-optimization.html)
 * stochastic gradient descent
 * limited-memory BFGS (L-BFGS)
+* [PMML model export](mllib-pmml-model-export.html)
 
 MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future releases,

docs/mllib-pmml-model-export.md

Lines changed: 45 additions & 2 deletions
@@ -9,9 +9,9 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - PMML model export
 
 ## MLlib supported models
 
-MLlib supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)) format.
+MLlib supports model export to Predictive Model Markup Language ([PMML](http://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)).
 
-The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML format.
+The table below outlines the MLlib models that can be exported to PMML and their equivalent PMML model.
 
 <table class="table">
 <thead>
@@ -40,4 +40,47 @@ The table below outlines the MLlib models that can be exported to PMML and their
 </table>
 
 ## Examples
+<div class="codetabs">
 
+<div data-lang="scala" markdown="1">
+To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.
+
+Here is a complete example of building a KMeansModel and printing it out in PMML format:
+{% highlight scala %}
+import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.linalg.Vectors
+
+// Load and parse the data
+val data = sc.textFile("data/mllib/kmeans_data.txt")
+val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
+
+// Cluster the data into two classes using KMeans
+val numClusters = 2
+val numIterations = 20
+val clusters = KMeans.train(parsedData, numClusters, numIterations)
+
+// Export to PMML
+println("PMML export = " + clusters.toPMML)
+{% endhighlight %}
+
+As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other destinations:
+
+{% highlight scala %}
+// Export the model to a String in PMML format
+clusters.toPMML
+
+// Export the model to a local file in PMML format
+clusters.toPMML("/tmp/kmeans.xml")
+
+// Export the model to a directory on a distributed file system in PMML format
+clusters.toPMML(sc, "/tmp/kmeans")
+
+// Export the model to the OutputStream in PMML format
+clusters.toPMML(System.out)
+{% endhighlight %}
+
+For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.
+
+</div>
+
+</div>
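
The export calls documented in this change can be tried without a cluster or input file. The following is a minimal sketch, not part of the commit. It assumes that `spark-mllib` (a version that includes PMML export) is on the classpath and that a `KMeansModel` can be built directly from its cluster centers. The object name `PmmlExportSketch`, its helper methods, and the center values are hypothetical and chosen only for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical helper object; not part of the commit or of Spark.
object PmmlExportSketch {

  // Illustrative cluster centers (assumed values, not taken from kmeans_data.txt).
  val model = new KMeansModel(Array(
    Vectors.dense(0.1, 0.1, 0.1),
    Vectors.dense(9.1, 9.1, 9.1)))

  // Export the model to an in-memory String; no SparkContext is needed for this overload.
  def exportToString(): String = model.toPMML()

  // Export the model to a local temporary file, then read the file back as text.
  def exportToLocalFile(): String = {
    val path = Files.createTempFile("kmeans", ".xml")
    model.toPMML(path.toString)
    new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    println(exportToString())
  }
}
```

Only the String and local-file overloads are exercised here. The distributed file system overload takes a `SparkContext`, so it is left out of this sketch.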
