[SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PMMLExportable to ML with KMeans #9207
Closed. holdenk wants to merge 40 commits into apache:master from holdenk:SPARK-11171-SPARK-11237-Add-PMML-export-for-ML-KMeans
Changes from 4 commits
Commits
1749aec (holdenk) Try adding PMMLExportable to ML with KMeans
bc1b508 (holdenk) Everything is better with tests
adf0b36 (holdenk) Move the PMML loading utils into test and make it clear they are test…
494ecbf (holdenk) Merge in master (now both pmml and native export)
41611b8 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
9525283 (holdenk) Merge in master
461c1ce (holdenk) Fix import ordering from automerge
9aa5265 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
b514421 (holdenk) Import order fix
57c303d (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
6e2efc2 (holdenk) Merge branch 'SPARK-11171-SPARK-11237-Add-PMML-export-for-ML-KMeans' …
90b0e22 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
2adc069 (holdenk) Merge in master
4f693d1 (holdenk) Just make the old ML one inheret from the new one and override the on…
4f3ac08 (holdenk) Fix ambigious reference
46879d3 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
a99880d (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
2fb0857 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
8042e4f (holdenk) Exclude the methods we've moved around
bebd0e7 (holdenk) Merge in master
344d5a0 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
a30b6c5 (holdenk) Merge in master
cf335bf (holdenk) Start refactoring to the more generic writer
59dd4c6 (holdenk) More progress towards the new API
b5a57ea (holdenk) Revert some uneeded changes with the new approach
b7edccf (holdenk) Basic test pass
a41b474 (holdenk) Refactor tests a bit
1146e45 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
0dd6c94 (holdenk) Make the writer have a convience function to set the type to pmml (an…
49f8a8d (holdenk) Add pmml() to the writer
e6845f1 (holdenk) pmmml -> pmml
0b042e8 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
9170b3f (holdenk) Add a MIMA exclusion for the saveImpl change and fix inadvertant spac…
8579c1b (holdenk) Merge in master
c8573f0 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
8103b76 (holdenk) Remove TODO since we have moved it to Spark 2.1 now that master is on…
bdcfbd1 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
00173aa (holdenk) Update master
0e8c523 (holdenk) Merge branch 'master' into SPARK-11171-SPARK-11237-Add-PMML-export-fo…
9cb8994 (holdenk) Merge in master
90 changes: 90 additions & 0 deletions
mllib/src/main/scala/org/apache/spark/ml/pmml/PMMLExportable.scala
@@ -0,0 +1,90 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.ml.pmml

import java.io.{File, OutputStream, StringWriter}
import javax.xml.transform.stream.StreamResult

import org.jpmml.model.JAXBUtil

import org.apache.spark.SparkContext
import org.apache.spark.annotation.{DeveloperApi, Experimental, Since}
import org.apache.spark.mllib.pmml.export.PMMLModelExportFactory

/**
 * :: DeveloperApi ::
 * Export model to the PMML format
 * Predictive Model Markup Language (PMML) is an XML-based file format
 * developed by the Data Mining Group (www.dmg.org).
 * Based on [[org.apache.spark.mllib.pmml.Exportable]]
 */
@DeveloperApi
@Since("1.6.0")
trait PMMLExportable {

  /**
   * Export the model to the stream result in PMML format.
   */
  private[spark] def toPMML(streamResult: StreamResult): Unit

  /**
   * :: Experimental ::
   * Export the model to a local file in PMML format
   */
  @Experimental
  @Since("1.6.0")
  def toPMML(localPath: String): Unit = {
    toPMML(new StreamResult(new File(localPath)))
  }

  /**
   * :: Experimental ::
   * Export the model to a directory on a distributed file system in PMML format.
   * Models should override if they may contain more data than
   * is reasonable to store locally.
   */
  @Experimental
  @Since("1.6.0")
  def toPMML(sc: SparkContext, path: String): Unit = {
    val pmml = toPMML()
    sc.parallelize(Array(pmml), 1).saveAsTextFile(path)
  }

  /**
   * :: Experimental ::
   * Export the model to the OutputStream in PMML format
   */
  @Experimental
  @Since("1.6.0")
  def toPMML(outputStream: OutputStream): Unit = {
    toPMML(new StreamResult(outputStream))
  }

  /**
   * :: Experimental ::
   * Export the model to a String in PMML format
   */
  @Experimental
  @Since("1.6.0")
  def toPMML(): String = {
    val writer = new StringWriter
    toPMML(new StreamResult(writer))
    writer.toString
  }

}
```
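
To make the shape of the trait above easier to follow, here is a minimal sketch of the same delegation pattern using only JDK classes: one abstract stream-based method, with the file, `OutputStream`, and `String` overloads all funnelling into it. `ToyPMMLExportable` and `ToyCentroids` are hypothetical names that are not part of this PR, the XML they emit is a placeholder rather than schema-valid PMML, and the identity `Transformer` is only a stand-in for whatever marshalling a real model would use. The `SparkContext` overload is left out so the sketch has no Spark dependency.

```scala
import java.io.{File, OutputStream, StringReader, StringWriter}
import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.{StreamResult, StreamSource}

// Hypothetical stand-in for the trait in the diff: implementors supply only
// the StreamResult method, and every convenience overload delegates to it.
trait ToyPMMLExportable {
  def toPMML(streamResult: StreamResult): Unit

  // Export to a local file.
  def toPMML(localPath: String): Unit =
    toPMML(new StreamResult(new File(localPath)))

  // Export to an arbitrary output stream.
  def toPMML(outputStream: OutputStream): Unit =
    toPMML(new StreamResult(outputStream))

  // Export to an in-memory string.
  def toPMML(): String = {
    val writer = new StringWriter
    toPMML(new StreamResult(writer))
    writer.toString
  }
}

// Hypothetical model holding cluster centers. The XML below is illustrative
// only and omits the header, data dictionary, and namespace that PMML requires.
class ToyCentroids(centers: Seq[Seq[Double]]) extends ToyPMMLExportable {
  override def toPMML(streamResult: StreamResult): Unit = {
    val clusters = centers.zipWithIndex.map { case (c, i) =>
      s"""<Cluster name="cluster_$i"><Array n="${c.size}" type="real">${c.mkString(" ")}</Array></Cluster>"""
    }.mkString
    val xml = s"<PMML><ClusteringModel>$clusters</ClusteringModel></PMML>"
    // An identity transform copies the XML into whichever sink the
    // StreamResult wraps (a Writer, an OutputStream, or a File).
    TransformerFactory.newInstance().newTransformer()
      .transform(new StreamSource(new StringReader(xml)), streamResult)
  }
}
```

With this layout a model only has to know how to write to a `StreamResult`; for example, `new ToyCentroids(Seq(Seq(0.0, 1.0))).toPMML()` returns the XML as a string without the model defining any string-specific code.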
43 changes: 43 additions & 0 deletions
mllib/src/test/scala/org/apache/spark/ml/util/PMMLUtils.scala
@@ -0,0 +1,43 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.ml.util

import java.io.StringReader
import javax.xml.bind.Unmarshaller
import javax.xml.transform.Source

import org.dmg.pmml._
import org.jpmml.model.{ImportFilter, JAXBUtil}
import org.xml.sax.InputSource

/**
 * Testing utils for working with PMML.
 * Predictive Model Markup Language (PMML) is an XML-based file format
 * developed by the Data Mining Group (www.dmg.org).
 */
private[spark] object PMMLUtils {
  /**
   * :: Experimental ::
   * Load a PMML model from a string. Note: for testing only, PMML model evaluation is supported
   * through external spark-packages.
   */
  def loadFromString(input: String): PMML = {
    val is = new StringReader(input)
    val transformed = ImportFilter.apply(new InputSource(is))
    JAXBUtil.unmarshalPMML(transformed)
  }
}
```
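
The helper above exists so tests can read an exported string back and inspect it. As a rough illustration of that round-trip idea, the sketch below parses an XML string with the JDK DOM parser and counts elements by local name. It is an assumption-laden stand-in, not the PR's approach: `ToyPMMLChecks` is a hypothetical name, and plain DOM parsing gives none of the typed `PMML` object model that the JPMML-based `loadFromString` returns.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.xml.sax.InputSource

// Hypothetical test helper: structural checks on exported XML using only the JDK.
object ToyPMMLChecks {

  /** Parse an XML string into a namespace-aware DOM document. */
  def parse(input: String): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.newDocumentBuilder().parse(new InputSource(new StringReader(input)))
  }

  /** Count elements with the given local name, whatever their namespace. */
  def countElements(input: String, localName: String): Int =
    parse(input).getElementsByTagNameNS("*", localName).getLength
}
```

A test could then assert, for instance, that the number of `Cluster` elements in the exported string equals the model's `k`, which checks structure without depending on exact whitespace or attribute order.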
This seems to be a copy-paste of org.apache.spark.mllib.pmml. Should we deprecate the mllib one and use the new one in the ml package?
I think that might be good. The main difference is that this avoids the factory implementation that the MLlib API used.
(It uses the same public-facing API, per the JIRA discussion about the lack of user complaints with the old API.)