2 changes: 2 additions & 0 deletions LICENSE-binary
@@ -515,7 +515,9 @@ javax.xml.bind:jaxb-api https://github.com/javaee/jaxb-v2
Eclipse Distribution License (EDL) 1.0
--------------------------------------
com.sun.istack:istack-commons-runtime
jakarta.activation:jakarta.activation-api
jakarta.xml.bind:jakarta.xml.bind-api
org.glassfish.jaxb:jaxb-core
org.glassfish.jaxb:jaxb-runtime

Eclipse Public License (EPL) 2.0
9 changes: 5 additions & 4 deletions dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -101,7 +101,7 @@ httpclient/4.5.14//httpclient-4.5.14.jar
httpcore/4.4.16//httpcore-4.4.16.jar
icu4j/76.1//icu4j-76.1.jar
ini4j/0.5.4//ini4j-0.5.4.jar
istack-commons-runtime/3.0.8//istack-commons-runtime-3.0.8.jar
istack-commons-runtime/4.1.2//istack-commons-runtime-4.1.2.jar
ivy/2.5.3//ivy-2.5.3.jar
j2objc-annotations/3.0.0//j2objc-annotations-3.0.0.jar
jackson-annotations/2.18.2//jackson-annotations-2.18.2.jar
@@ -113,21 +113,22 @@ jackson-dataformat-yaml/2.18.2//jackson-dataformat-yaml-2.18.2.jar
jackson-datatype-jsr310/2.18.2//jackson-datatype-jsr310-2.18.2.jar
jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
jackson-module-scala_2.13/2.18.2//jackson-module-scala_2.13-2.18.2.jar
jakarta.activation-api/2.1.3//jakarta.activation-api-2.1.3.jar
jakarta.annotation-api/2.1.1//jakarta.annotation-api-2.1.1.jar
jakarta.inject-api/2.0.1//jakarta.inject-api-2.0.1.jar
jakarta.servlet-api/5.0.0//jakarta.servlet-api-5.0.0.jar
jakarta.validation-api/3.0.2//jakarta.validation-api-3.0.2.jar
jakarta.ws.rs-api/3.0.0//jakarta.ws.rs-api-3.0.0.jar
jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
jakarta.xml.bind-api/4.0.2//jakarta.xml.bind-api-4.0.2.jar
janino/3.1.9//janino-3.1.9.jar
java-diff-utils/4.15//java-diff-utils-4.15.jar
java-xmlbuilder/1.2//java-xmlbuilder-1.2.jar
javassist/3.30.2-GA//javassist-3.30.2-GA.jar
javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
javax.servlet-api/4.0.1//javax.servlet-api-4.0.1.jar
javolution/5.5.1//javolution-5.5.1.jar
jaxb-api/2.2.11//jaxb-api-2.2.11.jar
jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
jaxb-core/4.0.5//jaxb-core-4.0.5.jar
jaxb-runtime/4.0.5//jaxb-runtime-4.0.5.jar
jcl-over-slf4j/2.0.16//jcl-over-slf4j-2.0.16.jar
jdo-api/3.0.1//jdo-api-3.0.1.jar
jdom2/2.0.6//jdom2-2.0.6.jar
19 changes: 19 additions & 0 deletions docs/ml-migration-guide.md
@@ -26,6 +26,25 @@ Note that this migration guide describes the items specific to MLlib.
Many items of the SQL migration guide also apply when migrating MLlib to higher versions of the DataFrame-based APIs.
Please refer to [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.html).

## Upgrading from MLlib 3.5 to 4.0

### Breaking changes
{:.no_toc}

There are no breaking changes.

### Deprecations and changes of behavior
{:.no_toc}

**Deprecations**

There are no deprecations.

**Changes of behavior**

* [SPARK-51132](https://issues.apache.org/jira/browse/SPARK-51132):
The PMML XML schema version of models exported via [PMML model export](mllib-pmml-model-export.html) has been upgraded from `PMML-4_3` to `PMML-4_4`.
Contributor:
There's just one question: what's the difference between PMML-4_3 and PMML-4_4? And are we sure this won't introduce any breaking changes?

Contributor:
> What's the difference between PMML-4_3 and PMML-4_4?

Since you didn't change anything in your application code, the only perceivable difference will be a new XML namespace declaration on the first line of the exported PMML documents.

Previously, the top-level PMML element was `<PMML xmlns="http://www.dmg.org/PMML-4_3">`; now it's `<PMML xmlns="http://www.dmg.org/PMML-4_4">`.

> And are we sure this won't introduce any breaking changes?

The JPMML-Model library defaults to the latest PMML schema version (i.e. 4.4) in its output.

If you really want to, you can keep outputting PMML 4.3 schema documents by "filtering" the output stream with the `org.jpmml.model.PMMLOutputStream` class:

```java
JAXBSerializer jaxbSerializer = new JAXBSerializer();

OutputStream os = ...;

// Wrap the target stream so the document is downgraded to PMML 4.3 on write.
try (OutputStream pmmlOs = new PMMLOutputStream(os, Version.PMML_4_3)) {
    jaxbSerializer.serializePretty(pmml, pmmlOs);
}
```


## Upgrading from MLlib 2.4 to 3.0

### Breaking changes
8 changes: 4 additions & 4 deletions mllib/pom.xml
@@ -34,10 +34,6 @@
<url>https://spark.apache.org/</url>

<dependencies>
<dependency>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
</dependency>
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-parser-combinators_${scala.binary.version}</artifactId>
@@ -144,6 +140,10 @@
<groupId>org.glassfish.jaxb</groupId>
<artifactId>jaxb-runtime</artifactId>
</dependency>
<dependency>
<groupId>jakarta.xml.bind</groupId>
<artifactId>jakarta.xml.bind-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
@@ -20,7 +20,7 @@ package org.apache.spark.mllib.pmml
import java.io.{File, OutputStream, StringWriter}
import javax.xml.transform.stream.StreamResult

import org.jpmml.model.JAXBUtil
import org.jpmml.model.JAXBSerializer

import org.apache.spark.SparkContext
import org.apache.spark.annotation.Since
@@ -39,7 +39,8 @@ trait PMMLExportable {
*/
private def toPMML(streamResult: StreamResult): Unit = {
val pmmlModelExport = PMMLModelExportFactory.createPMMLModelExport(this)
JAXBUtil.marshalPMML(pmmlModelExport.getPmml(), streamResult)
val jaxbSerializer = new JAXBSerializer()
jaxbSerializer.marshalPretty(pmmlModelExport.getPmml(), streamResult)
}

/**
@@ -19,8 +19,8 @@ package org.apache.spark.mllib.pmml.`export`

import scala.{Array => SArray}

import org.dmg.pmml.{DataDictionary, DataField, DataType, FieldName, MiningField,
MiningFunction, MiningSchema, OpType}
import org.dmg.pmml.{DataDictionary, DataField, DataType, MiningField, MiningFunction,
MiningSchema, OpType}
import org.dmg.pmml.regression.{NumericPredictor, RegressionModel, RegressionTable}

import org.apache.spark.mllib.regression.GeneralizedLinearModel
@@ -44,7 +44,7 @@ private[mllib] class BinaryClassificationPMMLModelExport(
pmml.getHeader.setDescription(description)

if (model.weights.size > 0) {
val fields = new SArray[FieldName](model.weights.size)
val fields = new SArray[String](model.weights.size)
Contributor Author:
Changes come from: jpmml/jpmml-model@5969dc2
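The referenced jpmml-model commit replaced the `FieldName` value class with plain `String`s, which is why the wrapper calls disappear throughout this diff. A minimal sketch of the migration pattern (the `FieldNames` object below is illustrative, not part of either API):

```scala
// Illustrative sketch of the jpmml-model field-name migration:
// names that used to be wrapped in FieldName are now plain Strings.
object FieldNames {
  // Before (jpmml-model 1.4.8): fields(i) = FieldName.create("field_" + i)
  // After  (jpmml-model 1.7.1): fields(i) = "field_" + i
  def generate(n: Int): Array[String] =
    Array.tabulate(n)(i => "field_" + i)
}
```

Constructors such as `new DataField(...)` and `new MiningField(...)` then accept the string directly, as in the updated diff lines.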

val dataDictionary = new DataDictionary
val miningSchema = new MiningSchema
val regressionTableYES = new RegressionTable(model.intercept).setTargetCategory("1")
@@ -67,7 +67,7 @@ private[mllib] class BinaryClassificationPMMLModelExport(
.addRegressionTables(regressionTableYES, regressionTableNO)

for (i <- 0 until model.weights.size) {
fields(i) = FieldName.create("field_" + i)
fields(i) = "field_" + i
dataDictionary.addDataFields(new DataField(fields(i), OpType.CONTINUOUS, DataType.DOUBLE))
miningSchema
.addMiningFields(new MiningField(fields(i))
@@ -76,7 +76,7 @@ private[mllib] class BinaryClassificationPMMLModelExport(
}

// add target field
val targetField = FieldName.create("target")
val targetField = "target"
dataDictionary
.addDataFields(new DataField(targetField, OpType.CATEGORICAL, DataType.STRING))
miningSchema
@@ -19,8 +19,8 @@ package org.apache.spark.mllib.pmml.`export`

import scala.{Array => SArray}

import org.dmg.pmml.{DataDictionary, DataField, DataType, FieldName, MiningField,
MiningFunction, MiningSchema, OpType}
import org.dmg.pmml.{DataDictionary, DataField, DataType, MiningField, MiningFunction,
MiningSchema, OpType}
import org.dmg.pmml.regression.{NumericPredictor, RegressionModel, RegressionTable}

import org.apache.spark.mllib.regression.GeneralizedLinearModel
@@ -42,7 +42,7 @@ private[mllib] class GeneralizedLinearPMMLModelExport(
pmml.getHeader.setDescription(description)

if (model.weights.size > 0) {
val fields = new SArray[FieldName](model.weights.size)
val fields = new SArray[String](model.weights.size)
val dataDictionary = new DataDictionary
val miningSchema = new MiningSchema
val regressionTable = new RegressionTable(model.intercept)
@@ -53,7 +53,7 @@ private[mllib] class GeneralizedLinearPMMLModelExport(
.addRegressionTables(regressionTable)

for (i <- 0 until model.weights.size) {
fields(i) = FieldName.create("field_" + i)
fields(i) = "field_" + i
dataDictionary.addDataFields(new DataField(fields(i), OpType.CONTINUOUS, DataType.DOUBLE))
miningSchema
.addMiningFields(new MiningField(fields(i))
@@ -62,7 +62,7 @@ private[mllib] class GeneralizedLinearPMMLModelExport(
}

// for completeness add target field
val targetField = FieldName.create("target")
val targetField = "target"
dataDictionary.addDataFields(new DataField(targetField, OpType.CONTINUOUS, DataType.DOUBLE))
miningSchema
.addMiningFields(new MiningField(targetField)
@@ -20,7 +20,7 @@ package org.apache.spark.mllib.pmml.`export`
import scala.{Array => SArray}

import org.dmg.pmml.{Array, CompareFunction, ComparisonMeasure, DataDictionary, DataField, DataType,
FieldName, MiningField, MiningFunction, MiningSchema, OpType, SquaredEuclidean}
MiningField, MiningFunction, MiningSchema, OpType, SquaredEuclidean}
import org.dmg.pmml.clustering.{Cluster, ClusteringField, ClusteringModel}

import org.apache.spark.mllib.clustering.KMeansModel
@@ -40,7 +40,7 @@ private[mllib] class KMeansPMMLModelExport(model: KMeansModel) extends PMMLModel

if (model.clusterCenters.length > 0) {
val clusterCenter = model.clusterCenters(0)
val fields = new SArray[FieldName](clusterCenter.size)
val fields = new SArray[String](clusterCenter.size)
val dataDictionary = new DataDictionary
val miningSchema = new MiningSchema
val comparisonMeasure = new ComparisonMeasure()
@@ -55,7 +55,7 @@
.setNumberOfClusters(model.clusterCenters.length)

for (i <- 0 until clusterCenter.size) {
fields(i) = FieldName.create("field_" + i)
fields(i) = "field_" + i
dataDictionary.addDataFields(new DataField(fields(i), OpType.CONTINUOUS, DataType.DOUBLE))
miningSchema
.addMiningFields(new MiningField(fields(i))
@@ -23,7 +23,7 @@ import java.util.Locale

import scala.beans.BeanProperty

import org.dmg.pmml.{Application, Header, PMML, Timestamp}
import org.dmg.pmml.{Application, Header, PMML, Timestamp, Version}

private[mllib] trait PMMLModelExport {

@@ -44,6 +44,6 @@ private[mllib] trait PMMLModelExport {
val header = new Header()
.setApplication(app)
.setTimestamp(timestamp)
new PMML("4.2", header, null)
Contributor Author:
In fact, JPMML 1.4.7 already updated the PMML standard to 4.3, so for consistency this should have been changed to 4.3 at that time.
new PMML(Version.PMML_4_4.getVersion(), header, null)
}
}
@@ -1165,7 +1165,7 @@ class LinearRegressionSuite extends MLTest with DefaultReadWriteTest with PMMLRe
assert(fields(0).getOpType() == OpType.CONTINUOUS)
val pmmlRegressionModel = pmml.getModels().get(0).asInstanceOf[PMMLRegressionModel]
val pmmlPredictors = pmmlRegressionModel.getRegressionTables.get(0).getNumericPredictors
val pmmlWeights = pmmlPredictors.asScala.map(_.getCoefficient()).toList
val pmmlWeights = pmmlPredictors.asScala.map(_.getCoefficient().doubleValue()).toList
Contributor Author:
Changes come from: jpmml/jpmml-model@6d356fe
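Judging from the test fix above, the referenced commit changed getters such as `NumericPredictor.getCoefficient()` to return a boxed `java.lang.Number` rather than `Double` (an inference from this diff, not confirmed against the jpmml-model sources), so Scala call sites now unbox explicitly. A minimal sketch (the `Coefficients` helper is illustrative):

```scala
// Illustrative: when a Java getter returns java.lang.Number, unbox with
// doubleValue() before comparing against Scala Doubles.
object Coefficients {
  def toDoubles(coefficients: Seq[Number]): Seq[Double] =
    coefficients.map(_.doubleValue())
}
```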

assert(pmmlWeights(0) ~== model.coefficients(0) relTol 1E-3)
assert(pmmlWeights(1) ~== model.coefficients(1) relTol 1E-3)
}
5 changes: 3 additions & 2 deletions mllib/src/test/scala/org/apache/spark/ml/util/PMMLUtils.scala
@@ -20,7 +20,7 @@ import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import org.dmg.pmml.PMML
import org.jpmml.model.{JAXBUtil, SAXUtil}
import org.jpmml.model.{JAXBSerializer, SAXUtil}
import org.jpmml.model.filters.ImportFilter

/**
@@ -37,6 +37,7 @@ private[spark] object PMMLUtils {
val transformed = SAXUtil.createFilteredSource(
new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)),
new ImportFilter())
JAXBUtil.unmarshalPMML(transformed)
val jaxbSerializer = new JAXBSerializer()
jaxbSerializer.unmarshal(transformed).asInstanceOf[PMML]
}
}
33 changes: 9 additions & 24 deletions pom.xml
@@ -571,7 +571,7 @@
<dependency>
<groupId>org.jpmml</groupId>
<artifactId>pmml-model</artifactId>
<version>1.4.8</version>
<version>1.7.1</version>
<scope>provided</scope>
<exclusions>
<exclusion>
@@ -599,32 +599,24 @@
<dependency>
<groupId>org.glassfish.jaxb</groupId>
<artifactId>jaxb-runtime</artifactId>
<version>2.3.2</version>
<version>4.0.5</version>
Member:
Is there a version contract between this dep and other org.glassfish.* deps?

Contributor Author:
There is currently no mutual dependency between the org.glassfish.jersey series (3.0.16) and the org.glassfish.jaxb series (4.0.5).

But they both depend on jakarta.xml.bind:jakarta.xml.bind-api and jakarta.activation:jakarta.activation-api, and the jaxb dependency versions are newer (xml.bind: 4.0.2 vs 3.0.1, activation: 2.1.3 vs 2.0.1).

<scope>compile</scope>
<exclusions>
<!-- for now, we only write XML in PMML export, and these can be excluded -->
<exclusion>
<groupId>com.sun.xml.fastinfoset</groupId>
<artifactId>FastInfoset</artifactId>
</exclusion>
<exclusion>
<groupId>org.glassfish.jaxb</groupId>
<artifactId>txw2</artifactId>
</exclusion>
<exclusion>
<groupId>org.jvnet.staxex</groupId>
<artifactId>stax-ex</artifactId>
</exclusion>
<!--
SPARK-27611: Exclude redundant javax.activation implementation, which
conflicts with the existing javax.activation:activation:1.1.1 dependency.
-->
<exclusion>
<groupId>jakarta.activation</groupId>
Contributor Author:
In the current jaxb-runtime version, we need jakarta.activation, otherwise there will be java.lang.NoClassDefFoundError: jakarta/activation/DataSource errors, both when exporting PMML models and when using Jersey.

Member (pan3793, Feb 9, 2025):
Some exclusions are invalid now, please remove them.

Contributor Author:
Hmm, I found that the jakarta.xml.bind:jakarta.xml.bind-api exclusion in jersey-server can be removed, but the com.sun.activation:jakarta.activation exclusion in jersey-common can still be retained, because that dependency is currently useless.

Member:
I mean com.sun.xml.fastinfoset:FastInfoset and org.jvnet.staxex:stax-ex; they are marked as optional in org.glassfish.jaxb:jaxb-runtime:4.0.5.

Contributor Author:
Thanks for pointing this out, I updated it.

<artifactId>jakarta.activation-api</artifactId>
<groupId>org.eclipse.angus</groupId>
<artifactId>angus-activation</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jakarta.xml.bind</groupId>
<artifactId>jakarta.xml.bind-api</artifactId>
<version>4.0.2</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
@@ -1061,13 +1053,6 @@
<groupId>org.glassfish.jersey.core</groupId>
<artifactId>jersey-server</artifactId>
<version>${jersey.version}</version>
<!-- SPARK-28765 Unused JDK11-specific dependency -->
<exclusions>
<exclusion>
<groupId>jakarta.xml.bind</groupId>
<artifactId>jakarta.xml.bind-api</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.glassfish.jersey.core</groupId>