
Deserialization for PCA model seems to be broken. Failed to find a default value for "k" #831

Open
tansinghal12 opened this issue Oct 28, 2022 · 2 comments

Comments

@tansinghal12

I have created a PCA model and am able to serializeToBundle without any issues. However, after deserializeFromBundle, trying to transform raises the error below:

py4j.protocol.Py4JJavaError: An error occurred while calling o278.transform.
: java.util.NoSuchElementException: Failed to find a default value for k
at org.apache.spark.ml.param.Params.$anonfun$getOrDefault$2(params.scala:756)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params.getOrDefault(params.scala:756)
at org.apache.spark.ml.param.Params.getOrDefault$(params.scala:753)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:41)
at org.apache.spark.ml.param.Params.$(params.scala:762)
at org.apache.spark.ml.param.Params.$$(params.scala:762)
at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:41)
at org.apache.spark.ml.feature.PCAParams.validateAndTransformSchema(PCA.scala:55)
at org.apache.spark.ml.feature.PCAParams.validateAndTransformSchema$(PCA.scala:51)
at org.apache.spark.ml.feature.PCAModel.validateAndTransformSchema(PCA.scala:122)
at org.apache.spark.ml.feature.PCAModel.transformSchema(PCA.scala:156)
at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:71)
at org.apache.spark.ml.feature.PCAModel.transform(PCA.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)

However, I did pass a value for k when initializing the model and serializing it:

model = PCAModel(k=3, inputCol="features", outputCol="pca")

It seems very similar to issue #481, where certain variables were also not being set.
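
For context, here is roughly what this round trip looks like through MLeap's Scala API (a sketch only; I am actually calling the PySpark equivalents, and df stands in for a DataFrame with a "features" vector column):

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.spark.SparkSupport._
    import org.apache.spark.ml.bundle.SparkBundleContext
    import org.apache.spark.ml.feature.PCA
    import resource._

    // fit a PCA model with k set explicitly
    val model = new PCA().setK(3).setInputCol("features").setOutputCol("pca").fit(df)

    // serialize to an MLeap bundle (this part works fine)
    for (bf <- managed(BundleFile("jar:file:/tmp/pca-model.zip"))) {
      model.writeBundle.save(bf)(SparkBundleContext().withDataset(model.transform(df))).get
    }

    // deserialize the bundle back into Spark
    val loaded = (for (bf <- managed(BundleFile("jar:file:/tmp/pca-model.zip"))) yield {
      bf.loadSparkBundle().get
    }).opt.get

    // transforming with the reloaded model throws:
    // java.util.NoSuchElementException: Failed to find a default value for k
    loaded.root.transform(df)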

@jsleight
Contributor

Can you tell us a bit about your environment? Which MLeap and Spark versions do you have?

@jsleight
Contributor

Well, I can see that we aren't storing k in spark.ml.bundle.ops.feature.PcaOp, so storing and then loading back into Spark will crash as you observed. Looking through the changelogs on those files, I think it has always been this way 😢. It should be an easy fix here https://github.com/combust/mleap/blob/master/mleap-spark/src/main/scala/org/apache/spark/ml/bundle/ops/feature/PcaOp.scala if you feel up to making a PR?
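
Something like the following is what I have in mind, written from memory of the bundle DSL rather than tested code (the "k" attribute name, the Value.int/getInt helpers, and the constructor details are illustrative):

    import ml.combust.bundle.BundleContext
    import ml.combust.bundle.dsl.{Model, Value}
    import ml.combust.mleap.tensor.DenseTensor
    import org.apache.spark.ml.bundle.SparkBundleContext
    import org.apache.spark.ml.feature.PCAModel
    import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector}

    // inside PcaOp's OpModel[SparkBundleContext, PCAModel]:
    override def store(model: Model, obj: PCAModel)
                      (implicit context: BundleContext[SparkBundleContext]): Model = {
      model
        .withValue("principal_components",
          Value.tensor[Double](DenseTensor(obj.pc.toArray, Seq(obj.pc.numRows, obj.pc.numCols))))
        .withValue("k", Value.int(obj.getK)) // the piece that is missing today
    }

    override def load(model: Model)
                     (implicit context: BundleContext[SparkBundleContext]): PCAModel = {
      val tensor = model.value("principal_components").getTensor[Double]
      val pc = new DenseMatrix(tensor.dimensions.head, tensor.dimensions(1), tensor.toArray)
      // the private[ml] constructor is reachable because PcaOp lives under org.apache.spark.ml
      val m = new PCAModel(uid = "", pc = pc, explainedVariance = new DenseVector(Array.empty[Double]))
      m.set(m.k, model.value("k").getInt) // restore k so validateAndTransformSchema can find it
    }

The load side could also just infer k from the matrix shape (k equals pc.numCols), but persisting it explicitly keeps the bundle self-describing.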

Stepping back a bit: you're doing something a bit unexpected (and not tested in MLeap) by serializing a model and then deserializing it back into Spark. Normally for that flow people just use Spark's built-in serialization/deserialization. MLeap does test spark->mleap parity (a rough code sketch follows the list) by doing:

  • transform in spark
  • serialize to bundle
  • deserialize to mleap
  • transform in mleap
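
In code, that flow looks roughly like this (a sketch; sparkModel, df, and leapFrame are placeholders for a fitted model, a Spark DataFrame, and the equivalent MLeap DefaultLeapFrame):

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.runtime.MleapSupport._
    import ml.combust.mleap.spark.SparkSupport._
    import org.apache.spark.ml.bundle.SparkBundleContext
    import resource._

    // 1. transform in spark
    val sparkOut = sparkModel.transform(df)

    // 2. serialize to bundle
    for (bf <- managed(BundleFile("jar:file:/tmp/model.zip"))) {
      sparkModel.writeBundle.save(bf)(SparkBundleContext().withDataset(sparkOut)).get
    }

    // 3. deserialize to mleap (no SparkContext needed from here on)
    val mleapModel = (for (bf <- managed(BundleFile("jar:file:/tmp/model.zip"))) yield {
      bf.loadMleapBundle().get.root
    }).opt.get

    // 4. transform in mleap, then compare row-for-row against sparkOut
    val mleapOut = mleapModel.transform(leapFrame).get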

Testing deserialization back into Spark is a good idea, but I'm not really sure how many other transformers also have missing values.
