-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-17368] [SQL] Add support for value class serialization and deserialization #15284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Value classes were unsupported because catalyst data types were obtained through reflection on erased types, which would resolve to a value class' wrapped type and hence lead to unavailable methods during code generation. This patch simply removes the erasure step when getting data types for catalyst.
| mapFieldNull: scala.collection.Map[Int, java.lang.Long], | ||
| structField: PrimitiveData) | ||
|
|
||
| case class SpecificCollection(l: List[Int]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup: found this unused class whilst going over the code
|
Test build #66060 has finished for PR 15284 at commit
|
| * Retrieves the runtime class corresponding to the provided type. | ||
| */ | ||
| def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.erasure.typeSymbol.asClass) | ||
| def getClassFromType(tpe: Type): Class[_] = mirror.runtimeClass(tpe.typeSymbol.asClass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the main change
|
cc @marmbrus for the changes to scala reflection |
|
LGTM, merging to master. |
…rialization ## What changes were proposed in this pull request? Value classes were unsupported because catalyst data types were obtained through reflection on erased types, which would resolve to a value class' wrapped type and hence lead to unavailable methods during code generation. E.g. the following class ```scala case class Foo(x: Int) extends AnyVal ``` would be seen as an `int` in catalyst and will cause instance cast failures when generated java code tries to treat it as a `Foo`. This patch simply removes the erasure step when getting data types for catalyst. ## How was this patch tested? Additional tests in `ExpressionEncoderSuite`. Author: Jakob Odersky <[email protected]> Closes apache#15284 from jodersky/value-classes.
…rialization ## What changes were proposed in this pull request? Value classes were unsupported because catalyst data types were obtained through reflection on erased types, which would resolve to a value class' wrapped type and hence lead to unavailable methods during code generation. E.g. the following class ```scala case class Foo(x: Int) extends AnyVal ``` would be seen as an `int` in catalyst and will cause instance cast failures when generated java code tries to treat it as a `Foo`. This patch simply removes the erasure step when getting data types for catalyst. ## How was this patch tested? Additional tests in `ExpressionEncoderSuite`. Author: Jakob Odersky <[email protected]> Closes apache#15284 from jodersky/value-classes.
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes #27959 from Ngone51/impr_anyval. Authored-by: yi.wu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR #15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes #27959 from Ngone51/impr_anyval. Authored-by: yi.wu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 5c4d44b) Signed-off-by: Wenchen Fan <[email protected]>
…yVal type ### What changes were proposed in this pull request? Improve `ScalaReflection` to only don't erasure non user defined `AnyVal` type, but still erasure other types, e.g. `Any`. And this brings two benefits: 1. Give better encode error message for some unsupported types, e.g. `Any` 2. Won't miss the walk path for the `AnyVal` type ### Why are the changes needed? Firstly, PR apache#15284 added encode(serializeFor/deserializeFor) support for value class, which extends `AnyVal`, by not erasure types. But, this also introduce a problem that when user try to encoder unsupported types, e.g. `Any`, it will fail on `java.lang.ClassNotFoundException: scala.Any` due to the reason that `scala.Any` doesn't erasure to `java.lang.Object`. Also, in current `getClassNameFromType()`, it always erasure types which could missing walked path for user defined `AnyVal` types. ### Does this PR introduce any user-facing change? Yes. For the test below: ``` case class Bar(i: Any) case class Foo(i: Bar) extends AnyVal test() { implicitly[ExpressionEncoder[Foo]] } ``` Before: ``` java.lang.ClassNotFoundException: scala.Any at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355) ... ```` After: ``` java.lang.UnsupportedOperationException: No Encoder found for Any - field (class: "java.lang.Object", name: "i") - field (class: "org.apache.spark.sql.catalyst.encoders.Bar", name: "i") - root class: "org.apache.spark.sql.catalyst.encoders.Foo" at org.apache.spark.sql.catalyst.ScalaReflection$.$anonfun$serializerFor$1(ScalaReflection.scala:561) ``` ### How was this patch tested? Added unit test and test manually. Closes apache#27959 from Ngone51/impr_anyval. Authored-by: yi.wu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Value classes were unsupported because catalyst data types were
obtained through reflection on erased types, which would resolve to a
value class' wrapped type and hence lead to unavailable methods during
code generation.
E.g. the following class
would be seen as an
intin catalyst and will cause instance cast failures when generated java code tries to treat it as aFoo.This patch simply removes the erasure step when getting data types for
catalyst.
How was this patch tested?
Additional tests in
ExpressionEncoderSuite.