[SPARK-14487][SQL] User Defined Type registration without SQLUserDefinedType annotation #12259
Conversation
|
cc @dbtsai You might be interested in this. |
|
Test build #55341 has finished for PR 12259 at commit
|
|
@viirya we don't actually want to delete |
|
@MLnick I see. I forgot to upload another Scala file. I will do it once I am back at my laptop. |
|
I separate |
|
I was thinking that we could re-register them in an object. However, as a Scala object is lazily initialized, we can't run the registration before user classes are used. So for these built-in UDTs, we can only use the existing UDT annotation. But this registration is still useful because you can manually register external user classes without adding a Spark dependency and annotation to them. |
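The lazy-initialization point can be illustrated with a small, Spark-free Scala sketch. All names here (`DemoRegistry`, `BuiltInRegistrations`, `ensureLoaded`, and the `com.example` class names) are hypothetical and only stand in for the idea; this is not the code in this PR.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a registry; not Spark's actual API.
object DemoRegistry {
  val entries: mutable.Map[String, String] = mutable.Map.empty
}

// Hypothetical holder that registers a mapping as a side effect of its
// initializer. A Scala object is initialized on first access, so nothing
// below runs until some code references BuiltInRegistrations.
object BuiltInRegistrations {
  DemoRegistry.entries("com.example.MyVector") = "com.example.MyVectorUDT"

  // Referencing this method is enough to force the initializer to run.
  def ensureLoaded(): Unit = ()
}

object LazyInitDemo {
  def main(args: Array[String]): Unit = {
    // BuiltInRegistrations has not been touched yet, so the entry is missing.
    println(DemoRegistry.entries.contains("com.example.MyVector"))

    // First access triggers the object's initializer and the registration.
    BuiltInRegistrations.ensureLoaded()
    println(DemoRegistry.entries.contains("com.example.MyVector"))
  }
}
```

Under these assumptions, a registration placed in an object body only takes effect once something references that object, which is why an explicit hook (or the existing annotation) is needed for classes that may be used first.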
|
Sure - my question is when and where should that happen? i.e. it should happen before they are used of course. So do we create a hook somewhere (perhaps in SQLContext init? elsewhere?) |
92968a1 to
b3acdb9
Compare
|
hmm, is it better to make |
|
Test build #55345 has finished for PR 12259 at commit
|
|
Test build #55347 has finished for PR 12259 at commit
|
|
Test build #55351 has finished for PR 12259 at commit
|
|
Test build #55407 has finished for PR 12259 at commit
|
|
Test build #55503 has finished for PR 12259 at commit
|
|
This is a good idea, but in 2.0 we are going to hide UserDefinedFunction for now and re-design this API in 2.1. We should revisit it together. |
|
+cc @mengxr @rxin In SPARK-13944, the @viirya Can we rebase and review this code once the Thanks. |
|
@dbtsai Yes, that would be good. |
… in mllib-local

## What changes were proposed in this pull request?

This task will copy the Vector and Matrix classes from mllib to ml package in mllib-local jar. The UDTs and `since` annotation in ml vector and matrix will be removed from now. UDTs will be achieved by #SPARK-14487, and `since` will be replaced by /* since 1.2.0 */

The BLAS implementation will be copied, and some of the test utilities will be copied as well.

Summary of changes:

1. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/BLAS.scala
   - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/BLAS.scala
   - logDebug("gemm: alpha is equal to 0 and beta is equal to 1. Returning C.") is removed in ml version.
2. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Matrices.scala
   - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Matrices.scala
   - `Since` was removed, and we'll use standard `/* Since /*` Java doc. Will be in another PR.
   - `UDT` related code was removed, and will use `SPARK-13944` #12259 to replace the annotation.
3. In mllib-local/src/main/scala/org/apache/spark/**ml**/linalg/Vectors.scala
   - Copied from mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
   - `Since` was removed.
   - `UDT` related code was removed.
   - In `def parseNumeric`, it was throwing `throw new SparkException(s"Cannot parse $other.")`, and now it's throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
4. In mllib/src/main/scala/org/apache/spark/**mllib**/linalg/Vectors.scala
   - For consistency with ML version of vector, `def parseNumeric` is now throwing `throw new IllegalArgumentException(s"Cannot parse $other.")`
5. mllib/src/main/scala/org/apache/spark/**mllib**/util/NumericParser.scala is moved to mllib-local/src/main/scala/org/apache/spark/**ml**/util/NumericParser.scala
   - All the `throw new SparkException` were replaced by `throw new IllegalArgumentException`

## How was this patch tested?

unit tests

Author: DB Tsai <[email protected]>

Closes #12317 from dbtsai/dbtsai-ml-vector.
|
Test build #56826 has finished for PR 12259 at commit
|
|
Test build #56828 has finished for PR 12259 at commit
|
|
retest this please. |
|
Test build #56831 has finished for PR 12259 at commit
|
|
I think |
|
@dbtsai Yea, I forgot to do that. Updating now. |
|
Created a JIRA https://issues.apache.org/jira/browse/SPARK-14906 to follow this. |
|
Test build #56946 has finished for PR 12259 at commit
|
| import org.apache.spark.sql.types._ | ||
|
|
||
| @BeanInfo | ||
| private[ml] case class MyVectorPoint( |
ditto on unit testing
|
Made another pass. Please also address my previous comment on |
|
@mengxr Thanks for these comments. I simplified |
|
Test build #57207 has finished for PR 12259 at commit
|
| * | ||
| * @since 1.6.0 | ||
| */ | ||
|
|
minor: remove empty line
|
LGTM pending Jenkins |
|
Test build #57217 has finished for PR 12259 at commit
|
|
Test build #57219 has finished for PR 12259 at commit
|
|
Merged into master. Thanks! |
| object UDTRegistration extends Serializable with Logging { | ||
|
|
||
| /** The mapping between the Class between UserDefinedType and user classes. */ | ||
| private lazy val udtMap: mutable.Map[String, String] = mutable.Map( |
Would a new entry be added to this map for user UDTs?
Yea, user UDTs can be registered in this map. However, as this is just a private API, we may expect to refactor it in 2.1 and then make it public.
## What changes were proposed in this pull request?

Currently we use the `SQLUserDefinedType` annotation to register UDTs for user classes. However, doing so adds a Spark dependency to user classes.

For some user classes, such a dependency is unnecessary and makes deployment more difficult.

We should provide an alternative approach to register UDTs for user classes without the `SQLUserDefinedType` annotation.

## How was this patch tested?

UserDefinedTypeSuite
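
As a rough illustration of annotation-free registration, the sketch below models a registry keyed by class names, in the spirit of the `mutable.Map[String, String]` shown in the review thread. It is a simplified, Spark-free model: `SimpleUDTRegistry`, its method names, and the `com.example` classes are assumptions made for this example, not the API merged in this PR.

```scala
import scala.collection.mutable

// Simplified, hypothetical model of a name-based registry; not Spark's implementation.
object SimpleUDTRegistry {
  // user class name -> UDT class name. Both sides are plain strings, so the
  // user class needs no annotation and no compile-time dependency on the UDT.
  private val udtMap: mutable.Map[String, String] = mutable.Map.empty

  // Record that values of `userClass` should be handled by `udtClass`.
  def register(userClass: String, udtClass: String): Unit = synchronized {
    udtMap(userClass) = udtClass
  }

  // Whether a mapping has been registered for `userClass`.
  def exists(userClass: String): Boolean = synchronized {
    udtMap.contains(userClass)
  }

  // Look up the registered UDT class name, if any.
  def getUDTFor(userClass: String): Option[String] = synchronized {
    udtMap.get(userClass)
  }
}

object RegistryDemo {
  def main(args: Array[String]): Unit = {
    // The registration lives in application setup code, outside the user class.
    SimpleUDTRegistry.register("com.example.Point", "com.example.PointUDT")
    println(SimpleUDTRegistry.getUDTFor("com.example.Point"))
  }
}
```

Keeping both sides of the mapping as strings means the user class can be compiled and shipped without Spark on its classpath; only the code that performs the registration needs to know about both names.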