-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8492] [SQL] support binaryType in UnsafeRow #6911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @JoshRosen |
|
Test build #35339 has finished for PR 6911 at commit
|
|
Test build #35346 has finished for PR 6911 at commit
|
|
The 2GB row limit isn't an issue since we already implicitly have that limit in |
|
Test build #35357 has finished for PR 6911 at commit
|
|
Test build #35359 has finished for PR 6911 at commit
|
|
Test build #943 has finished for PR 6911 at commit
|
|
Test build #947 has finished for PR 6911 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the precedence is kind of obvious from usage / context, but it wouldn't hurt to add parens to disambiguate the order in which the shifts are applied.
|
This looks good to me overall. I like the idea of storing the length in the fixed-length values section alongside the pointer to the variable-length data. I wonder whether there's a natural point to document / explicitly call out this encoding, though, in order to make it a bit more obvious to any new readers of this file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to mask out the upper 32 bits before converting to a long? I guess the uppermost bit probably can't be 1 because the offset can't be negative, so I guess we don't need to worry about sign-extension during the shift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No.
Conflicts: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
|
Test build #35479 has finished for PR 6911 at commit
|
|
Test build #35480 has finished for PR 6911 at commit
|
|
Merged into master |
Support BinaryType in UnsafeRow, just like StringType. Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM). Author: Davies Liu <[email protected]> Closes apache#6911 from davies/unsafe_bin and squashes the following commits: d68706f [Davies Liu] update comment 519f698 [Davies Liu] address comment 98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin 180b49d [Davies Liu] fix zero-out 22e4c0a [Davies Liu] zero-out padding bytes 6abfe93 [Davies Liu] fix style 447dea0 [Davies Liu] support binaryType in UnsafeRow
Support BinaryType in UnsafeRow, just like StringType.
Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM).