[SPARK-8492] [SQL] support binaryType in UnsafeRow #6911

davies · 2015-06-19T23:49:56Z

Support BinaryType in UnsafeRow, just like StringType.

Also change the layout of StringType and BinaryType in UnsafeRow by combining the offset and size into a single Long, which limits the size of a Row to under 2 GB (given the fact that no single buffer can be bigger than 2 GB in the JVM).
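A minimal Java sketch of combining offset and size into one long. It assumes the offset goes in the upper 32 bits and the size in the lower 32 bits; that bit assignment and the names `OffsetAndSize`, `pack`, `offset` and `size` are hypothetical illustrations, not Spark's API.

```java
public final class OffsetAndSize {
    private OffsetAndSize() {}

    // Hypothetical encoding: offset in the upper 32 bits, size in the lower 32 bits.
    public static long pack(int offset, int size) {
        if (offset < 0 || size < 0) {
            throw new IllegalArgumentException("offset and size must be non-negative");
        }
        // size is non-negative, so widening it to long adds no high bits.
        return (((long) offset) << 32) | size;
    }

    public static int offset(long offsetAndSize) {
        return (int) (offsetAndSize >>> 32);
    }

    public static int size(long offsetAndSize) {
        return (int) offsetAndSize; // the cast keeps only the low 32 bits
    }

    public static void main(String[] args) {
        long packed = pack(24, 5);
        System.out.println(offset(packed) + " " + size(packed)); // 24 5
        // Both halves are ints, so neither can exceed Integer.MAX_VALUE (2^31 - 1).
        long largest = pack(Integer.MAX_VALUE, Integer.MAX_VALUE);
        System.out.println(offset(largest) == Integer.MAX_VALUE); // true
    }
}
```

Because each half is a 32-bit int, any offset or size stored this way must stay below 2^31, which is where the under-2 GB row limit comes from.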

davies · 2015-06-19T23:50:42Z

cc @JoshRosen

SparkQA · 2015-06-20T00:00:33Z

Test build #35339 has finished for PR 6911 at commit 447dea0.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-20T03:14:36Z

Test build #35346 has finished for PR 6911 at commit 6abfe93.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2015-06-20T06:52:23Z

The 2GB row limit isn't an issue since we already implicitly have that limit in UnsafeRowConverter.

SparkQA · 2015-06-20T07:33:54Z

Test build #35357 has finished for PR 6911 at commit 22e4c0a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-20T08:55:29Z

Test build #35359 has finished for PR 6911 at commit 180b49d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-06-20T09:39:16Z

Test build #943 has finished for PR 6911 at commit 180b49d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- // class ParentClass(parentField: Int)
- // class ChildClass(childField: Int) extends ParentClass(1)
- // If the class type corresponding to current slot has writeObject() defined,
- // then its not obvious which fields of the class will be serialized as the writeObject()
- class SerializableConfiguration(@transient var value: Configuration) extends Serializable
- class SerializableJobConf(@transient var value: JobConf) extends Serializable
- class StreamingKMeansModel(KMeansModel):
- class StreamingKMeans(object):
- abstract class GeneratedClass
- case class Bin(child: Expression)
- case class Md5(child: Expression)

SparkQA · 2015-06-21T00:59:07Z

Test build #947 has finished for PR 6911 at commit 180b49d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

JoshRosen · 2015-06-22T19:05:53Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverter.scala

I guess the precedence is kind of obvious from usage / context, but it wouldn't hurt to add parens to disambiguate the order in which the shifts are applied.
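In Java, `<<` binds more tightly than `|`, so parentheses around the shift do not change the result. They only make the order explicit. (The file under review is Scala, where operators starting with `<` also bind more tightly than `|`.) The Java sketch below uses hypothetical method names. It also shows a related trap: the operand must be widened to `long` before shifting, because an `int` shift uses only the low 5 bits of the shift distance.

```java
public final class ShiftPrecedence {
    private ShiftPrecedence() {}

    // Relies on Java precedence: << is applied before |.
    public static long packImplicit(int offset, int size) {
        return ((long) offset) << 32 | size;
    }

    // Same value; the parentheses only spell out the evaluation order.
    public static long packExplicit(int offset, int size) {
        return (((long) offset) << 32) | size;
    }

    // Bug: the shift happens on an int, whose shift distance uses only its low
    // 5 bits (32 & 31 == 0), so the offset never reaches the upper half.
    public static long packIntShift(int offset, int size) {
        return (long) (offset << 32) | size;
    }

    public static void main(String[] args) {
        // size is assumed non-negative here, so widening it adds no high bits.
        System.out.println(packImplicit(24, 5) == packExplicit(24, 5)); // true
        System.out.println(packIntShift(24, 5));                        // 29 (24 | 5)
    }
}
```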

JoshRosen · 2015-06-22T19:19:55Z

This looks good to me overall. I like the idea of storing the length in the fixed-length values section alongside the pointer to the variable-length data. I wonder whether there's a natural point to document / explicitly call out this encoding, though, in order to make it a bit more obvious to any new readers of this file.
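One way to call out the encoding for new readers is a small layout sketch. The Java below is a hypothetical illustration, not Spark's UnsafeRow code. It assumes the following layout:

- a null bitset with one bit per field, rounded up to 8-byte words
- one 8-byte slot per field
- a word-aligned variable-length region

The slot of a variable-length field holds the value's offset in its upper 32 bits and its length in its lower 32 bits. All class and method names are made up for the example.

```java
import java.nio.ByteBuffer;

public final class RowLayoutSketch {
    private RowLayoutSketch() {}

    // Round a byte count up to the next multiple of 8 (one 64-bit word).
    public static int roundToWord(int numBytes) {
        return (numBytes + 7) & ~7;
    }

    // Bytes used by the null bitset: one bit per field, rounded up to whole words.
    public static int bitsetBytes(int numFields) {
        return 8 * ((numFields + 63) / 64);
    }

    // Byte position of the fixed-length slot for a given field.
    public static int slotPosition(int ordinal, int numFields) {
        return bitsetBytes(numFields) + 8 * ordinal;
    }

    // Builds a row whose field at `ordinal` is a variable-length value; other slots stay 0.
    public static ByteBuffer writeBinaryField(byte[] value, int ordinal, int numFields) {
        int fixedBytes = bitsetBytes(numFields) + 8 * numFields;
        // ByteBuffer.allocate zero-fills, so the padding after the value is already 0.
        // A reused or off-heap buffer would need those bytes cleared explicitly.
        ByteBuffer row = ByteBuffer.allocate(fixedBytes + roundToWord(value.length));
        int offset = fixedBytes; // variable-length data starts after the fixed section
        for (int i = 0; i < value.length; i++) {
            row.put(offset + i, value[i]);
        }
        // The slot stores where the bytes live and how many there are.
        long offsetAndSize = (((long) offset) << 32) | value.length;
        row.putLong(slotPosition(ordinal, numFields), offsetAndSize);
        return row;
    }

    // Reads the value back by decoding the offset and length stored in its slot.
    public static byte[] readBinaryField(ByteBuffer row, int ordinal, int numFields) {
        long offsetAndSize = row.getLong(slotPosition(ordinal, numFields));
        int offset = (int) (offsetAndSize >>> 32);
        int size = (int) offsetAndSize;
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = row.get(offset + i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] value = {1, 2, 3, 4, 5};
        ByteBuffer row = writeBinaryField(value, 1, 2);
        // 8 (bitset) + 16 (two slots) + 8 (5 bytes padded to a word)
        System.out.println(row.capacity()); // 32
        System.out.println(java.util.Arrays.toString(readBinaryField(row, 1, 2))); // [1, 2, 3, 4, 5]
    }
}
```

Keeping the length next to the pointer means a reader never has to scan the variable-length region to find where a value ends.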

JoshRosen · 2015-06-22T19:21:26Z

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

Do we need to mask out the upper 32 bits before converting to a long? I guess the uppermost bit probably can't be 1 because the offset can't be negative, so I guess we don't need to worry about sign-extension during the shift.
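The Java sketch below shows where sign extension can and cannot matter. It assumes the offset-in-upper-bits, size-in-lower-bits packing, and its method names are hypothetical. Widening a negative `int` copies the sign bit into the upper 32 bits, but a following `<< 32` shifts those bits out, so the offset half survives. The unmasked value that gets OR-ed into the lower half is the one that could clobber the upper half if it were ever negative.

```java
public final class SignExtensionSketch {
    private SignExtensionSketch() {}

    // OR-ing an unmasked int: a negative size is sign-extended to 64 bits
    // and sets every bit of the upper (offset) half too.
    public static long packUnmasked(int offset, int size) {
        return (((long) offset) << 32) | size;
    }

    // Masking with 0xFFFFFFFFL keeps only the low 32 bits of size.
    public static long packMasked(int offset, int size) {
        return (((long) offset) << 32) | (size & 0xFFFFFFFFL);
    }

    public static int offsetOf(long packed) {
        return (int) (packed >>> 32);
    }

    public static void main(String[] args) {
        // Widening -1 to long sign-extends: all 64 bits are set.
        System.out.println(Long.toHexString((long) -1));         // ffffffffffffffff
        // The left shift pushes the extended bits out; only the offset's own bits remain.
        System.out.println(Long.toHexString(((long) -1) << 32)); // ffffffff00000000
        // An unmasked negative size overwrites the offset half; a masked one does not.
        System.out.println(offsetOf(packUnmasked(1, -1)));       // -1
        System.out.println(offsetOf(packMasked(1, -1)));         // 1
    }
}
```

Since offsets and sizes in a row are never negative, the mask is purely defensive in this sketch.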


SparkQA · 2015-06-22T22:03:13Z

Test build #35479 has finished for PR 6911 at commit 519f698.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class UnresolvedAlias(child: Expression) extends NamedExpression
- abstract class ExtractValueWithStruct extends ExtractValue

SparkQA · 2015-06-22T22:12:06Z

Test build #35480 has finished for PR 6911 at commit d68706f.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class UnresolvedAlias(child: Expression) extends NamedExpression
- abstract class ExtractValueWithStruct extends ExtractValue

davies · 2015-06-22T22:23:02Z

Merged into master

Support BinaryType in UnsafeRow, just like StringType. Also change the layout of StringType and BinaryType in UnsafeRow by combining the offset and size into a single Long, which limits the size of a Row to under 2 GB (given the fact that no single buffer can be bigger than 2 GB in the JVM).

Author: Davies Liu

Closes apache#6911 from davies/unsafe_bin and squashes the following commits:

d68706f [Davies Liu] update comment
519f698 [Davies Liu] address comment
98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin
180b49d [Davies Liu] fix zero-out
22e4c0a [Davies Liu] zero-out padding bytes
6abfe93 [Davies Liu] fix style
447dea0 [Davies Liu] support binaryType in UnsafeRow

support binaryType in UnsafeRow

447dea0

fix style

6abfe93

zero-out padding bytes

22e4c0a

fix zero-out

180b49d

JoshRosen reviewed Jun 22, 2015

Davies Liu added 3 commits June 22, 2015 12:50

Merge branch 'master' of github.com:apache/spark into unsafe_bin

98a964b

Conflicts: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java

address comment

519f698

update comment

d68706f

asfgit closed this in 96aa013 Jun 22, 2015
