Skip to content

Conversation

@davies
Copy link
Contributor

@davies davies commented Jun 19, 2015

Support BinaryType in UnsafeRow, just like StringType.

Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM).

@davies
Copy link
Contributor Author

davies commented Jun 19, 2015

cc @JoshRosen

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35339 has finished for PR 6911 at commit 447dea0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35346 has finished for PR 6911 at commit 6abfe93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

The 2GB row limit isn't an issue since we already implicitly have that limit in UnsafeRowConverter.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35357 has finished for PR 6911 at commit 22e4c0a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35359 has finished for PR 6911 at commit 180b49d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #943 has finished for PR 6911 at commit 180b49d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // class ParentClass(parentField: Int)
    • // class ChildClass(childField: Int) extends ParentClass(1)
    • // If the class type corresponding to current slot has writeObject() defined,
    • // then its not obvious which fields of the class will be serialized as the writeObject()
    • class SerializableConfiguration(@transient var value: Configuration) extends Serializable
    • class SerializableJobConf(@transient var value: JobConf) extends Serializable
    • class StreamingKMeansModel(KMeansModel):
    • class StreamingKMeans(object):
    • abstract class GeneratedClass
    • case class Bin(child: Expression)
    • case class Md5(child: Expression)

@SparkQA
Copy link

SparkQA commented Jun 21, 2015

Test build #947 has finished for PR 6911 at commit 180b49d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the precedence is kind of obvious from usage / context, but it wouldn't hurt to add parens to disambiguate the order in which the shifts are applied.

@JoshRosen
Copy link
Contributor

This looks good to me overall. I like the idea of storing the length in the fixed-length values section alongside the pointer to the variable-length data. I wonder whether there's a natural point to document / explicitly call out this encoding, though, in order to make it a bit more obvious to any new readers of this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to mask out the upper 32 bits before converting to a long? I guess the uppermost bit probably can't be 1 because the offset can't be negative, so I guess we don't need to worry about sign-extension during the shift.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.

Davies Liu added 3 commits June 22, 2015 12:50
Conflicts:
	sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35479 has finished for PR 6911 at commit 519f698.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAlias(child: Expression) extends NamedExpression
    • abstract class ExtractValueWithStruct extends ExtractValue

@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35480 has finished for PR 6911 at commit d68706f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class UnresolvedAlias(child: Expression) extends NamedExpression
    • abstract class ExtractValueWithStruct extends ExtractValue

@davies
Copy link
Contributor Author

davies commented Jun 22, 2015

Merged into master

@asfgit asfgit closed this in 96aa013 Jun 22, 2015
animeshbaranawal pushed a commit to animeshbaranawal/spark that referenced this pull request Jun 25, 2015
Support BinaryType in UnsafeRow, just like StringType.

Also change the layout of StringType and BinaryType in UnsafeRow, by combining offset and size together as Long, which will limit the size of Row to under 2G (given that fact that any single buffer can not be bigger than 2G in JVM).

Author: Davies Liu <[email protected]>

Closes apache#6911 from davies/unsafe_bin and squashes the following commits:

d68706f [Davies Liu] update comment
519f698 [Davies Liu] address comment
98a964b [Davies Liu] Merge branch 'master' of github.com:apache/spark into unsafe_bin
180b49d [Davies Liu] fix zero-out
22e4c0a [Davies Liu] zero-out padding bytes
6abfe93 [Davies Liu] fix style
447dea0 [Davies Liu] support binaryType in UnsafeRow
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants