Skip to content

Conversation

@hvanhovell
Copy link
Contributor

@hvanhovell hvanhovell commented Oct 3, 2016

What changes were proposed in this pull request?

In practice we cannot guarantee that an InternalRow is immutable. This makes the MutableRow almost redundant. This PR folds MutableRow into InternalRow.

The code below illustrates the immutability issue with InternalRow:

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
val struct = new GenericMutableRow(1)
val row = InternalRow(struct, 1)
println(row)
scala> [[null], 1]
struct.setInt(0, 42)
println(row)
scala> [[42], 1]

This might be somewhat controversial, so feedback is appreciated.

How was this patch tested?

Existing tests.

@hvanhovell
Copy link
Contributor Author

cc @cloud-fan we have discussed this in your MutableProjection PR.

@SparkQA
Copy link

SparkQA commented Oct 3, 2016

Test build #66270 has finished for PR 15333 at commit 7321c05.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MutableUnsafeRow(val writer: UnsafeRowWriter) extends BaseGenericInternalRow(null)

@SparkQA
Copy link

SparkQA commented Oct 3, 2016

Test build #66275 has finished for PR 15333 at commit 8f2a17d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 3, 2016

Test build #66277 has finished for PR 15333 at commit c1eb96c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 4, 2016

Test build #66286 has finished for PR 15333 at commit 09d533a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

liancheng commented Oct 4, 2016

Would be nice to add a simple example in the PR description to illustrate why we can't ensure that a GenericInternalRow is immutable. For example, for a GenericInternalRow with a StructType field, it's legal to put a MutableRow into the cell. This essentially makes the outer GenericInternalRow mutable. (In fact, we are already doing this in Spark, either intentionally or unintentionally.)

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
@SparkQA
Copy link

SparkQA commented Oct 4, 2016

Test build #66327 has finished for PR 15333 at commit 6d83f58.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class UnsafeAlignedOffset
    • case class ColumnStat(statRow: InternalRow)
    • case class NumericColumnStat[T <: AtomicType](statRow: InternalRow, dataType: T)
    • case class StringColumnStat(statRow: InternalRow)
    • case class BinaryColumnStat(statRow: InternalRow)
    • case class BooleanColumnStat(statRow: InternalRow)
    • case class AnalyzeColumnCommand(
    • case class AnalyzeTableCommand(

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala
@marmbrus
Copy link
Contributor

marmbrus commented Oct 4, 2016

This seems like a reasonable simplification to me. A little bit of history (though this has diverged significantly, so don't take this authoritative): I think this complexity all stems from SpecificMutableRow, which had nothing to do with Immutable vs Mutable, but instead about having a more efficient row that you could mutate without boxing. This was back before Tungsten/codegen, so is probably obsolete now.

@SparkQA
Copy link

SparkQA commented Oct 4, 2016

Test build #66330 has finished for PR 15333 at commit b650c86.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor Author

Merging to master. Thanks for commenting.

@asfgit asfgit closed this in 97594c2 Oct 7, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?
In practice we cannot guarantee that an `InternalRow` is immutable. This makes the `MutableRow` almost redundant. This PR folds `MutableRow` into `InternalRow`.

The code below illustrates the immutability issue with InternalRow:
```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericMutableRow
val struct = new GenericMutableRow(1)
val row = InternalRow(struct, 1)
println(row)
scala> [[null], 1]
struct.setInt(0, 42)
println(row)
scala> [[42], 1]
```

This might be somewhat controversial, so feedback is appreciated.

## How was this patch tested?
Existing tests.

Author: Herman van Hovell <[email protected]>

Closes apache#15333 from hvanhovell/SPARK-17761.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants