Skip to content

Conversation

@DonnyZone
Copy link
Contributor

What changes were proposed in this pull request?

Recently, we have also encountered such NPE issues in our production environment as described in:
https://issues.apache.org/jira/browse/SPARK-19471

This issue can be reproduced by the following examples:
` val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

//HashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(),sum("y")).show()

//ObjectHashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(),collect_list("y")).show()

//SortAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false &&SQLConf.USE_OBJECT_HASH_AGG.key=false
df.groupBy("x").agg(rand(),collect_list("y")).show()

This PR is based on PR-16820(#16820) with test cases for all aggregation paths. We want to push it forward.

When AggregationIterator generates result projection, it does not call the initialize method of the Projection class. This will cause a runtime NullPointerException when the projection involves nondeterministic expressions.

How was this patch tested?

unit test
verified in production environment

@DonnyZone
Copy link
Contributor Author

Jenkins, test this please

@DonnyZone
Copy link
Contributor Author

@gatorsmile
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Aug 13, 2017

Test build #80582 has finished for PR 18920 at commit b932d2f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DonnyZone
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 13, 2017

Test build #80591 has finished for PR 18920 at commit bb29b8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString),
(SQLConf.USE_OBJECT_HASH_AGG.key, useObjectHashAgg.toString)) {
val df = Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4)).toDF("x", "y")
// HashAggregate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to check/compare the plans to ensure they are HashAggregate, ObjectHashAggregate and SortAggregate.

@DonnyZone
Copy link
Contributor Author

updated

@DonnyZone
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 14, 2017

Test build #80599 has finished for PR 18920 at commit 5239ebb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(hashAggPlan.find(p =>
p.isInstanceOf[WholeStageCodegenExec] &&
p.asInstanceOf[WholeStageCodegenExec].child
.isInstanceOf[HashAggregateExec]).isDefined)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

          assert(hashAggPlan.find {
            case WholeStageCodegenExec(_: HashAggregateExec) => true
            case _ => false
          }.isDefined)

Seq(
monotonically_increasing_id(), spark_partition_id(),
rand(Random.nextLong()), randn(Random.nextLong())
).foreach(assertNoExceptions(_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> ).foreach(assertNoExceptions)

allImperativeAggregateFunctions(i).eval(currentBuffer))
i += 1
}
resultProjection.initialize(partIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to line 221

typedImperativeAggregates(i).serializeAggregateBufferInPlace(currentBuffer)
i += 1
}
resultProjection.initialize(partIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to line 240

// Grouping-only: we only output values based on grouping expressions.
val resultProjection = UnsafeProjection.create(resultExpressions, groupingAttributes)
(currentGroupingKey: UnsafeRow, currentBuffer: InternalRow) => {
resultProjection.initialize(partIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move it to line 261

hashAggDF.collect()

// ObjectHashAggregate and SortAggregate test cases
val objHashOrSort_AggDF = df.groupBy("x").agg(c, collect_list("y"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objHashOrSort_AggDF -> objHashAggOrSortAggDf


// ObjectHashAggregate and SortAggregate test cases
val objHashOrSort_AggDF = df.groupBy("x").agg(c, collect_list("y"))
val objHashOrSort_Plan = objHashOrSort_AggDF.queryExecution.executedPlan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objHashOrSort_Plan -> objHashAggOrSortAggPlan

@DonnyZone
Copy link
Contributor Author

Updated, thanks for reviewing.

@SparkQA
Copy link

SparkQA commented Aug 14, 2017

Test build #80615 has finished for PR 18920 at commit f55b161.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DonnyZone
Copy link
Contributor Author

retest please

@SparkQA
Copy link

SparkQA commented Aug 14, 2017

Test build #80625 has finished for PR 18920 at commit d58ffaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

LGTM

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in fbc2692 Aug 14, 2017
).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
}

private def assertNoExceptions(c: Column): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you submit a follow-up PR to move this test case to DataFrameAggregateSuite? Thanks!

@DonnyZone
Copy link
Contributor Author

Sure, I will do it later.

asfgit pushed a commit that referenced this pull request Aug 15, 2017
…ted result projection before using it

## What changes were proposed in this pull request?

This is a follow-up PR that moves the test case in PR-18920 (#18920) to DataFrameAggregateSuit.

## How was this patch tested?
unit test

Author: donnyzone <[email protected]>

Closes #18946 from DonnyZone/branch-19471-followingPR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants