Skip to content

Comments

[SPARK-30801][SQL] Subqueries should not be AQE-ed if main query is not#27554

Closed
maryannxue wants to merge 3 commits intoapache:masterfrom
maryannxue:spark-30801
Closed

[SPARK-30801][SQL] Subqueries should not be AQE-ed if main query is not#27554
maryannxue wants to merge 3 commits intoapache:masterfrom
maryannxue:spark-30801

Conversation

@maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

This PR makes sure AQE is either enabled or disabled for the entire query, including the main query and all subqueries.
Currently there are unsupported queries by AQE, e.g., queries that contain DPP filters. We need to make sure that if the main query is unsupported, none of the sub-queries should apply AQE, otherwise it can lead to performance regressions due to missed opportunity of sub-query reuse.

Why are the changes needed?

To get rid of potential perf regressions when AQE is turned on.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Updated DynamicPartitionPruningSuite:

  1. Removed the existing workaround withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "false")
  2. Added DynamicPartitionPruningSuiteAEOn and DynamicPartitionPruningSuiteAEOff to enable testing this suite with AQE on and off options
  3. Added a check in checkPartitionPruningPredicate to verify that the subqueries are always in sync with the main query in terms of whether AQE is applied.

@maryannxue
Copy link
Contributor Author

cc @cloud-fan @gatorsmile

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118324 has finished for PR 27554 at commit becb0d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118332 has finished for PR 27554 at commit 05a6b6f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, can you fix conflicts?

@maryannxue
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118368 has finished for PR 27554 at commit 1adf99e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Like(left: Expression, right: Expression, escapeChar: Char)
  • case class RLike(left: Expression, right: Expression) extends StringRegexExpression
  • case class AlterTableReplaceColumnsStatement(
  • trait LegacyDateFormatter extends DateFormatter
  • class LegacyFastDateFormatter(pattern: String, locale: Locale) extends LegacyDateFormatter
  • class LegacySimpleDateFormatter(pattern: String, locale: Locale) extends LegacyDateFormatter
  • class MicrosCalendar(tz: TimeZone, digitsInFraction: Int)
  • class LegacyFastTimestampFormatter(
  • class LegacySimpleTimestampFormatter(
  • trait BaseAggregateExec extends UnaryExecNode

@SparkQA
Copy link

SparkQA commented Feb 13, 2020

Test build #118374 has finished for PR 27554 at commit 1adf99e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Like(left: Expression, right: Expression, escapeChar: Char)
  • case class RLike(left: Expression, right: Expression) extends StringRegexExpression
  • case class AlterTableReplaceColumnsStatement(
  • trait LegacyDateFormatter extends DateFormatter
  • class LegacyFastDateFormatter(pattern: String, locale: Locale) extends LegacyDateFormatter
  • class LegacySimpleDateFormatter(pattern: String, locale: Locale) extends LegacyDateFormatter
  • class MicrosCalendar(tz: TimeZone, digitsInFraction: Int)
  • class LegacyFastTimestampFormatter(
  • class LegacySimpleTimestampFormatter(
  • trait BaseAggregateExec extends UnaryExecNode

@cloud-fan cloud-fan closed this in 0aed77a Feb 14, 2020
@cloud-fan
Copy link
Contributor

LGTM, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Feb 14, 2020
### What changes were proposed in this pull request?
This PR makes sure AQE is either enabled or disabled for the entire query, including the main query and all subqueries.
Currently there are unsupported queries by AQE, e.g., queries that contain DPP filters. We need to make sure that if the main query is unsupported, none of the sub-queries should apply AQE, otherwise it can lead to performance regressions due to missed opportunity of sub-query reuse.

### Why are the changes needed?
To get rid of potential perf regressions when AQE is turned on.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Updated DynamicPartitionPruningSuite:
1. Removed the existing workaround `withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "false")`
2. Added `DynamicPartitionPruningSuiteAEOn` and `DynamicPartitionPruningSuiteAEOff` to enable testing this suite with AQE on and off options
3. Added a check in `checkPartitionPruningPredicate` to verify that the subqueries are always in sync with the main query in terms of whether AQE is applied.

Closes #27554 from maryannxue/spark-30801.

Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0aed77a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
This PR makes sure AQE is either enabled or disabled for the entire query, including the main query and all subqueries.
Currently there are unsupported queries by AQE, e.g., queries that contain DPP filters. We need to make sure that if the main query is unsupported, none of the sub-queries should apply AQE, otherwise it can lead to performance regressions due to missed opportunity of sub-query reuse.

### Why are the changes needed?
To get rid of potential perf regressions when AQE is turned on.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Updated DynamicPartitionPruningSuite:
1. Removed the existing workaround `withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "false")`
2. Added `DynamicPartitionPruningSuiteAEOn` and `DynamicPartitionPruningSuiteAEOff` to enable testing this suite with AQE on and off options
3. Added a check in `checkPartitionPruningPredicate` to verify that the subqueries are always in sync with the main query in terms of whether AQE is applied.

Closes apache#27554 from maryannxue/spark-30801.

Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants