Commit 9cdeeb8

wangyum authored and GitHub Enterprise committed
[SPARK-45755][SQL] Improve Dataset.isEmpty() by applying global limit 1 (apache#251)
### What changes were proposed in this pull request?

This PR makes `Dataset.isEmpty()` apply a global limit 1 first. `LimitPushDown` may then push the global limit 1 down to lower nodes to improve query performance.

Note that we use a global limit 1 here, because a local limit cannot be pushed down in the group-only case: https://github.com/apache/spark/blob/89ca8b6065e9f690a492c778262080741d50d94d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L766-L770

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing:

```scala
spark.range(300000000).selectExpr("id", "array(id, id % 10, id % 100) as eo").write.saveAsTable("t1")
spark.range(100000000).selectExpr("id", "array(id, id % 10, id % 1000) as eo").write.saveAsTable("t2")
println(spark.sql("SELECT * FROM t1 LATERAL VIEW explode_outer(eo) AS e UNION SELECT * FROM t2 LATERAL VIEW explode_outer(eo) AS e").isEmpty)
```

Before this PR | After this PR
-- | --
<img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/417adc05-4160-4470-b63c-125faac08c9c"> | <img width="430" alt="image" src="https://github.com/apache/spark/assets/5399861/bdeff231-e725-4c55-9da2-1b4cd59ec8c8">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#43617 from wangyum/SPARK-45755.

Lead-authored-by: Yuming Wang <[email protected]>
Co-authored-by: Yuming Wang <[email protected]>
Signed-off-by: Jiaan Geng <[email protected]>
(cherry picked from commit c7bba9b)
1 parent 25acbf3 commit 9cdeeb8

1 file changed (+1, -1 lines)

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

Lines changed: 1 addition & 1 deletion
@@ -651,7 +651,7 @@ class Dataset[T] private[sql](
    * @group basic
    * @since 2.4.0
    */
-  def isEmpty: Boolean = withAction("isEmpty", select().queryExecution) { plan =>
+  def isEmpty: Boolean = withAction("isEmpty", select().limit(1).queryExecution) { plan =>
     plan.executeTake(1).isEmpty
   }
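
One way to see the effect of this change is to compare the optimized logical plans of `select()` and `select().limit(1)` on a small query. The following is only a sketch, not part of the commit: the object name `IsEmptyPlanSketch`, the helper `optimizedPlans`, and the temp views `t1_small` and `t2_small` are hypothetical, and it assumes a local Spark session with the `spark-sql` dependency on the classpath. The exact plan text depends on the Spark version, so no output is shown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical helper object for illustration only; not part of Spark.
object IsEmptyPlanSketch {

  // Returns the optimized plans for the query shape used before the change
  // (select()) and after the change (select().limit(1)).
  def optimizedPlans(df: DataFrame): (LogicalPlan, LogicalPlan) = {
    val before = df.select().queryExecution.optimizedPlan
    val after = df.select().limit(1).queryExecution.optimizedPlan
    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("isEmpty-plan-sketch")
      .getOrCreate()
    try {
      // Scaled-down stand-ins for the t1/t2 tables from the manual test,
      // registered as temp views instead of saved tables.
      spark.range(1000).selectExpr("id", "array(id, id % 10, id % 100) as eo")
        .createOrReplaceTempView("t1_small")
      spark.range(1000).selectExpr("id", "array(id, id % 10, id % 1000) as eo")
        .createOrReplaceTempView("t2_small")

      val df = spark.sql(
        "SELECT * FROM t1_small LATERAL VIEW explode_outer(eo) AS e " +
          "UNION SELECT * FROM t2_small LATERAL VIEW explode_outer(eo) AS e")

      val (before, after) = optimizedPlans(df)
      println("Optimized plan without the limit:\n" + before)
      println("Optimized plan with limit(1):\n" + after)

      // isEmpty itself goes through the code path changed by this commit.
      println(df.isEmpty)
    } finally {
      spark.stop()
    }
  }
}
```

Comparing the two printed plans shows where, if anywhere, the optimizer placed limit nodes below the top of the plan for a given query.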
