[SPARK-51064][SQL] Enable `spark.sql.sources.v2.bucketing.enabled` by default #49766

dongjoon-hyun · 2025-02-03T03:11:27Z

What changes were proposed in this pull request?

This PR aims to enable spark.sql.sources.v2.bucketing.enabled by default for Apache Spark 4.1.0.
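
For jobs that should keep the earlier behaviour after upgrading, the setting can be pinned explicitly. The following is only a minimal sketch: the configuration key is the one named in this PR, while the `BucketingConf` object and its `toSubmitArgs` helper are hypothetical names used to render `--conf key=value` arguments for `spark-submit`.

```scala
object BucketingConf {
  // Configuration key taken from this PR.
  val V2BucketingEnabled = "spark.sql.sources.v2.bucketing.enabled"

  // Hypothetical helper: render settings as `--conf key=value` pairs for spark-submit.
  def toSubmitArgs(settings: Map[String, String]): Seq[String] =
    settings.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }

  def main(args: Array[String]): Unit = {
    // Explicitly turn the feature off for a job that should not pick up the new default.
    println(toSubmitArgs(Map(V2BucketingEnabled -> "false")).mkString(" "))
  }
}
```

The same key/value pair can equally be placed in `spark-defaults.conf` or set on the session; the helper only illustrates the command-line form.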

Why are the changes needed?

spark.sql.sources.v2.bucketing.enabled has been used stably since Apache Spark 3.3.0 to improve Spark performance on V2 data sources. Even when this configuration is enabled, Spark still checks whether all InputPartitions implement HasPartitionKey.

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala

Lines 140 to 142 in 8fc6a20

    
    if (results.length != inputPartitions.length || inputPartitions.isEmpty) {
      // Not all of the `InputPartitions` implements `HasPartitionKey`, therefore skip here.
      None
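
The guard quoted above can be modelled without Spark. This is a simplified sketch, not the actual `DataSourceV2ScanExecBase` code: the `InputPartition` and `HasPartitionKey` traits below are hypothetical stand-ins for Spark's interfaces, and `KeyedSplit`, `PlainSplit`, and `groupByKey` are illustrative names. It shows that grouping is skipped as soon as one partition lacks a key or the input is empty.

```scala
object PartitionKeyCheck {
  // Hypothetical stand-ins for Spark's InputPartition and HasPartitionKey interfaces.
  trait InputPartition
  trait HasPartitionKey extends InputPartition { def partitionKey: Seq[Any] }

  final case class KeyedSplit(partitionKey: Seq[Any]) extends HasPartitionKey
  final case class PlainSplit(path: String) extends InputPartition

  // Group partitions by key, but only when every partition exposes a key.
  def groupByKey(
      inputPartitions: Seq[InputPartition]): Option[Map[Seq[Any], Seq[InputPartition]]] = {
    val results = inputPartitions.collect {
      case p: HasPartitionKey => (p.partitionKey, p: InputPartition)
    }
    if (results.length != inputPartitions.length || inputPartitions.isEmpty) {
      // At least one partition has no key (or there is nothing to scan): skip grouping.
      None
    } else {
      Some(results.groupBy(_._1).map { case (key, pairs) => key -> pairs.map(_._2) })
    }
  }
}
```

With this model, a scan whose source does not report partition keys falls back to ungrouped behaviour even though the configuration is on.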

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun · 2025-02-03T03:24:22Z

What do you think about this change, @sunchao and @szehon-ho ?

dongjoon-hyun · 2025-02-03T19:36:06Z

Also, cc @aokolnychyi , @viirya , @huaxingao . I'd like to get your opinions on this.

sunchao

Thanks, @dongjoon-hyun! +1 on making this the default config; the feature has been stable for several release cycles already.

viirya · 2025-02-04T17:17:22Z

As it has been in use since Apache Spark 3.3.0, the feature should be stable enough by now. Unless others have concerns, I'm okay with this change.

dongjoon-hyun · 2025-02-04T17:36:17Z

Thank you, @sunchao and @viirya .
Merged to master for Apache Spark 4.1.0.

guangyu-yang-rokt · 2025-04-07T14:31:06Z

Hi @dongjoon-hyun, sorry this might be a question a bit unrelated to this PR.

Context:
We are currently introducing SPJ (storage-partitioned join) to our production environment. Our Iceberg table is partitioned by timestamp with a day transform, and our ML processing job reads the past 30 days of data with a filter on the timestamp column, which is pushed down to Iceberg, so Iceberg reports 30 partitions. I have observed that with spark.sql.sources.v2.bucketing.enabled, Spark generates one task per partition, which means only 30 tasks in our case when we run the BatchScan. This leaves the cluster under-utilised, since we have 40 executors with 15 cores each (so at most 600 tasks in parallel). That hurts BatchScan performance a lot: the same stage went from 2.4 mins to 10+ mins.
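
The arithmetic behind this observation can be sketched as follows. It assumes one core per task (the default `spark.task.cpus=1`), which the comment does not state; `SlotUtilization` and `utilization` are illustrative names.

```scala
object SlotUtilization {
  // Fraction of task slots kept busy by a single stage, assuming one core per task.
  def utilization(tasks: Int, executors: Int, coresPerExecutor: Int): Double = {
    val slots = executors * coresPerExecutor
    math.min(tasks, slots).toDouble / slots
  }

  def main(args: Array[String]): Unit = {
    // Figures from the comment: 30 partitions, 40 executors, 15 cores each.
    val u = utilization(tasks = 30, executors = 40, coresPerExecutor = 15)
    println(f"slots=${40 * 15}, busy=${u * 100}%.1f%%")
  }
}
```

With 30 tasks on 600 slots, only about 5% of the cluster's cores can work on the scan stage at once.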

Have you encountered the same issue in your use case? I must be missing something here. Any insights will be much appreciated!

szehon-ho · 2025-04-07T20:56:11Z

@guangyu-yang-rokt yes, that's right: by design, in SPJ the number of Spark partitions equals the number of Iceberg partitions, so sizing them is critical. There is an option to enable more parallelism, spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled, but it causes data to be read multiple times and doesn't work in all cases (not for FULL OUTER joins).
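
The trade-off described here can be illustrated with a deliberately simplified model. It is a sketch of the general idea only, not Spark's planner logic: it assumes the larger join side keeps one task per input split and the smaller side's partition for the same key is read once per such task. `PartialClusteringSketch`, `Plan`, and the split counts are hypothetical.

```scala
object PartialClusteringSketch {
  // Tasks launched, and how many times small-side partitions are read in total.
  final case class Plan(tasks: Int, smallSideReads: Int)

  // Fully clustered: one task per partition key; each small-side partition is read once.
  def fullyClustered(splitsPerKey: Map[String, Int]): Plan =
    Plan(tasks = splitsPerKey.size, smallSideReads = splitsPerKey.size)

  // Partially clustered (simplified): one task per split of the larger side,
  // and the matching small-side partition is read once per task.
  def partiallyClustered(splitsPerKey: Map[String, Int]): Plan = {
    val tasks = splitsPerKey.values.sum
    Plan(tasks = tasks, smallSideReads = tasks)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical split counts for two day partitions on the larger side.
    val largeSide = Map("2025-04-01" -> 8, "2025-04-02" -> 2)
    println(fullyClustered(largeSide))
    println(partiallyClustered(largeSide))
  }
}
```

In this model the task count grows from the number of partition keys to the number of splits, at the price of re-reading the smaller side for every extra task.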

guangyu-yang-rokt · 2025-04-07T22:29:02Z

Thanks @szehon-ho! One follow-up question: in our query, we only filter on the timestamp column, but the join key is something different (we join on non-partition keys). I have checked that BatchScanExec reports groupedBy=[timestamp_day] in the query plan. I'm not too familiar with the Spark codebase, but I guess filter pushdown to Iceberg also tells BatchScanExec to group by partition key if there is a filter on the partition key. With spark.sql.sources.v2.bucketing.enabled set to true, this slows down BatchScan for joins that are not on partition keys. (We have a self-implemented feature store which spins up multiple joins to gather features, so I need to enable all SPJ-related configs globally.)

This doesn't quite make sense to me: since I'm not joining on timestamp, I would expect SPJ not to kick in. Alternatively, I would imagine a configuration like spark.sql.sources.v2.ignoreFiltering that tells BatchScanExec not to group by partition key when it is only used in a filter and not as a join key.


## Changes

| Cause | Type | Category | Description | Affected Files |
|-------|------|----------|-------------|----------------|
| N/A | Feat | Build | Update build configuration to support Spark 4.1 UT | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/pom.xml`, `gluten-ut/spark41/pom.xml`, `tools/gluten-it/pom.xml` |
| [#52165](apache/spark#52165) | Fix | Dependency | Update Parquet dependency version to 1.16.0 to avoid NoSuchMethodError issue | `gluten-ut/spark41/pom.xml` |
| [#51477](apache/spark#51477) | Fix | Compatibility | Update imports to reflect streaming runtime package refactoring in Apache Spark | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala`, `gluten-ut/spark41/.../GlutenStreamingQuerySuite.scala` |
| [#50674](apache/spark#50674) | Fix | Compatibility | Fix compatibility issue introduced by `TypedConfigBuilder` | `gluten-substrait/.../ExpressionConverter.scala`, `gluten-ut/spark41/.../GlutenCSVSuite.scala`, `gluten-ut/spark41/.../GlutenJsonSuite.scala` |
| [#49766](apache/spark#49766) | Fix | Compatibility | Disable V2 bucketing in GlutenDynamicPartitionPruningSuite since spark.sql.sources.v2.bucketing.enabled is now enabled by default | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala` |
| [#42414](apache/spark#42414), [#53038](apache/spark#53038) | Fix | Bug Fix | Resolve an issue introduced by SPARK-42414, as identified in SPARK-53038 | `backends-velox/.../VeloxBloomFilterAggregate.scala` |
| N/A | Fix | Bug Fix | Enforce row fallback for unsupported cached batches - keep columnar execution only when schema validation succeeds | `backends-velox/.../ColumnarCachedBatchSerializer.scala` |
| [SPARK-53132](apache/spark#53132), [SPARK-53142](apache/spark#53142) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 KeyGroupedPartitioningSuite tests. Excluded tests: `SPARK-53322*`, `SPARK-54439*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [SPARK-53535](https://issues.apache.org/jira/browse/SPARK-53535), [SPARK-54220](https://issues.apache.org/jira/browse/SPARK-54220) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenParquetIOSuite tests. Excluded tests: `SPARK-53535*`, `vectorized reader: missing all struct fields*`, `SPARK-54220*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52645](apache/spark#52645) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenStreamingQuerySuite tests. Excluded tests: `SPARK-53942: changing the number of stateless shuffle partitions via config`, `SPARK-53942: stateful shuffle partitions are retained from old checkpoint` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#47856](apache/spark#47856) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenDataFrameWindowFunctionsSuite and GlutenJoinSuite tests. Excluded tests: `SPARK-49386: Window spill with more than the inMemoryThreshold and spillSizeThreshold`, `SPARK-49386: test SortMergeJoin (with spill by size threshold)` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52157](apache/spark#52157) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenQueryExecutionSuite tests. Excluded test: `#53413: Cleanup shuffle dependencies for commands` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#48470](apache/spark#48470) | 4.1.0 | Test Exclusion | Exclude split test in GlutenRegexpExpressionsSuite. Excluded test: `GlutenRegexpExpressionsSuite.SPLIT` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#51623](apache/spark#51623) | 4.1.0 | Test Exclusion | Add `spark.sql.unionOutputPartitioning=false` to Maven test args. Excluded tests: `GlutenBroadcastExchangeSuite.SPARK-52962`, `GlutenDataFrameSetOperationsSuite.SPARK-52921*` | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/spark41/.../VeloxTestSettings.scala`, `tools/gluten-it/common/.../Suite.scala` |
| N/A | 4.1.0 | Test Exclusion | Excludes failed SQL tests that need to be fixed for Spark 4.1 compatibility. Excluded tests: `decimalArithmeticOperations.sql`, `identifier-clause.sql`, `keywords.sql`, `literals.sql`, `operators.sql`, `exists-orderby-limit.sql`, `postgreSQL/date.sql`, `nonansi/keywords.sql`, `nonansi/literals.sql`, `datetime-legacy.sql`, `datetime-parsing-invalid.sql`, `misc-functions.sql` | `gluten-ut/spark41/.../VeloxSQLQueryTestSettings.scala` |
| #11252 | 4.1.0 | Test Exclusion | Exclude Gluten test for SPARK-47939: Explain should work with parameterized queries | `gluten-ut/spark41/.../VeloxTestSettings.scala` |

github-actions bot added the SQL label Feb 3, 2025

dongjoon-hyun changed the title ~~[SPARK-51064][SQL] Enable spark.sql.sources.v2.bucketing.enabled by…~~ [SPARK-51064][SQL] Enable spark.sql.sources.v2.bucketing.enabled by default Feb 3, 2025

[SPARK-51064][SQL] Enable spark.sql.sources.v2.bucketing.enabled by…

d52d085

… default

dongjoon-hyun force-pushed the SPARK-51064 branch from d616086 to d52d085 Compare February 3, 2025 05:53

dongjoon-hyun marked this pull request as ready for review February 3, 2025 18:32

Update doc too

658b400

github-actions bot added the DOCS label Feb 3, 2025

sunchao approved these changes Feb 4, 2025

viirya approved these changes Feb 4, 2025

dongjoon-hyun closed this in 80cdc15 Feb 4, 2025

dongjoon-hyun deleted the SPARK-51064 branch February 4, 2025 17:36

baibaichen added 35 commits to baibaichen/gluten that referenced this pull request, each with the message "[Fix] Disable V2 bucketing in GlutenDynamicPartitionPruningSuite as per apache/spark#49766, since spark.sql.sources.v2.bucketing.enabled is now enabled by default.":

- Dec 22, 2025: d81f016, 29108e8, ec2983b
- Dec 23, 2025: 9701510, 60a2c69, cfa2a0b, b5ce788, f484e23, 8f40774, ac3145a, c1fce1c
- Dec 30, 2025: db8f53f
- Dec 31, 2025: a275e4a, 9817dff, 94d3349, 9274931, b402a25, 144e312, 76cf131, b579c14, 05604d6, 243feba
- Jan 4, 2026: dffe8a5, 0d130e6, 9ae28ef, 782f53e, 0ccfa24, 84c410f, f7ab297, 9b78cdd, b214db4, c5bcdf8, 42317d4
- Jan 5, 2026: af0f6a3, e1a51ba

baibaichen mentioned this pull request Jan 7, 2026

[GLUTEN-11343][CORE][VL] Support Spark 4.1 UT apache/incubator-gluten#11353

Merged
