[GLUTEN-11343][CORE][VL] Support Spark 4.1 UT #11353
gluten-ut/spark40/pom.xml (outdated)

    <activation>
      <activeByDefault>false</activeByDefault>
    </activation>
    <properties>

We need to remove these properties in a subsequent PR.
Pull request overview
This PR adds support for Spark 4.1 unit tests by updating the build configuration, resolving compatibility issues, and adding new test resources. The changes accommodate API changes introduced in Spark 4.1, including dependency updates, package refactorings, and configuration parameter modifications.
Key Changes
- Updated build and dependency configurations to support Spark 4.1 testing
- Fixed compatibility issues from Spark API changes (streaming package refactoring, TypedConfigBuilder, V2 bucketing defaults)
- Added comprehensive SQL test input files for Spark 4.1 compatibility validation
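
The V2 bucketing item can be illustrated with a small sketch. The PR description names `spark.sql.sources.v2.bucketing.enabled` as the key that Spark 4.1 now enables by default; the object and helper names below are hypothetical and are not taken from the PR's diff. The sketch assumes `spark-core` is on the classpath and only shows one way a test could pin the flag back to `false`.

```scala
import org.apache.spark.SparkConf

// Hypothetical sketch, not code from this PR.
object V2BucketingConfSketch {
  // Config key quoted in the PR description.
  val V2BucketingKey = "spark.sql.sources.v2.bucketing.enabled"

  // Return a copy of the given conf with V2 bucketing explicitly disabled,
  // leaving the caller's conf untouched.
  def withV2BucketingDisabled(base: SparkConf): SparkConf =
    base.clone.set(V2BucketingKey, "false")

  def main(args: Array[String]): Unit = {
    // `false` skips loading spark.* system properties, so the sketch is deterministic.
    val conf = withV2BucketingDisabled(new SparkConf(false))
    println(conf.get(V2BucketingKey))
  }
}
```

In a real suite the same setting would more likely be applied through the suite's own configuration hook; how `GlutenDynamicPartitionPruningSuite` does it is not shown on this page.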
## Changes

| Cause | Type | Category | Description | Affected Files |
|-------|------|----------|-------------|----------------|
| N/A | Feat | Build | Update build configuration to support Spark 4.1 UT | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/pom.xml`, `gluten-ut/spark41/pom.xml`, `tools/gluten-it/pom.xml` |
| [#52165](apache/spark#52165) | Fix | Dependency | Update Parquet dependency version to 1.16.0 to avoid NoSuchMethodError issue | `gluten-ut/spark41/pom.xml` |
| [#51477](apache/spark#51477) | Fix | Compatibility | Update imports to reflect streaming runtime package refactoring in Apache Spark | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala`, `gluten-ut/spark41/.../GlutenStreamingQuerySuite.scala` |
| [#50674](apache/spark#50674) | Fix | Compatibility | Fix compatibility issue introduced by `TypedConfigBuilder` | `gluten-substrait/.../ExpressionConverter.scala`, `gluten-ut/spark41/.../GlutenCSVSuite.scala`, `gluten-ut/spark41/.../GlutenJsonSuite.scala` |
| [#49766](apache/spark#49766) | Fix | Compatibility | Disable V2 bucketing in GlutenDynamicPartitionPruningSuite since `spark.sql.sources.v2.bucketing.enabled` is now enabled by default | `gluten-ut/spark41/.../GlutenDynamicPartitionPruningSuite.scala` |
| [#42414](apache/spark#42414), [#53038](apache/spark#53038) | Fix | Bug Fix | Resolve an issue introduced by SPARK-42414, as identified in SPARK-53038 | `backends-velox/.../VeloxBloomFilterAggregate.scala` |
| N/A | Fix | Bug Fix | Enforce row fallback for unsupported cached batches - keep columnar execution only when schema validation succeeds | `backends-velox/.../ColumnarCachedBatchSerializer.scala` |
| [SPARK-53132](apache/spark#53132), [SPARK-53142](apache/spark#53142) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 KeyGroupedPartitioningSuite tests. Excluded tests: `SPARK-53322*`, `SPARK-54439*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [SPARK-53535](https://issues.apache.org/jira/browse/SPARK-53535), [SPARK-54220](https://issues.apache.org/jira/browse/SPARK-54220) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenParquetIOSuite tests. Excluded tests: `SPARK-53535*`, `vectorized reader: missing all struct fields*`, `SPARK-54220*` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52645](apache/spark#52645) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenStreamingQuerySuite tests. Excluded tests: `SPARK-53942: changing the number of stateless shuffle partitions via config`, `SPARK-53942: stateful shuffle partitions are retained from old checkpoint` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#47856](apache/spark#47856) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenDataFrameWindowFunctionsSuite and GlutenJoinSuite tests. Excluded tests: `SPARK-49386: Window spill with more than the inMemoryThreshold and spillSizeThreshold`, `SPARK-49386: test SortMergeJoin (with spill by size threshold)` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#52157](apache/spark#52157) | 4.1.0 | Test Exclusion | Exclude additional Spark 4.1 GlutenQueryExecutionSuite tests. Excluded test: `#53413: Cleanup shuffle dependencies for commands` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#48470](apache/spark#48470) | 4.1.0 | Test Exclusion | Exclude split test in GlutenRegexpExpressionsSuite. Excluded test: `GlutenRegexpExpressionsSuite.SPLIT` | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
| [#51623](apache/spark#51623) | 4.1.0 | Test Exclusion | Add `spark.sql.unionOutputPartitioning=false` to Maven test args. Excluded tests: `GlutenBroadcastExchangeSuite.SPARK-52962`, `GlutenDataFrameSetOperationsSuite.SPARK-52921*` | `.github/workflows/velox_backend_x86.yml`, `gluten-ut/spark41/.../VeloxTestSettings.scala`, `tools/gluten-it/common/.../Suite.scala` |
| N/A | 4.1.0 | Test Exclusion | Exclude failed SQL tests that need to be fixed for Spark 4.1 compatibility. Excluded tests: `decimalArithmeticOperations.sql`, `identifier-clause.sql`, `keywords.sql`, `literals.sql`, `operators.sql`, `exists-orderby-limit.sql`, `postgreSQL/date.sql`, `nonansi/keywords.sql`, `nonansi/literals.sql`, `datetime-legacy.sql`, `datetime-parsing-invalid.sql`, `misc-functions.sql` | `gluten-ut/spark41/.../VeloxSQLQueryTestSettings.scala` |
| apache#11252 | 4.1.0 | Test Exclusion | Exclude Gluten test for SPARK-47939: Explain should work with parameterized queries | `gluten-ut/spark41/.../VeloxTestSettings.scala` |
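
Several exclusion entries in the table end in `*` while others are full test names. As a rough sketch of how such a list could be interpreted, the snippet below assumes a trailing `*` means "match by prefix" and anything else means "match exactly". `ExclusionSketch` and `isExcluded` are hypothetical names for illustration; they are not the API of `VeloxTestSettings.scala`, which is not shown on this page.

```scala
// Hypothetical sketch, not Gluten's actual test-settings API.
object ExclusionSketch {
  // Patterns copied from the GlutenParquetIOSuite and GlutenRegexpExpressionsSuite rows.
  val patterns: Seq[String] = Seq(
    "SPARK-53535*",
    "vectorized reader: missing all struct fields*",
    "SPARK-54220*",
    "GlutenRegexpExpressionsSuite.SPLIT"
  )

  // Assumption: a trailing '*' is a prefix wildcard; otherwise the name must match exactly.
  def isExcluded(testName: String, excluded: Seq[String] = patterns): Boolean =
    excluded.exists { p =>
      if (p.endsWith("*")) testName.startsWith(p.dropRight(1))
      else testName == p
    }

  def main(args: Array[String]): Unit = {
    // The test names below are made up to exercise the matcher.
    Seq(
      "SPARK-53535: some parquet test",
      "GlutenRegexpExpressionsSuite.SPLIT",
      "SPARK-00000: unrelated test"
    ).foreach(name => println(s"$name -> excluded=${isExcluded(name)}"))
  }
}
```

Keeping prefix and exact matches in one list mirrors how the table presents them, at the cost of treating a literal trailing `*` in a test name as a wildcard.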
What changes are proposed in this pull request?
Fixes #11343
How was this patch tested?
Tested with Spark 4.1 unit tests.