[GLUTEN-11346][CORE][VL] Add Spark 4.1 Shim Layer #11347
Changes from all commits
```diff
@@ -1481,3 +1481,109 @@ jobs:
             **/target/*.log
             **/gluten-ut/**/hs_err_*.log
             **/gluten-ut/**/core.*
+
+  spark-test-spark41:
+    needs: build-native-lib-centos-7
+    runs-on: ubuntu-22.04
+    env:
+      SPARK_TESTING: true
+    container: apache/gluten:centos-8-jdk17
+    steps:
+      - uses: actions/checkout@v2
+      - name: Download All Artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: velox-native-lib-centos-7-${{github.sha}}
+          path: ./cpp/build/releases
+      - name: Download Arrow Jars
+        uses: actions/download-artifact@v4
+        with:
+          name: arrow-jars-centos-7-${{github.sha}}
+          path: /root/.m2/repository/org/apache/arrow/
+      - name: Prepare
+        run: |
+          dnf module -y install python39 && \
+          alternatives --set python3 /usr/bin/python3.9 && \
+          pip3 install setuptools==77.0.3 && \
+          pip3 install pyspark==3.5.5 cython && \
```
Contributor:
The pyspark version should be 4.1.0.

Author:
Interesting, it was copied from Spark 4.0, cc @zhouyuan. However, starting with Spark 4.1 (apache/spark#51259), the minimum supported Python version is 3.10. I'm not familiar with how to configure the Python environment, so I've excluded these two unit tests for now; see 2ef147c.
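The Python floor the author mentions could be guarded explicitly in the Prepare step. A minimal sketch (the `version_ge` helper is hypothetical, not part of the workflow) that checks interpreter versions against the 3.10 minimum Spark 4.1 requires:

```shell
# Hypothetical helper: compare two dotted version strings with sort -V.
# Succeeds (exit 0) when $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Spark 4.1 raised the minimum Python to 3.10 (apache/spark#51259),
# so the python39 module installed above would be too old.
for v in 3.9 3.10 3.11; do
  if version_ge "$v" "3.10"; then
    echo "python $v: ok for Spark 4.1"
  else
    echo "python $v: too old for Spark 4.1"
  fi
done
```

Running `version_ge "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" 3.10 || exit 1` before the pyspark install would fail the job fast instead of surfacing the mismatch later in the test run.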
```diff
+          pip3 install pandas==2.2.3 pyarrow==20.0.0
+      - name: Prepare Spark Resources for Spark 4.1.0 # TODO remove after image update
+        run: |
+          rm -rf /opt/shims/spark41
+          bash .github/workflows/util/install-spark-resources.sh 4.1
+          mv /opt/shims/spark41/spark_home/assembly/target/scala-2.12 /opt/shims/spark41/spark_home/assembly/target/scala-2.13
+      - name: Build and Run unit test for Spark 4.1.0 with scala-2.13 (other tests)
+        run: |
+          cd $GITHUB_WORKSPACE/
+          export SPARK_SCALA_VERSION=2.13
+          yum install -y java-17-openjdk-devel
+          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+          export PATH=$JAVA_HOME/bin:$PATH
+          java -version
+          $MVN_CMD clean test -Pspark-4.1 -Pscala-2.13 -Pjava-17 -Pbackends-velox \
+            -Pspark-ut -DargLine="-Dspark.test.home=/opt/shims/spark41/spark_home/" \
+            -DtagsToExclude=org.apache.spark.tags.ExtendedSQLTest,org.apache.gluten.tags.UDFTest,org.apache.gluten.tags.EnhancedFeaturesTest,org.apache.gluten.tags.SkipTest
+      - name: Upload test report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: ${{ github.job }}-report
+          path: '**/surefire-reports/TEST-*.xml'
+      - name: Upload unit tests log files
+        if: ${{ !success() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: ${{ github.job }}-test-log
+          path: |
+            **/target/*.log
+            **/gluten-ut/**/hs_err_*.log
+            **/gluten-ut/**/core.*
+
+  spark-test-spark41-slow:
+    needs: build-native-lib-centos-7
+    runs-on: ubuntu-22.04
+    env:
+      SPARK_TESTING: true
+    container: apache/gluten:centos-8-jdk17
+    steps:
+      - uses: actions/checkout@v2
+      - name: Download All Artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: velox-native-lib-centos-7-${{github.sha}}
+          path: ./cpp/build/releases
+      - name: Download Arrow Jars
+        uses: actions/download-artifact@v4
+        with:
+          name: arrow-jars-centos-7-${{github.sha}}
+          path: /root/.m2/repository/org/apache/arrow/
+      - name: Prepare Spark Resources for Spark 4.1.0 # TODO remove after image update
+        run: |
+          rm -rf /opt/shims/spark41
+          bash .github/workflows/util/install-spark-resources.sh 4.1
+          mv /opt/shims/spark41/spark_home/assembly/target/scala-2.12 /opt/shims/spark41/spark_home/assembly/target/scala-2.13
+      - name: Build and Run unit test for Spark 4.1.0 (slow tests)
+        run: |
+          cd $GITHUB_WORKSPACE/
+          yum install -y java-17-openjdk-devel
+          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
+          export PATH=$JAVA_HOME/bin:$PATH
+          java -version
+          $MVN_CMD clean test -Pspark-4.1 -Pscala-2.13 -Pjava-17 -Pbackends-velox -Pspark-ut \
+            -DargLine="-Dspark.test.home=/opt/shims/spark41/spark_home/" \
+            -DtagsToInclude=org.apache.spark.tags.ExtendedSQLTest
+      - name: Upload test report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: ${{ github.job }}-report
+          path: '**/surefire-reports/TEST-*.xml'
+      - name: Upload unit tests log files
+        if: ${{ !success() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: ${{ github.job }}-test-log
+          path: |
+            **/target/*.log
+            **/gluten-ut/**/hs_err_*.log
+            **/gluten-ut/**/core.*
```
Contributor:
Should also add the TPC tests:
https://github.com/apache/incubator-gluten/blob/main/.github/workflows/velox_backend_x86.yml#L104
Author:
Thanks @zhouyuan. I understand that we need to add this here. Spark 4.1 has a new option, spark.sql.unionOutputPartitioning, introduced in apache/spark#51623. Currently, it needs to be set to false for successful execution. I plan to submit a separate PR later to address this, which will make the review process more convenient.
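The workaround described above might look something like this when launching the TPC runs. This is a hedged sketch: only the conf key and value come from the comment; the entry point and jar name are hypothetical placeholders.

```shell
# Illustrative only: disable Spark 4.1's new union output partitioning
# propagation (apache/spark#51623) until Gluten supports it.
# The class and jar below are placeholders, not real Gluten artifacts.
$SPARK_HOME/bin/spark-submit \
  --conf spark.sql.unionOutputPartitioning=false \
  --class org.example.TpchRunner \
  tpch-queries.jar
```

The same key could equally be set in spark-defaults.conf or via `SparkConf` in the test harness; either way it is a temporary opt-out rather than a fix.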
Fix in #11353.