[Feat][Spark] Spark 3.3.x support as a Maven Profile #376

Merged 12 commits on Feb 25, 2024

Conversation

SemyonSinchenko
Member

Proposed changes

As discussed in #320

  • Spark 3.3.x support was added as a separate Maven profile
  • tests were slightly modified
  • a matrix strategy was added to the CI/CD workflow

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Close #319

+ matrix strategy was incorporated into CI/CD

 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
	new file:   spark/datasources-33/.scalafmt.conf
	new file:   spark/datasources-33/pom.xml
	new file:   spark/datasources-33/src/main/java/com/alibaba/graphar/GeneralParams.java
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarCommitProtocol.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarDataSource.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarScanBuilder.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarTable.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarWriterBuilder.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/csv/CSVWriterBuilder.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/orc/OrcOutputWriter.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/orc/OrcWriteBuilder.scala
	new file:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/ComputeExample.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphReader.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphTransformer.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphWriter.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestIndexGenerator.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestReader.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestWriter.scala
	modified:   spark/pom.xml
 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
- update of licenserc
- spark-home as a matrix variable in CI

 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
	modified:   .licenserc.yaml
 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
	modified:   spark/scripts/get-spark-to-home.sh
 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
	modified:   spark/scripts/build.sh
 On branch 319-spark-33x
 Changes to be committed:
	modified:   .github/workflows/spark.yaml
- legacy.parquet....
- fieldId.read...
- fieldId.write...

They are required (I have no idea why) to avoid an NPE in spark.conf.get expressions
starting from Spark 3.3.x

 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/GraphAr2Nebula.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/GraphAr2Neo4j.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/Nebula2GraphAr.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/Neo4j2GraphAr.scala
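
One plausible reading of the NPE mentioned in the commit message above is that a configuration lookup returns null for a key that was never set, and the caller then dereferences the result. The sketch below illustrates only that general failure mode, with a plain `Map` standing in for the configuration object. It is not Spark's or GraphAr's code. The class `ConfSketch` and the key `example.flag.enabled` are hypothetical, since the real keys are elided in the commit message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a Map stands in for a Spark/Hadoop configuration.
final class ConfSketch {
    // Hypothetical key; the real keys are elided in the commit message.
    static final String KEY = "example.flag.enabled";

    // Unsafe read: throws NullPointerException when the key was never set,
    // because Map.get returns null and trim() is then called on null.
    static boolean readUnsafe(Map<String, String> conf, String key) {
        return Boolean.parseBoolean(conf.get(key).trim());
    }

    // Safe read: falls back to a caller-supplied default when the key is unset.
    static boolean readWithDefault(Map<String, String> conf, String key, boolean dflt) {
        String raw = conf.get(key);
        return raw == null ? dflt : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        try {
            readUnsafe(conf, KEY);
        } catch (NullPointerException e) {
            System.out.println("unset key -> NullPointerException");
        }

        // Option 1 (the approach the commit describes): set the key explicitly.
        conf.put(KEY, "false");
        System.out.println("explicitly set -> " + readUnsafe(conf, KEY));

        // Option 2: tolerate a missing key at the read site.
        conf.remove(KEY);
        System.out.println("default -> " + readWithDefault(conf, KEY, true));
    }
}
```

Setting the key explicitly is the only option when the read site lives in library code that cannot be changed. A default at the read site applies only to code one controls.
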
 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/graphar/pom.xml
	modified:   spark/scripts/build.sh
 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/graphar/pom.xml
 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/importer/Neo4j.scala
- Update FileSystem in util
- Drop unneeded confs in test cases

 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/GraphAr2Nebula.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/GraphAr2Neo4j.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/Nebula2GraphAr.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/example/Neo4j2GraphAr.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/importer/Neo4j.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/util/FileSystem.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphTransformer.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphWriter.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestIndexGenerator.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestWriter.scala
@SemyonSinchenko
Member Author

It fails on read, not write. So, updating FileSystem did not help.

 On branch 319-spark-33x
 Changes to be committed:
	modified:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala
	modified:   spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala
	modified:   spark/graphar/src/main/scala/com/alibaba/graphar/util/FileSystem.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/ComputeExample.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphReader.scala
	modified:   spark/graphar/src/test/scala/com/alibaba/graphar/TestReader.scala
@SemyonSinchenko SemyonSinchenko self-assigned this Feb 25, 2024
@SemyonSinchenko
Member Author

@acezen I found the right way to fix it. Sorry, it was my fault. Please check the latest commit.

Contributor

@acezen acezen left a comment

LGTM, Wonderful work! Thanks, Sem

@acezen acezen merged commit b4e076a into apache:main Feb 25, 2024
5 checks passed
@SemyonSinchenko SemyonSinchenko deleted the 319-spark-33x branch April 22, 2024 17:47
Development

Successfully merging this pull request may close these issues.

[Feat][Spark] Support spark 3.3.x
2 participants