-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat][Spark] Refactoring of the Maven project to allow multiple spark versions in the same moment #361
Conversation
The idea is to have the following structure:
Inside The code is locally compiling successfully for both profiles. This is a DRAFT! So, @acezen please make a look on pom files and overall structure. I do not want go deeper without a discussion of the overall concept. Thanks! |
Thanks for the work! I will review it ASAP. |
Test files are here just by accident, looks like I accidentally deleted a git-submodule by copying. I will clean it, of course. |
sorry, Sem, I am currently on a short vacation, and for the next couple of days, I will likely be unable to review this PR. Once I return from my break, I will promptly review it. |
On branch 320-test-different-spark-versions Changes to be committed: modified: spark/graphar/src/test/resources/gar-test modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphReader.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphTransformer.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphWriter.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestIndexGenerator.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestReader.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestWriter.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TransformExample.scala
Another problem that I found is that parquet-reader in 3.2 is incompatible with parquet-reader in 3.3+. The configuration |
@@ -0,0 +1,39 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the GeneralParams could be the common part of GraphAr-spark, does we need to duplicated it in datasource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are used inside data sources, and I duplicate it to avoid circular dependency (common depends of datasources because of Neo4j and if datasources will depend of common we will have a circular dependency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Maybe we update the spark config of def UDF(spark: SparkSession):
spark.conf.set("spark.sql.legacy.parquet.nanosAsLong", "false")
# assume that the application below need the config to be set |
the pom and overall structure look good to me, I have a small question that how the datasource version switch work when user compile with different spark version? |
Users should use maven profiles (spark32, spark33, etc.), defined in top level pom.xml. Spark version is specified inside maven profiles |
On branch 320-test-different-spark-versions Changes to be committed: renamed: spark/datasources-v1/pom.xml -> spark/datasources-32/pom.xml renamed: spark/datasources-v1/src/main/java/com/alibaba/graphar/GeneralParams.java -> spark/datasources-32/src/main/java/com/alibaba/graphar/GeneralParams.java renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarCommitProtocol.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarCommitProtocol.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarDataSource.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarDataSource.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarScanBuilder.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarScanBuilder.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarTable.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarTable.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/GarWriterBuilder.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/GarWriterBuilder.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/csv/CSVWriterBuilder.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/csv/CSVWriterBuilder.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/orc/OrcOutputWriter.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/orc/OrcOutputWriter.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/orc/OrcWriteBuilder.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/orc/OrcWriteBuilder.scala renamed: spark/datasources-v1/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala -> spark/datasources-32/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala renamed: spark/datasources-v2/pom.xml -> spark/datasources-33/pom.xml renamed: spark/datasources-v2/src/main/java/com/alibaba/graphar/GeneralParams.java -> spark/datasources-33/src/main/java/com/alibaba/graphar/GeneralParams.java renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarCommitProtocol.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarCommitProtocol.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarDataSource.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarDataSource.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarScan.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarScanBuilder.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarScanBuilder.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarTable.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarTable.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/GarWriterBuilder.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/GarWriterBuilder.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/csv/CSVWriterBuilder.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/csv/CSVWriterBuilder.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/orc/OrcOutputWriter.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/orc/OrcOutputWriter.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/orc/OrcWriteBuilder.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/orc/OrcWriteBuilder.scala renamed: spark/datasources-v2/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala -> spark/datasources-33/src/main/scala/com/alibaba/graphar/datasources/parquet/ParquetWriterBuilder.scala modified: spark/graphar/pom.xml modified: spark/graphar/src/test/scala/com/alibaba/graphar/ComputeExample.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphInfo.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphReader.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphTransformer.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestGraphWriter.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestIndexGenerator.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestReader.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TestWriter.scala modified: spark/graphar/src/test/scala/com/alibaba/graphar/TransformExample.scala modified: spark/pom.xml
@acezen By the latest commit I introduced changes that make all tests passed. But I still do not fully understand the logic inside |
The |
hi, Sem , can we make this refactoring by splitting the changes to some small PR that avoid hard to back tracing. Maybe we can start from separate the datasource part as a standalone package. |
There is a mvn command inside, and it won't work after introducing maven profiles. Now, each mvn command should also contain |
Ok, I will try. I will start from the only refactoring (datasources-32). And I will add datasources-33 in the next PR. |
|
I will close this PR and split the work to smaller PRs. |
Proposed changes
Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request. If it fixes a bug or resolves a feature request, be sure to link to that issue.
Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...