[HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation #4514
leesf merged 4 commits into apache:master
@hudi-bot run azure
vinothchandar
left a comment
Let me make another pass at all the pom changes. That seems to be the main thing here.
In the meantime, could you clarify these comments?
Also, have you tested these changes across the Spark 2.x and 3.1/3.2 bundles?
  }

- override def shortName(): String = "hudi"
+ override def shortName(): String = "hudi_v1"
@leesf I suppose this refactoring PR is not meant to include this change?
If the format is not changed, it would conflict with the format in the hudi-spark2/hudi-spark3.1.x/hudi-spark3 modules.
Would it conflict, given that we are extending DefaultSource and overriding shortName()?
It is because of the hudi-spark-bundle module. There I used <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"> <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource> </transformer> to append both formats (hudi_v1 and hudi) to the DataSourceRegister file, so they would conflict if the format were not changed. For this PR by itself, we would not need to change the format to hudi_v1 or use AppendingTransformer. But when implementing the V2 code path, I found it difficult to handle the incremental bootstrap table(
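The naming conflict described in the comment above can be illustrated with a small, self-contained Scala sketch. It does not use Spark: `Register` is a simplified stand-in for `org.apache.spark.sql.sources.DataSourceRegister`, and `LegacySource`, `RenamedLegacySource`, `VersionedSource`, and `ShortNameLookup` are hypothetical names invented for this illustration. The sketch assumes that a format name is resolved by matching it against the `shortName()` of every registered class and that more than one match is treated as an error; the real lookup in Spark may differ in its details.

```scala
// Simplified stand-in for org.apache.spark.sql.sources.DataSourceRegister (assumption).
trait Register {
  def shortName(): String
}

// Hypothetical: the original source before the rename, registered as "hudi".
class LegacySource extends Register {
  override def shortName(): String = "hudi"
}

// Hypothetical: the original source after the rename proposed in this PR.
class RenamedLegacySource extends LegacySource {
  override def shortName(): String = "hudi_v1"
}

// Hypothetical: a per-Spark-version source that extends the original one
// and overrides shortName() to claim the "hudi" name.
class VersionedSource extends LegacySource {
  override def shortName(): String = "hudi"
}

object ShortNameLookup {
  // Resolve a format name against all registered sources.
  // Returns Left with a message when zero or several sources match.
  def lookup(name: String, registered: Seq[Register]): Either[String, Register] =
    registered.filter(_.shortName().equalsIgnoreCase(name)) match {
      case Seq() => Left(s"no source registered for '$name'")
      case Seq(single) => Right(single)
      case many =>
        val classes = many.map(_.getClass.getName).mkString(", ")
        Left(s"multiple sources registered for '$name': $classes")
    }
}
```

Under these assumptions, `ShortNameLookup.lookup("hudi", Seq(new LegacySource, new VersionedSource))` yields a `Left`, because both registered classes answer to `hudi`, while replacing `LegacySource` with `RenamedLegacySource` lets both `hudi` and `hudi_v1` resolve to exactly one class each.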
@vinothchandar Yes, I have manually tested it with Spark 3.2.0 and Spark 3.1.2 on Spark SQL, and the CI tested it on Spark 2.4.x, where it works well.
xushiyan
left a comment
LGTM. Since a lot of files were moved, cherry-picking later commits for 0.10.1 could be problematic. Shall we move this into a feature branch?
@xushiyan given we are almost winding down for 0.10.1, I suggest we land this sooner rather than later. That way we can focus on stabilizing master for 0.11.0, which is not too far away. wdyt?
It won't be too far away. @nsivabalan is cherry-picking for 0.10.1, which will complete by Jan 9. Holding this off for 2 more days can avoid conflicts with some Spark fixes merged after this. I see 2 more fixes coming. After that we should be able to land this right away.
2 more days should be okay?
@leesf few questions.
vinothchandar
left a comment
Made some cleanup suggestions. LGTM overall
Yeah, would really appreciate it if we can wait until Jan 9 to land this patch. Thanks!
@nsivabalan @xushiyan time to land this patch?
@leesf Yes, we won't have conflicting patches to pick from master. We can land this one now.
@hudi-bot run azure
… V2 Implementation (apache#4514)
* Introduce hudi-spark3-common and hudi-spark2-common modules to place classes that would be reused in different spark versions, also introduce hudi-spark3.1.x to support spark 3.1.x.
* Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format.
* Manually tested on Spark 3.1.2 and Spark 3.2.0 SQL.
* Added a README.md file under hudi-spark-datasource module.
What is the purpose of the pull request
Introduce hudi format under hudi-spark2, hudi-spark3, hudi-spark3.1.x modules and change the hudi format in original hudi-spark module to hudi_v1 format
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.