[HUDI-37][WIP] Persist the HoodieIndex type in the hoodie.properties file #2136
@vinothchandar @n3nash can you help to review? Thanks |
vinothchandar
left a comment
@lw309637554 Change itself looks good to me. However, wondering if we should also
- Add some check to throw an error if the index type in hoodie.properties is different from the one configured. Some changes are compatible, e.g. using GLOBAL_BLOOM and then switching to BLOOM/SIMPLE, but the reverse is not.
- I think it's high time we introduce a builder pattern to init the table properties. Those overloaded initXX methods are hard to read. If interested, we can do that in a separate PR.
- Should we also add something to the upgrade/downgrade logic to persist this for existing datasets?
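The check suggested in the first bullet could look roughly like the sketch below. It is not the code from this PR: the class `IndexTypeCompatibility`, its methods, and the trimmed-down `IndexType` enum are hypothetical names for illustration. The only rule taken from the review is that GLOBAL_BLOOM to BLOOM/SIMPLE is compatible and the reverse is not; treating BLOOM and SIMPLE as interchangeable is an extra assumption.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of an index-type compatibility check; not Hudi's actual API. */
final class IndexTypeCompatibility {

  /** Illustrative subset: only the index types named in the review comment. */
  enum IndexType { BLOOM, GLOBAL_BLOOM, SIMPLE }

  /** Index types that enforce key uniqueness across partitions. */
  private static final Set<IndexType> GLOBAL_TYPES = EnumSet.of(IndexType.GLOBAL_BLOOM);

  private IndexTypeCompatibility() { }

  /**
   * Returns true when a writer configured with {@code configured} may write to a
   * table whose persisted index type is {@code persisted}.
   */
  static boolean isCompatible(IndexType persisted, IndexType configured) {
    if (persisted == configured) {
      return true;
    }
    // From the review: global -> non-global is fine, non-global -> global is not.
    // Assumption: switching between non-global types (BLOOM <-> SIMPLE) is allowed.
    return !GLOBAL_TYPES.contains(configured);
  }

  /** Throws if the configured index type conflicts with the persisted one. */
  static void validate(IndexType persisted, IndexType configured) {
    if (!isCompatible(persisted, configured)) {
      throw new IllegalStateException(
          "Index type " + configured + " is not compatible with persisted index type " + persisted);
    }
  }
}
```

A caller would read the persisted type from the table properties when creating the meta client and call `validate` before any write starts.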
Thanks, I will think about it. |
OK, I will add it.
OK, I am interested.
OK, I will try it. |
Added a compatibility check in AbstractHoodieClient.createMetaClient().
Opened a new issue; will land it in https://issues.apache.org/jira/browse/HUDI-1315
Added it in AbstractUpgradeDowngrade.createUpdatedFile(). |
…when upgrade downgrade and check compatible
|
@lw309637554 is this ready for re-review? please lmk |
@vinothchandar yeah, I think it is ready. I need your suggestions on the completeness of the compatibility checks in AbstractHoodieClient.createMetaClient(). |
|
@vinothchandar please help to review again, thanks. |
|
@lw309637554 will do in the next 1-2 days. thanks for your patience |
thanks |
vinothchandar
left a comment
Left some comments, mostly code/structure/use-of-class related, but one main comment around some missing cases.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieClient.java
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/index/TestHoodieIndex.java
...ent/hudi-spark-client/src/test/java/org/apache/hudi/client/TestTableIndexTypeCompatible.java
hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
…dex at read and write
@vinothchandar yes, we are so close. I have resolved the comments as you suggested. Can you help to review again? Thanks so much. |
vinothchandar
left a comment
I thought of one more scenario. @lw309637554 Love to get your thoughts. Let's say the table was created with GLOBAL_BLOOM, and then we allow writes with BLOOM (which can create duplication across partitions), and then we also allow later writes with GLOBAL_BLOOM. The current checks bring a lot of value. Just pointing out that it's not fully bulletproof. wdyt?
Other than this, if we can think about metaClient creation, the PR is ready.
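To make the scenario concrete, here is a small sketch with hypothetical names (`CreationTimeCheckGap` and its methods are not from the PR). It assumes the check compares each write only against the index type persisted at table creation, and shows that the GLOBAL_BLOOM, BLOOM, GLOBAL_BLOOM sequence passes such a check even though a global write follows a non-global one.

```java
import java.util.List;

/** Hypothetical illustration of the gap described in the review; not code from the PR. */
final class CreationTimeCheckGap {

  /** Illustrative subset of index types. */
  enum IndexType { BLOOM, GLOBAL_BLOOM }

  private CreationTimeCheckGap() { }

  /** Assumed check: compares only against the type persisted at table creation. */
  static boolean allowedAgainstCreationType(IndexType createdWith, IndexType configured) {
    return createdWith == configured || configured != IndexType.GLOBAL_BLOOM;
  }

  /** True if every write in the sequence passes the creation-time check. */
  static boolean allWritesAccepted(IndexType createdWith, List<IndexType> writes) {
    for (IndexType write : writes) {
      if (!allowedAgainstCreationType(createdWith, write)) {
        return false;
      }
    }
    return true;
  }

  /** True if a GLOBAL_BLOOM write comes after a non-global write (the risky case). */
  static boolean hasGlobalWriteAfterNonGlobal(List<IndexType> writes) {
    boolean seenNonGlobal = false;
    for (IndexType write : writes) {
      if (write != IndexType.GLOBAL_BLOOM) {
        seenNonGlobal = true;
      } else if (seenNonGlobal) {
        return true;
      }
    }
    return false;
  }
}
```

Catching this case needs some record of the index types used by earlier writes, not just the value stored when the table was created.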
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java
...ent/hudi-flink-client/src/main/java/org/apache/hudi/index/state/FlinkInMemoryStateIndex.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java
@vinothchandar thanks.
|
Yes, but that takes the flexibility away. I have seen users start with global bloom and then move to bloom, for perf reasons for example. Actually, option 2 seems like the right thing to me, but I can see that it's heavyweight. Ideally, hoodie.properties would be a Hudi file group that can take updates, or be versioned as well. That's also a large change. So, I am actually a bit puzzled. :| Let me think more. |
@vinothchandar thanks, I think we can go with option 2. |
|
Yes. I am also wondering if we should log these to the metadata table in RFC-15. It's a much better model in some sense, since it's fully self-managed. Do you mind if we hang on to this PR for now, until we land RFC-15 into master? you can checkout the |
Thanks, I will check it out.
|
@vinothchandar : do you think we need to make this release-blocker? |
|
@lw309637554 @vinothchandar : can you folks get this to completion? It's been open for a while. Would be nice to have this in. We might also add more documentation in the FAQ or somewhere as to what switches are compatible.
Sorry for the delay. Now that the metadata table is ready, I will implement this using the metadata table.
|
Closing this PR, in favor of new approach. |
What is the purpose of the pull request
Write index type configs into hoodie.properties during dataset creation time. It makes sense for the index to be not changeable after dataset creation.
Brief change log
1. Add an indexType param to the initTableType method in HoodieTableMetaClient.java.
2. The clients that create tables now support passing the index type:
HoodieSparkSqlWriter.scala
BootstrapExecutor.java
DeltaSync.java
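As a rough picture of what the change log describes, the sketch below writes an index type into a properties file at creation time and reads it back, using only `java.util.Properties`. It is not the PR's implementation: the class `TablePropertiesSketch` and the key `hoodie.table.index.type` are assumptions for illustration, and the contents are kept in a string rather than written to the table's base path.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of persisting the index type with table properties. */
final class TablePropertiesSketch {

  /** Assumed property key, chosen for illustration only. */
  static final String INDEX_TYPE_KEY = "hoodie.table.index.type";

  private TablePropertiesSketch() { }

  /** Builds the properties file contents written once, when the table is created. */
  static String writeAtCreation(String tableName, String indexType) {
    Properties props = new Properties();
    props.setProperty("hoodie.table.name", tableName);
    props.setProperty(INDEX_TYPE_KEY, indexType);
    StringWriter out = new StringWriter();
    try {
      props.store(out, "Table properties written at creation time");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  /** Reads the persisted index type back; returns null if the key is absent. */
  static String readIndexType(String fileContents) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(fileContents));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty(INDEX_TYPE_KEY);
  }
}
```

A `null` from `readIndexType` would correspond to a table created before the property existed, which is the case the upgrade/downgrade step in the review is meant to cover.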
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.