Skip to content

Conversation

@lw309637554
Copy link
Contributor

Tips

What is the purpose of the pull request

write index type configs into hoodie.properties during dataset creation time. it makes sense for the index to be not changeable after dataset creation

Brief change log

1、add indextype param in HoodieTableMetaClient.java initTableType method.
2、the table create client support add indextype
HoodieSparkSqlWriter.scala
BootstrapExecutor.java
DeltaSync.java

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@lw309637554
Copy link
Contributor Author

@vinothchandar @n3nash can you help to review? Thanks

@vinothchandar vinothchandar self-assigned this Oct 4, 2020
Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lw309637554 Change itself looks good to me. However, wondering if we should also

  • Add some check to throw an error if the index type is different in hoodie.properties than it's configured. some changes are compatible for e.g using GLOBAL_BLOOM and then switching to BLOOM/SIMPLE . but reverse is not.
  • I think it's high time we introduce a builder pattern init the table properties. those overloaded initXX are hard to read. If interested, we can do that in a separate PR.
  • Should we also add something to upgrade downgrade logic to persist this for existing datasets.

@lw309637554
Copy link
Contributor Author

@lw309637554 Change itself looks good to me. However, wondering if we should also

  • Add some check to throw an error if the index type is different in hoodie.properties than it's configured. some changes are compatible for e.g using GLOBAL_BLOOM and then switching to BLOOM/SIMPLE . but reverse is not.
  • I think it's high time we introduce a builder pattern init the table properties. those overloaded initXX are hard to read. If interested, we can do that in a separate PR.
  • Should we also add something to upgrade downgrade logic to persist this for existing datasets.

Thanks , i will think about

@lw309637554 lw309637554 closed this Oct 4, 2020
@lw309637554 lw309637554 reopened this Oct 4, 2020
@lw309637554
Copy link
Contributor Author

lw309637554 commented Oct 4, 2020

@lw309637554 Change itself looks good to me. However, wondering if we should also
Thanks @vinothchandar

  • Add some check to throw an error if the index type is different in hoodie.properties than it's configured. some changes are compatible for e.g using GLOBAL_BLOOM and then switching to BLOOM/SIMPLE . but reverse is not.

OK, i will add it.

  • I think it's high time we introduce a builder pattern init the table properties. those overloaded initXX are hard to read. If interested, we can do that in a separate PR.

OK, i interested.

  • Should we also add something to upgrade downgrade logic to persist this for existing datasets.

OK, i will try it.

@lw309637554 lw309637554 force-pushed the HUDI-37 branch 2 times, most recently from bd1865d to 3ef10a3 Compare October 6, 2020 11:12
@lw309637554
Copy link
Contributor Author

lw309637554 commented Oct 6, 2020

@lw309637554 Change itself looks good to me. However, wondering if we should also

  • Add some check to throw an error if the index type is different in hoodie.properties than it's configured. some changes are compatible for e.g using GLOBAL_BLOOM and then switching to BLOOM/SIMPLE . but reverse is not.

add compatible check in AbstractHoodieClient.createMetaClient().

  • I think it's high time we introduce a builder pattern init the table properties. those overloaded initXX are hard to read. If interested, we can do that in a separate PR.

open a new issue, will land it in https://issues.apache.org/jira/browse/HUDI-1315

  • Should we also add something to upgrade downgrade logic to persist this for existing datasets.

have added in AbstractUpgradeDowngrade.createUpdatedFile()

@lw309637554 lw309637554 force-pushed the HUDI-37 branch 2 times, most recently from f70cdc8 to a69608f Compare October 6, 2020 19:11
@vinothchandar
Copy link
Member

@lw309637554 is this ready for re-review? please lmk

@lw309637554
Copy link
Contributor Author

lw309637554 commented Oct 13, 2020

@lw309637554 is this ready for re-review? please lmk

@vinothchandar yeah, i think it is ready. Completeness of compatibility checks in AbstractHoodieClient.createMetaClient() need your suggestion.

@lw309637554
Copy link
Contributor Author

@vinothchandar please help to review again ,thanks.

@vinothchandar
Copy link
Member

@lw309637554 will do in the next 1-2 days. thanks for your patience

@lw309637554
Copy link
Contributor Author

@lw309637554 will do in the next 1-2 days. thanks for your patience

thanks

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Mostly code/structure/use-of-class related.
but 1 main comment around some missing cases.

@lw309637554
Copy link
Contributor Author

Left some comments. We are close. Some issues around index creation at various places, needs to be taken care of though

@vinothchandar yes , we are so close. have resolved as your suggestion. can you help to review again. Thanks so much.

Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of one more scenario. @lw309637554 Love to get your thoughts. Let's say the table was created with GLOBAL_BLOOM and then we allow writes with BLOOM (which can create duplication across partitions. then we also allow later writes with GLOBAL_BLOOM. The current checks bring a lot of value. Just pointing out that its not fully bulletproof. wdyt

other than this, if we can think about metaClient creation, PR is ready

@lw309637554
Copy link
Contributor Author

I thought of one more scenario. @lw309637554 Love to get your thoughts. Let's say the table was created with GLOBAL_BLOOM and then we allow writes with BLOOM (which can create duplication across partitions. then we also allow later writes with GLOBAL_BLOOM. The current checks bring a lot of value. Just pointing out that its not fully bulletproof. wdyt

other than this, if we can think about metaClient creation, PR is ready

@vinothchandar thanks.

  1. about the GLOBAL_BLOOM-> BLOOM -> GLOBAL_BLOOM scenario have two option
    option 1: check GLOBAL_BLOOM incompatible with BLOOM strict. It will be concise
    option 2: update the index in hoodie properties when the index changed. it will be too heavy
    wdyt?
  2. about metaclient creation have resolved

@vinothchandar
Copy link
Member

option 1: check GLOBAL_BLOOM incompatible with BLOOM strict. It will be concise

yes. but that takes the flexibilty away. I have seen users start with global bloom and then move to bloom, for perf reasons for eg.

actually. option 2, seems like the right thing to me. but I can see that it's heavy weight. Ideally, hoodie.properties is a hudi file group, that can take updates or is versioned as well. that's also a large change.

So, I am actually bit puzzled. :| Let me think more.
if you have any other way out of this, please let me know as well .

@lw309637554
Copy link
Contributor Author

option 1: check GLOBAL_BLOOM incompatible with BLOOM strict. It will be concise

yes. but that takes the flexibilty away. I have seen users start with global bloom and then move to bloom, for perf reasons for eg.

actually. option 2, seems like the right thing to me. but I can see that it's heavy weight. Ideally, hoodie.properties is a hudi file group, that can take updates or is versioned as well. that's also a large change.

So, I am actually bit puzzled. :| Let me think more.
if you have any other way out of this, please let me know as well .

@vinothchandar thanks , i think we can do as option2.
just like AbstractUpgradeDowngrade update the properties. but at index persist scenario, we can retain the old properties endwith the version of timestamp

@vinothchandar
Copy link
Member

yes. I am also wondering if we should log these to the metadata table in RFC-15. its a much better model in some sense, since it's fully self managed. Do you mind we hang on to this PR for now, until we land RFC-15 into master?

you can checkout the rfc-15 feature branch for now, if you are curious

@lw309637554
Copy link
Contributor Author

yes. I am also wondering if we should log these to the metadata table in RFC-15. its a much better model in some sense, since it's fully self managed. Do you mind we hang on to this PR for now, until we land RFC-15 into master?

you can checkout the rfc-15 feature branch for now, if you are curious

thanks , i will checkout rfc-15, and try to land this feature on rfc-15 branch, wdyt?

@nsivabalan
Copy link
Contributor

@vinothchandar : do you think we need to make this release-blocker?

@nsivabalan
Copy link
Contributor

@lw309637554 @vinothchandar : can you folks get this to completion, its been open for a while. Would be nice to have this in. We might also add more documentation in fax or somewhere as to what switches are compatible.

@nsivabalan nsivabalan added priority:high Significant impact; potential bugs priority:medium Moderate impact; usability gaps and removed priority:high Significant impact; potential bugs labels Feb 12, 2021
@lw309637554
Copy link
Contributor Author

@lw309637554 @vinothchandar : can you folks get this to completion, its been open for a while. Would be nice to have this in. We might also add more documentation in fax or somewhere as to what switches are compatible.

sorry for my late. Now metatable is ready , i will implement this use meta table.

@lw309637554 lw309637554 changed the title [HUDI-37] Persist the HoodieIndex type in the hoodie.properties file [HUDI-37][WIP] Persist the HoodieIndex type in the hoodie.properties file Feb 28, 2021
@vinothchandar vinothchandar added the status:in-progress Work in progress label Apr 15, 2021
@vinothchandar
Copy link
Member

Closing this PR, in favor of new approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:medium Moderate impact; usability gaps status:in-progress Work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants