Core: Throw an exception if both catalog type and catalog-impl are set #3162
nastra
left a comment
For me if someone sets both What do you think?
I thought about it for a bit, and I think throwing an exception is actually a good idea as well. It has the advantage of being explicit, so the user learns how these configs work. I will update the logic shortly. Thanks for the feedback!
@pvary I have changed the logic a bit to throw an exception instead. Please take a look and let me know what you think.
site/docs/hive.md
Outdated
| | --------------------------------------------- | ------------------------------------------------------ | | ||
| | iceberg.catalog.<catalog_name\>.type | type of catalog: `hive` or `hadoop` | | ||
| | iceberg.catalog.<catalog_name\>.catalog-impl | catalog implementation, must not be null if type is null | | ||
| | iceberg.catalog.<catalog_name\>.type | type of catalog: `hive`, `hadoop` or empty if `catalog-impl` will be set. | |
Now that we're throwing an exception, I think this change can be rolled back
I don't think that is necessary, since the original intention was to leave the type empty when catalog-impl is set.
Non-blocking notes:
- I agree that this can mostly be rolled back. I would say that `type` could be "hive or hadoop, or left unset if using a custom catalog".
This brings up a larger issue (that is outside of the scope of this PR) that @kainoa21 recently brought up on Slack and elsewhere, that we also experience when using In the case he brought up, he wanted to be able to use He felt that possibly it was a bit of a leaky abstraction in some places. His feeling was that maybe there should be a As this seems to be an offshoot of the same problem, tagging him and linking the issue for future reference: #3044 This is outside of the scope of this PR and I don't mean to block this PR, but this might be something we want to consider during V3 planning / as part of a larger scope of things as we do seem to be encountering some issues with what I'll call the We can continue this conversation elsewhere, sorry to thread-jack / PR-jack the discussion!
| } else { | ||
| String name = catalogName == null ? ICEBERG_DEFAULT_CATALOG_NAME : catalogName; | ||
| String catalogImpl = conf.get(InputFormatConfig.catalogPropertyConfigKey( |
I think instead of doing the check here it's simpler to do the check in addCatalogPropertiesIfMissing before we set CatalogUtil.ICEBERG_CATALOG_TYPE to something. By doing that, you do not need to get the actual impl like this and you can just check catalogProperties.containsKey(CatalogProperties.CATALOG_IMPL). This also allows compatibility with legacy catalog impl Hadoop config key.
Good point! I will change this and add one more test for legacy configs.
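The suggestion above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Iceberg's actual code: the class `CatalogPropertyDefaults`, the method `withDefaultType`, and the literal keys `type` and `catalog-impl` (taken from the config suffixes discussed in this PR) are hypothetical stand-ins for `addCatalogPropertiesIfMissing` and the real constants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Iceberg.
final class CatalogPropertyDefaults {
  // Assumed property keys, mirroring the `type` and `catalog-impl` config suffixes.
  static final String TYPE = "type";
  static final String CATALOG_IMPL = "catalog-impl";

  private CatalogPropertyDefaults() {}

  /**
   * Returns a copy of the properties. Fails fast when both keys are present,
   * and adds the default type only when neither key is present.
   */
  static Map<String, String> withDefaultType(Map<String, String> props, String defaultType) {
    boolean hasType = props.containsKey(TYPE);
    boolean hasImpl = props.containsKey(CATALOG_IMPL);

    if (hasType && hasImpl) {
      throw new IllegalArgumentException(String.format(
          "Both type and catalog-impl are set: type=%s, catalog-impl=%s",
          props.get(TYPE), props.get(CATALOG_IMPL)));
    }

    Map<String, String> result = new HashMap<>(props);
    if (!hasType && !hasImpl) {
      // No custom implementation was requested, so fall back to the default type.
      result.put(TYPE, defaultType);
    }
    return result;
  }
}
```

Checking `containsKey` on the already-collected properties means the check does not need to resolve the implementation class, which is the simplification the comment argues for.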
I also agree that this should throw an exception instead of setting null. Could you update the title to reflect that? I think we should also do a similar check in the Spark and Flink code paths to make sure we do not have the same issue. Could you also add those checks?
I could do that, but would it make sense to keep this PR focused on Hive and raise a new one for Spark and Flink?
rdblue
left a comment
@omarsmak, can the check be done in CatalogUtil instead of here? That way it would apply everywhere and not just for Hive.
@rdblue I have moved the logic to
@rdblue please take another look. I hope everything is in place now :)
Some tests are failing. I will check tomorrow why they are failing.
rdblue
left a comment
Looks good now! I'll merge when tests are passing.
@rdblue CI now looks green.
Thanks, @omarsmak!
Currently, if the user has set both configs in Hive, `iceberg.catalog.<catalog_name>.type=hive|hadoop` and `iceberg.catalog.<catalog_name>.catalog-impl=CustomCatalog`, Iceberg will still use the `type` to perform some operations in `HiveIcebergMetaHook`, which may introduce inconsistencies that are incompatible with the configured custom catalog. For example, if `type` is set to `hive` and `catalog-impl` is set to `GlueCatalog`, `HiveIcebergMetaHook` will perform operations based on the `hive` catalog, which are incompatible with the custom catalog `GlueCatalog`.

To mitigate the issue, this PR returns `null` for `iceberg.catalog.<catalog_name>.type` when `iceberg.catalog.<catalog_name>.catalog-impl` is set, in order to avoid cases like the one described above.

Note: The behavior for legacy configs remains unchanged.
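To make the conflicting configuration concrete, here is a minimal sketch, assuming the Hive configuration is available as a flat key/value map. The class `HiveCatalogConfigSketch` and its methods are hypothetical, and `com.example.CustomCatalog` is a placeholder class name; only the `iceberg.catalog.<catalog_name>.` key layout comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the config layout; not Iceberg's actual code.
final class HiveCatalogConfigSketch {
  static final String PREFIX = "iceberg.catalog.";

  private HiveCatalogConfigSketch() {}

  /** Collects the properties of one catalog, stripping the "iceberg.catalog.<name>." prefix. */
  static Map<String, String> catalogProperties(Map<String, String> conf, String catalogName) {
    String keyPrefix = PREFIX + catalogName + ".";
    Map<String, String> props = new HashMap<>();
    for (Map.Entry<String, String> entry : conf.entrySet()) {
      if (entry.getKey().startsWith(keyPrefix)) {
        props.put(entry.getKey().substring(keyPrefix.length()), entry.getValue());
      }
    }
    return props;
  }

  /** Returns true when both `type` and `catalog-impl` are set for the given catalog. */
  static boolean hasConflict(Map<String, String> conf, String catalogName) {
    Map<String, String> props = catalogProperties(conf, catalogName);
    return props.containsKey("type") && props.containsKey("catalog-impl");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("iceberg.catalog.glue.type", "hive");
    conf.put("iceberg.catalog.glue.catalog-impl", "com.example.CustomCatalog");
    // Both keys are set for the same catalog, which is the ambiguous case described above.
    System.out.println(hasConflict(conf, "glue"));
  }
}
```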