AWS: handle uncertain catalog state for glue #2402
rdblue
left a comment
The changes look good to me. @jackye1995 or @yyanyy, can you confirm that the integration tests pass?
Thank you for the quick review! Yes, I have run the integration tests locally for both the new and the existing Glue tests and confirmed that they all succeeded.

@RussellSpitzer, could you check?

Looks good to me as well! Sorry the tests weren't easier to abstract. I figure that with catalogs having such different operations, it is probably better that we just duplicate :/
try {
  persistGlueTable(glueTable, properties);
  commitStatus = CommitStatus.SUCCESS;
} catch (Throwable persistFailure) {
1. I think this can be `SdkException` instead of all throwables, because `persistGlueTable` is only a call to Glue. If it is not that exception type, it is guaranteed to be a failure.
2. Can we avoid the nested try and directly determine this in the catch block below? We are already catching that exception at L137 anyway.
Thanks for the feedback!
For 1, I'm not confident that every kind of network issue would surface as an SdkException, so to be on the safe side I'd prefer to maintain a list of exceptions that skip checkCommitStatus rather than a list of exceptions that go through it.
For 2, sounds good. I'll move the catch logic to L137, but I'll make it catch Throwable to handle 1. This essentially means we will not check the commit status for ConcurrentModificationException and AlreadyExistsException, which I think carries minimal risk.
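A minimal sketch of the flow proposed above, for readers following along. It is not the code in this PR: `CommitFlowSketch`, `commit`, and the two `Fake...` exception classes are hypothetical stand-ins, and the broad catch uses `RuntimeException`, the narrower option this thread later settles on. The idea is that the two known-failure exceptions skip the status check, while any other runtime failure is treated as uncertain and resolved by a status check.

```java
import java.util.function.Supplier;

class CommitFlowSketch {
  enum CommitStatus { FAILURE, SUCCESS, UNKNOWN }

  // Hypothetical stand-ins for the two Glue exceptions named in the thread.
  static class FakeConcurrentModificationException extends RuntimeException {
    FakeConcurrentModificationException(String message) { super(message); }
  }

  static class FakeAlreadyExistsException extends RuntimeException {
    FakeAlreadyExistsException(String message) { super(message); }
  }

  // persist: stand-in for the call that writes the table to the catalog.
  // checkCommitStatus: stand-in for re-reading the catalog to see what happened.
  static CommitStatus commit(Runnable persist, Supplier<CommitStatus> checkCommitStatus) {
    CommitStatus commitStatus = CommitStatus.FAILURE;
    try {
      persist.run();
      commitStatus = CommitStatus.SUCCESS;
    } catch (FakeConcurrentModificationException | FakeAlreadyExistsException e) {
      // Known, definite failures: no need to check the commit status.
      commitStatus = CommitStatus.FAILURE;
    } catch (RuntimeException e) {
      // Anything else may or may not have been applied, so ask the catalog.
      commitStatus = checkCommitStatus.get();
    }
    return commitStatus;
  }

  public static void main(String[] args) {
    Supplier<CommitStatus> unknown = () -> CommitStatus.UNKNOWN;
    System.out.println(commit(() -> { }, unknown));
    System.out.println(commit(() -> { throw new FakeAlreadyExistsException("table exists"); }, unknown));
    System.out.println(commit(() -> { throw new IllegalStateException("connection reset"); }, unknown));
  }
}
```

Listing the exceptions that skip the check, rather than the ones that trigger it, means an unexpected exception type defaults to the cautious path.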
Catching Throwable always sounds very dangerous to me, because an Error indicates a serious problem that a reasonable application should not try to catch. I would much prefer catching just RuntimeException, or at least just Exception. Was there any particular reason for HiveTableOperations to catch Throwable? @RussellSpitzer
I think it is less important now that we start out with the commit state set to unknown. In the original design we basically started out assuming that the commit had succeeded and switched a flag to indicate that it had failed. We can probably have this catch just RuntimeException, since the logic is now basically for deciding when to retry.
Sounds good. My thinking was that if it's an Error, it is very unlikely that the commit succeeded before it was thrown, so it would be rethrown anyway. On the other hand, we probably want to make sure the process dies immediately. I'll update L137 to catch RuntimeException.
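To illustrate the point about Error, here is a small standalone sketch (the class and method names are hypothetical, not from this PR). A `catch (RuntimeException e)` block handles ordinary runtime failures but lets any `Error` propagate to the caller untouched, which is the behavior being asked for.

```java
class CatchScopeSketch {
  // Runs the given action and reports what the commit logic would do next.
  static String runCommit(Runnable persist) {
    try {
      persist.run();
      return "SUCCESS";
    } catch (RuntimeException e) {
      // Only runtime exceptions reach this block; Errors are not caught here.
      return "CHECK_COMMIT_STATUS";
    }
  }

  // Returns true if an Error thrown inside runCommit escapes to the caller.
  static boolean errorPropagates() {
    try {
      runCommit(() -> { throw new Error("simulated fatal error"); });
      return false;
    } catch (Error e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(runCommit(() -> { }));
    System.out.println(runCommit(() -> { throw new IllegalStateException("timeout"); }));
    System.out.println(errorPropagates());
  }
}
```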
Table glueTable = getGlueTable();
checkMetadataLocation(glueTable, base);
Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);

nit: extra line to remove
}
persistGlueTable(glueTable, properties);
commitStatus = CommitStatus.SUCCESS;

nit: extra line to remove
Thanks everyone for the quick review! The test failed, but I think the failure may be transient, since I believe I built the same code locally before updating and all tests succeeded. I've triggered another round of tests. If there are no more comments or concerns before tomorrow, I'll probably merge this change. Thanks again!

Merged, since there were no further comments after last Thursday and multiple approvals were in place. Thank you again everyone for reviewing so quickly!
- `checkCommitStatus()` from `HiveTableOperations` to `BaseMetastoreTableOperations` to allow code reuse.
- `GlueCatalogCommitFailureTest` are heavily based on "(#2317) Stop removal of files when catalog state is uncertain - HiveCatalog" #2328; wanted to abstract them, but since both classes extend a base class and there is not much shared code, decided to duplicate the logic instead.
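The reuse described above can be pictured roughly as follows. This is only a sketch under stated assumptions: every name other than `checkCommitStatus` and `CommitStatus` is hypothetical, and the body of the check (looking for the new metadata location among locations the catalog reports) is an assumed simplification, not the logic in `BaseMetastoreTableOperations`.

```java
import java.util.Arrays;
import java.util.List;

class SharedCheckSketch {
  enum CommitStatus { FAILURE, SUCCESS, UNKNOWN }

  abstract static class BaseOperations {
    // Hypothetical hook: each catalog supplies its own way to read metadata locations.
    protected abstract List<String> loadMetadataLocations();

    // Shared check (assumed logic): was the new location actually recorded?
    protected CommitStatus checkCommitStatus(String newMetadataLocation) {
      try {
        return loadMetadataLocations().contains(newMetadataLocation)
            ? CommitStatus.SUCCESS
            : CommitStatus.FAILURE;
      } catch (RuntimeException e) {
        // The catalog could not be read, so the outcome stays uncertain.
        return CommitStatus.UNKNOWN;
      }
    }
  }

  // Hypothetical catalog whose lookup succeeds.
  static class HiveLikeOperations extends BaseOperations {
    private final List<String> locations;

    HiveLikeOperations(List<String> locations) { this.locations = locations; }

    @Override
    protected List<String> loadMetadataLocations() { return locations; }
  }

  // Hypothetical catalog whose lookup fails, simulating an unreachable service.
  static class GlueLikeOperations extends BaseOperations {
    @Override
    protected List<String> loadMetadataLocations() {
      throw new IllegalStateException("simulated: catalog unreachable");
    }
  }

  public static void main(String[] args) {
    BaseOperations hive = new HiveLikeOperations(Arrays.asList("v1.json", "v2.json"));
    System.out.println(hive.checkCommitStatus("v2.json"));
    System.out.println(hive.checkCommitStatus("v3.json"));
    System.out.println(new GlueLikeOperations().checkCommitStatus("v2.json"));
  }
}
```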