Core: Log the new metadata location in commit. #4681
I'm not sure we want to force all future table operations to include a literal location (or have to return one). So we may want to have the logging in the operations (HiveTableOperations, HadoopTableOperations, ... ) themselves? |
Good point. But I think a table version string will be there even for a future catalog without a metadata.json file. The log can easily support that with a minor wording change, like Hi @kbendick, what do you think from the perspective of the REST API catalog? |
I agree with @RussellSpitzer that this should be done in the catalog, not generally. |
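The design question in this thread (log generically, or inside each operations implementation) can be illustrated with a small sketch. Everything below is hypothetical: `Ops`, `FileBackedOps`, `FilelessOps`, and the example path are made-up names for illustration, not Iceberg's actual classes, and log lines are collected in a list instead of going through a real logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Iceberg's real API.
class PerOperationsLoggingSketch {
  // Collects log lines in memory to keep the sketch dependency-free.
  static final List<String> LOG_LINES = new ArrayList<>();

  // Assumed minimal contract: commit returns nothing, so no implementation
  // is forced to expose or return a literal metadata location.
  interface Ops {
    void commit(String tableName);
  }

  // Hypothetical file-backed implementation: it knows its new metadata
  // location, so it is the natural place to log it.
  static class FileBackedOps implements Ops {
    private int version = 0;

    @Override
    public void commit(String tableName) {
      version++;
      // Made-up path layout, for illustration only.
      String location = "/warehouse/" + tableName + "/metadata/v" + version + ".metadata.json";
      LOG_LINES.add("Committed to table " + tableName
          + " with the new metadata location " + location);
    }
  }

  // Hypothetical catalog with no metadata file: it has no location to log.
  static class FilelessOps implements Ops {
    @Override
    public void commit(String tableName) {
      // Commits through some service; nothing location-like is available here.
    }
  }

  public static void main(String[] args) {
    new FileBackedOps().commit("db.tbl");
    new FilelessOps().commit("db.tbl");
    LOG_LINES.forEach(System.out::println);
  }
}
```

The trade-off raised later in the thread still applies: this keeps the shared contract free of a location, at the cost of adding a log statement to each implementation that wants one.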
b8561bf to 626c981
cleanupMetadataAndUnlock(commitStatus, newMetadataLocation, lockId, tableLevelMutex);
}

LOG.info("Committed to table {} with the new metadata location {}", fullName, newMetadataLocation);
Instead of putting this log in both line 278 and line 297, I put the log here.
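One reading of the comment above is that a single log statement placed after the cleanup block covers every successful path, while a failed commit propagates its exception and skips the log. A minimal sketch of that control flow, with assumptions stated: `CommitLogSketch`, `cleanupAndUnlock`, and the `simulateFailure` flag are hypothetical stand-ins rather than the PR's code, and log lines are collected in a list rather than written through SLF4J.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a commit path; none of these names are Iceberg's real API.
class CommitLogSketch {
  // Collects log lines in memory to keep the sketch dependency-free.
  static final List<String> LOG_LINES = new ArrayList<>();
  static int cleanupCalls = 0;

  // Assumed helper: a real one would release the lock and, on failure,
  // remove the metadata file that was never committed.
  static void cleanupAndUnlock(boolean committed, String newMetadataLocation) {
    cleanupCalls++;
  }

  static void commit(String fullName, String newMetadataLocation, boolean simulateFailure) {
    boolean committed = false;
    try {
      if (simulateFailure) {
        throw new IllegalStateException("Simulated commit failure for " + fullName);
      }
      committed = true;
    } finally {
      // Runs on both success and failure.
      cleanupAndUnlock(committed, newMetadataLocation);
    }
    // Reached only when the try block did not throw, so one statement
    // here replaces a log call on each individual success branch.
    LOG_LINES.add(String.format(
        "Committed to table %s with the new metadata location %s",
        fullName, newMetadataLocation));
  }

  public static void main(String[] args) {
    commit("db.tbl", "/warehouse/db/tbl/metadata/v2.metadata.json", false);
    try {
      commit("db.tbl", "/warehouse/db/tbl/metadata/v3.metadata.json", true);
    } catch (IllegalStateException e) {
      // The failed commit logs nothing.
    }
    LOG_LINES.forEach(System.out::println);
  }
}
```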
For the current REST catalog, a file should be loggable. That said, I do agree that it's probably best not to force catalogs to have to always return a new metadata.json. From the point of view of the REST catalog, presently it wouldn't matter. That said, it looks like there would be quite a number of places to log this if we did it at the catalog level. Essentially everywhere that For
TLDR: I agree that having the catalog do the logging would be the ideal way, but realistically that's a lot of additional places to add logs vs a small handful of implementations of For the RESTCatalog, |
Hi @RussellSpitzer, @rdblue @kbendick, made the change only to |
I'm good with just adding the log to But realistically, if people who use other table operations find need or value for this log, it can be added as a follow up (particularly by people who make more common use of those table operations). |
Added the support for It is still valuable though. You can connect the file with the job easily by looking at the log. |
Ah that makes sense. I'm good with this use case then. If other TableOperations decide it's needed, then we can add it. |
kbendick left a comment
This looks good to me.
Thanks, @flyrain! |
Thanks all for the review. |
(cherry picked from commit 30b31a2)
This is useful for figuring out from the log which version was actually committed to the catalog, especially when debugging a catalog consistency or locking issue.
cc @aokolnychyi @RussellSpitzer @szehon-ho @karuppayya