[HUDI-544] Archived commits command code cleanup#1242
@vinothchandar It's ready now; please review when you are free.
This is already overridden at the datasource and deltastreamer level. Please revert this, since it can affect all the old tables written into . ..
@n3nash can you advise here? The only reason not to change this is all the old data at Uber.
@vinothchandar that's correct: we don't override this, and all archived commit data at Uber is written to .hoodie/.
@hddong this change doesn't seem necessary and raises serious concerns for older data (even data written by other users who might not be overriding the archived folder name). Could you please revert it?
@n3nash @vinothchandar thanks for the review. I have reverted everything related to
If this option is removed, users cannot use a specified path.
If this option is removed, users cannot use a specified path.
IMO, it is not needed here; archiveFolder is stored in .hoodie.
This config is here so that users can provide a full path to the files under the archive folder and read them; let's leave it here. By the way, these commands are being reworked in PR #1274.
If we leave it there, this PR will have nothing to commit; it would just be enhancing #1274.
I'm assuming HoodieCLI.getTableMetaClient().getArchivePath() returns the path with .hoodie?
public String getArchivePath() {
  String archiveFolder = tableConfig.getArchivelogFolder();
  // With the default folder name, archives sit directly under the meta path.
  if (archiveFolder.equals(HoodieTableConfig.DEFAULT_ARCHIVELOG_FOLDER)) {
    return getMetaPath();
  } else {
    // Otherwise they sit in a subfolder of the meta path.
    return getMetaPath() + "/" + archiveFolder;
  }
}
I'm assuming HoodieCLI.getTableMetaClient().getArchivePath() returns the path with .hoodie?
Yes, it reads from the table metadata and includes .hoodie.
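For readers following the thread, here is a minimal standalone sketch of the resolution logic in getArchivePath() quoted above, using plain strings instead of the real meta client. The class name ArchivePathSketch, the helper names, the constant values, and the /table base path are illustrative assumptions, not Hudi's API; the empty-string default follows the PR description.

```java
// Standalone sketch; the names here are hypothetical, not Hudi classes.
class ArchivePathSketch {
    // Assumed values: the meta folder name and the empty default archive folder.
    static final String META_FOLDER = ".hoodie";
    static final String DEFAULT_ARCHIVELOG_FOLDER = "";

    static String metaPath(String basePath) {
        return basePath + "/" + META_FOLDER;
    }

    // Mirrors the branch structure of the getArchivePath() snippet in this thread.
    static String archivePath(String basePath, String archiveFolder) {
        if (archiveFolder.equals(DEFAULT_ARCHIVELOG_FOLDER)) {
            return metaPath(basePath);
        }
        return metaPath(basePath) + "/" + archiveFolder;
    }

    public static void main(String[] args) {
        System.out.println(archivePath("/table", ""));         // /table/.hoodie
        System.out.println(archivePath("/table", "archived")); // /table/.hoodie/archived
    }
}
```

Under these assumptions, a table that never set a folder name resolves to the meta path itself, while a table created with "archived" resolves to a subfolder of it.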
@n3nash addressed the comment.
This can affect how archives are read for all the old tables.
archivePath = new Path(metaClient.getArchivePath() + "/.commits_.archive*") is equivalent to new Path(basePath + "/.hoodie/.commits_.archive*"). metaClient.getArchivePath() should return basePath + "/.hoodie" for all old tables.
@hmatu what concerns do you have?
@n3nash, they are different:
one: /table/.hoodie/archived/.commits_.archive*
another: /table/.hoodie/.commits_.archive*
The right way is to add archiveFolderPattern to the show archived commits command, as show archived commit stats does.
@hmatu @n3nash It returns /table/.hoodie/.commits_.archive* if an old table uses the DEFAULT '' and returns /table/.hoodie/archived/.commits_.archive* if an old table uses the archived folder for archiving. So it will return the correct path, based on the archive path stored in .hoodie.
On the other hand, archiveFolderPattern allows users to provide a full path to the files under the archive folder and read them.
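To make the two behaviours discussed here concrete, the following is a rough sketch of how a command could pick the glob to read. It assumes a user-supplied pattern, when present, is used verbatim as a full path glob, and that the path resolved from the table metadata is used otherwise. The class ArchiveGlobSketch, the method archiveGlob, and the userPattern parameter are hypothetical stand-ins, not the actual CLI implementation.

```java
// Hypothetical sketch; not the actual Hudi CLI code.
class ArchiveGlobSketch {
    // File glob for archived commit files, as quoted in this thread.
    static final String ARCHIVE_FILE_GLOB = ".commits_.archive*";

    // resolvedArchivePath: the path derived from table metadata.
    // userPattern: stand-in for the archiveFolderPattern option (may be null or empty).
    static String archiveGlob(String resolvedArchivePath, String userPattern) {
        if (userPattern != null && !userPattern.isEmpty()) {
            // Assumption: an explicit pattern is taken as-is.
            return userPattern;
        }
        return resolvedArchivePath + "/" + ARCHIVE_FILE_GLOB;
    }

    public static void main(String[] args) {
        System.out.println(archiveGlob("/table/.hoodie", ""));
        System.out.println(archiveGlob("/table/.hoodie/archived", null));
        System.out.println(archiveGlob("/table/.hoodie", "/backup/archive/.commits_.archive*"));
    }
}
```

With this shape, the metadata-derived path stays the default, and the pattern option only matters when a user wants to read archive files from somewhere else.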
Thanks for the explanation, but I still think the right way is to add archiveFolderPattern to the show archived commits command, as show archived commit stats does.
@hmatu I think this is just a code cleanup that removes the hard-coded "archived" and supplies it through the archive folder name. This is fine IMO, since it does not break backwards compatibility. @hddong, can you confirm?
What you are referring to is also correct, but that should be another PR that allows show archived commits to read a path pattern, just like the other commands do.
@n3nash, this PR aims to "Adjust the read and write path of archive". IMO, it is unnecessary to create a new PR.
@hddong @n3nash @vinothchandar, compare the changes:
These two archive paths are different:
So the better way is to add
@hddong please take a look at the last comment and squash all your commits.
@n3nash @vinothchandar please review this again.
@n3nash is shepherding this.
@hddong I'm accepting this and will merge after you create another ticket so that we can add a release note: if someone was overriding HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, that override was not being honored before and now will be, so this change can break those cases (honoring it is the right thing to do).
316dc5c to ad68190
@n3nash squashed all commits into one. I have done something for
This is what I had commented: this change will break things if someone was overriding this prop name and expecting that path to be used (it was never actually used). Can you open a ticket for us to add this to the release notes? We can merge after that.
@n3nash: Sorry, I understand the reason, but I don't quite understand what opening a ticket means or how to open one for you. Can you give me some detailed instructions?
@hddong Please create a JIRA ticket here -> https://issues.apache.org/jira/projects/HUDI/issues and add the tag of documentation/release notes update.
@n3nash: Rebased this PR and created a new JIRA: https://issues.apache.org/jira/browse/HUDI-1085.
@hddong Sorry, this fell through. Please rebase to resolve conflicts and I will merge this ASAP.
@n3nash: Rebased this; please review when free.
47ca9e0 to 1808c57
@n3nash: Rebased this again; please review when free.
@hddong Extremely sorry, this fell through the cracks. Please rebase and I will merge this right after.
@n3nash if you wish you can also rebase this yourself and push. See https://cwiki.apache.org/confluence/display/HUDI/Resources#Resources-PushingChangesToPRs for a how-to
@n3nash: Rebased this.
* Archived commits command code cleanup
What is the purpose of the pull request
Currently, the archive path has two different default values: "archived" and "". This causes a bug.
Brief change log
Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.