-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-15891. provide Regex Based Mount Point In Inode Tree #2185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
This seems an interesting feature. Do we need to update the |
|
@liuml07 Thanks, this is a feature adopted inside our company for almost two years. The code is almost as same as our internal branch except I removed some refactored code to make it easier to review. Seemed like the rebase caused some UT failures. Let me fix the UTs and update the user guide. |
db45b76 to
65c151a
Compare
|
@templedf It will be great if you could help with the review, thanks. |
65c151a to
1fe5665
Compare
|
💔 -1 overall
This message was automatically generated. |
1fe5665 to
3ae248e
Compare
|
💔 -1 overall
This message was automatically generated. |
3ae248e to
44f3909
Compare
|
💔 -1 overall
This message was automatically generated. |
|
Here're the UT failed. I didn't see them related. But I could double-check. Thanks [ERROR] Failures: |
44f3909 to
d3e5b83
Compare
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks complex. But the major change here is simple.
Before:
if (key.startsWith(mountTablePrefix)) {
// hundred line of code
}
After:
if (!key.startsWith(mountTablePrefix)) {
continue
}
// hundred line of code
|
The UTs failure are not related to the change. Here're some references: |
d3e5b83 to
c58b945
Compare
|
Just did a rebase and updated the PR. |
|
💔 -1 overall
This message was automatically generated. |
|
I will review it in a day or two, Thanks |
|
Thank you! let me check ViewFs.java. Function wise, this patch worked for MR jobs and HDFS use cases in our internal clusters.
|
umamaheswararao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JohnZZGithub , Thanks a lot for the work.
I have quickly gone throw the patch and posted few comments. I will continue review if I can find more comments. It's a nice feature! Great work!
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ConfigUtil.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
Outdated
Show resolved
Hide resolved
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java
Outdated
Show resolved
Hide resolved
...-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFileSystemBaseTest.java
Outdated
Show resolved
Hide resolved
...t/java/org/apache/hadoop/fs/viewfs/TestRegexMountPointResolvedDstPathReplaceInterceptor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not about this test, In general on inner cache disabling: There was a suggestion for avoiding explicitly user disabling it. If that not possible, you may want to have check while analyzing mount points itself that, if cache is enabled and using regex mounts, then you may want to fail fs initialization itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@umamaheswararao This is a good point. Let me add some preconditions check.
BTW, now the inner cache assumes every filesystem is created while InodeTree is constructed and never changed. Do you think it's reasonable to change it to a concurrent hash map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing to clarify, I guess this config is for per schema level cache. Regex mount point is OK with it. Mount table is not good inner cache inside ViewFileSystem.java.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this issue and filed a JIRA for it. https://issues.apache.org/jira/browse/HADOOP-17028
This issue should be solved once implement that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize that there is a jira. This is great.
...oject/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemLinkRegex.java
Outdated
Show resolved
Hide resolved
809b9a1 to
31a68cd
Compare
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
31a68cd to
f9c61ef
Compare
375a438 to
8786ee8
Compare
|
💔 -1 overall
This message was automatically generated. |
umamaheswararao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you @JohnZZGithub for the update. Patch looks awesome.
I have few minors to address before going.
Otherwise I am +1
| Path homeDir = null; | ||
| private boolean enableInnerCache = false; | ||
| private InnerCache cache; | ||
| private boolean evictCacheOnClose = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you see any issue if we make it true? If no issues, we can simply clean it on close right instead of having another config?
Seems like this is an improvement to existing code. If you want, I am ok to file small JIRA and fix this cleanup thing.( I am assuming it's not necessarily needed with this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@umamaheswararao Thanks for the kindness. I didn't see a problem with making it true. Just mean to be more cautious, let me remove it.
| Assert.assertTrue( | ||
| resolveResult.targetFileSystem | ||
| instanceof TestRegexMountPointFileSystem); | ||
| Assert.assertTrue(resolveResult.resolvedPath.equals("/user/hadoop_user1")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still this asserts can use assertEquals method? Please use them all places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, let me clean them up.
| // Asserts | ||
| URI viewFsUri = new URI( | ||
| FsConstants.VIEWFS_SCHEME, CLUSTER_NAME, "/", null, null); | ||
| vfs = FileSystem.get(viewFsUri, config); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small suggestion here:
Move above lines out of try block. Then use try(FileSystem vfs = = FileSystem.get(viewFsUri, config)){
}
This should close automatically after block. So, we can remove finally block below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the below comment is still valid?
| ### Understand the Difference | ||
|
|
||
| In the key-value based mount table, view file system treats every mount point as a partition. There's several file system APIs which will lead to operation on all partitions. E.g. there's an HDFS cluster with multiple mount. Users want to run “hadoop fs -put file viewfs://hdfs.namenode.apache.org/tmp/” cmd to copy data from local disk to our HDFS cluster. The cmd will trigger ViewFileSystem to call setVerifyChecksum() method which will initialize the file system for every mount point. | ||
| For a regex-base rule mount table entry, we couldn't know what's corresponding path until parsing. So the regex based mount table entry will be ignored on such cases. The file system (ChRootedFileSystem) will be created upon accessing. But the underlying file system will be cached by inner cache of ViewFileSystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a regex-base rule mount table entry, we couldn't know what's corresponding path until parsing.
Whatever we know should be added to mountPoints? So, that getMountPoints will return known fs-es? It may be a good idea to add Java doc on API level.
I am ok to have this in followup JIRA to cover all of this aspects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I guess this patch didn't add parsed fs to mount point yet. Maybe it's better when we modify the code and doc at the same time. Created https://issues.apache.org/jira/browse/HADOOP-17247 to track the issue. Does it make sense? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having separate JIRA. It make sense to me. !
|
💔 -1 overall
This message was automatically generated. |
6c9f1d1 to
4c543ea
Compare
|
💔 -1 overall
This message was automatically generated. |
|
Great work @JohnZZGithub. +1 pending jenkins clean report. |
|
💔 -1 overall
This message was automatically generated. |
|
@umamaheswararao Thanks a lot!
As for the failure UTs, I guess they are not related to the patch. |
| * @param mountTableName - the mountable name of the regex config item | ||
| * @param srcRegex - the src path regex expression that applies to this config | ||
| * @param targetStr - the string of target path | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method not used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I guess the next addLinkRegex is used but not this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
|
I think the following new lines added into this method. Thats the reason it is reporting longer method warning.. Probably we can extract this logics to separate methods. move the following code to a small check method. There are two places we are doing same checks. At end of the method we have some logics for handle no mount points code. IF we need we could extract that to handleNoMountPoints method or so. Check if this refactoring can make checkstyle happy and code can look cleaner. |
|
@umamaheswararao Sure. In fact, I did a major refactor in our internal repo to make it switch case-based. (https://github.com/apache/hadoop/pull/424/files#diff-69fd14ba63365b6a428bf7142c463990R511) However, I didn't put it here as it's harder to review. |
|
Thanks @JohnZZGithub I agree it's always better to have refactoring patch separated to make the reviews easier. |
|
@umamaheswararao Totally make sense, updated. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
+1 on the latest patch. Thanks again for your great work @JohnZZGithub Test failures are unrelated. |
umamaheswararao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Committed to trunk!
|
@umamaheswararao Thanks a lot! |
…Contributed by Zhenzhao Wang. Co-authored-by: Zhenzhao Wang <[email protected]> (cherry picked from commit 12a316c)
). Contributed by Zhenzhao Wang. Co-authored-by: Zhenzhao Wang <[email protected]> (cherry picked from commit 12a316c) Change-Id: Ibabb9d80bfcc634b529866b31f405573eb7f03f2 (cherry picked from commit 4102865)
https://issues.apache.org/jira/browse/HADOOP-15891