-
Notifications
You must be signed in to change notification settings - Fork 9.2k
HADOOP-18096. Distcp: Sync moves filtered file to home directory rather than deleting. #3940
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…er than deleting.
|
💔 -1 overall
This message was automatically generated. |
|
looks like yarn won't build. not this patch's fault |
| new DistCp(conf, diffBuilder.build()).execute(); | ||
|
|
||
| // Check the only qualified directory dir2 is there in target | ||
| assertTrue(dfs.exists(new Path(target, "dir2"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the ContractTestUtils asserts here as they report on failures, including with dir listing info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @steveloughran for the review, Have changed it to use ContractTestUtils
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
I'm not very familiar with the HDFS snapshot feature so can't help much with the reviewing here. cc @bshashikant @steveloughran : could you review this? |
saintstack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't know much about snapshots...)
I see this code in DistCpSync
private void moveToTarget(DiffInfo[] diffs,
DistributedFileSystem targetFs) throws IOException {
// sort the diffs based on their target paths to make sure the parent
// directories are created first.
Arrays.sort(diffs, DiffInfo.targetComparator);
for (DiffInfo diff : diffs) {
if (diff.getTarget() != null) {
targetFs.mkdirs(diff.getTarget().getParent());
targetFs.rename(diff.getTmp(), diff.getTarget());
}
}
}
When the DiffInfo 'target' was non-null, its parent dir was the home directory?
Thanks.
steveloughran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
going to assume the code does what you intend, as the tests imply all is good.
added a suggestion about test tuning, but its minor and you can skip if.
| dfs.createSnapshot(target, "s1"); | ||
|
|
||
| // Now do a rename to a filtered name on source. | ||
| dfs.rename(new Path(sourcePath, "dir1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check result for non zero or that the dest file exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, Will push if tests are green
|
@saintstack Lines 85 to 90 in efdec92
For rename entries it is made absolute here: hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java Lines 471 to 474 in efdec92
For normal delete there won't be any target, it would be always hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java Lines 465 to 468 in efdec92
In this particular case when using filters. The actual entry is a hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java Line 254 in efdec92
And when converting it to a hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java Lines 262 to 265 in efdec92
But since it is a delete entry the path isn't made absolute wrt target. So it stays like a relative path. like Then the code that you shared does the magic, it moves it... One example of target being set to hadoop/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java Lines 446 to 447 in efdec92
May be there could be sanity check in delete diff for target, but not very confident about that part, will explore sometime if there is any use case possible where it can be not-null & compat stuff. Further general optimisations as well are possible, like don't rename to tmp and then delete, directly delete(There is a reason why it is like that), that is something in my TODO list, will chase in future General Info: Filters are like quite used in DR setups, some time we don't want to copy some data to replica clusters. One example could be Trash data, many other use cases as well. Lemme know if it isn't still convincing.. |
saintstack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the very helpful write-up (You might want to copy/paste it up onto the JIRA).
There looks to be good test coverage around these parts so I'd think if this change damaging to general flow, it would have surfaced as test failures.
+1
|
Thanks @steveloughran and @saintstack for the review! @ayushtkn could you merge this and backport to branch-3.3.2? I'll kick off 3.3.2 RC4 right after. |
|
🎊 +1 overall
This message was automatically generated. |
…er than deleting. (#3940). Contributed by Ayush Saxena. Reviewed-by: Steve Loughran <[email protected]> Reviewed-by: stack <[email protected]>
…er than deleting. (#3940). Contributed by Ayush Saxena. Reviewed-by: Steve Loughran <[email protected]> Reviewed-by: stack <[email protected]>
|
Thanx @saintstack @steveloughran and @sunchao for the help here. I have posted the comment on the jira as well |
…er than deleting. (apache#3940). Contributed by Ayush Saxena. Reviewed-by: Steve Loughran <[email protected]> Reviewed-by: stack <[email protected]>
Description of PR
Fixes creation of Delete Diff entry, the target in the delete diff should be null.
How was this patch tested?
Added a UT
For code changes: