@ayushtkn
Member

@ayushtkn ayushtkn commented Jan 27, 2022

Description of PR

Fixes creation of the Delete Diff entry: the target in a delete diff should be null.

How was this patch tested?

Added a UT

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
-1 ❌ mvninstall 40m 14s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 24s trunk passed
+1 💚 mvnsite 0m 36s trunk passed
+1 💚 javadoc 0m 28s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 26s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 56s trunk passed
+1 💚 shadedclient 26m 38s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 29s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 16s the patch passed
+1 💚 mvnsite 0m 27s the patch passed
+1 💚 javadoc 0m 22s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 20s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 2s the patch passed
+1 💚 shadedclient 27m 3s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 25m 47s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 33s The patch does not generate ASF License warnings.
130m 20s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/1/artifact/out/Dockerfile
GITHUB PR #3940
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 5655abca203d 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f759625
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/1/testReport/
Max. process+thread count 719 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

looks like yarn won't build. not this patch's fault

[INFO] 
[INFO] --- frontend-maven-plugin:1.11.2:yarn (yarn install) @ hadoop-yarn-applications-catalog-webapp ---
[INFO] testFailureIgnore property is ignored in non test phases
[INFO] Running 'yarn ' in /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-3940/ubuntu-focal/src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target
[INFO] yarn install v1.7.0
[INFO] info No lockfile found.
[INFO] [1/4] Resolving packages...
[INFO] [2/4] Fetching packages...
[INFO] error [email protected]: The engine "node" is incompatible with this module. Expected version ">=10".
[INFO] error Found incompatible module

new DistCp(conf, diffBuilder.build()).execute();

// Check the only qualified directory dir2 is there in target
assertTrue(dfs.exists(new Path(target, "dir2")));
Contributor

use the ContractTestUtils asserts here as they report on failures, including with dir listing info
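
To illustrate why such asserts help, here is a minimal, self-contained sketch of an assertion that includes the parent directory listing in its failure message. It uses `java.nio.file` on the local file system rather than Hadoop, and `ListingAsserts` is a hypothetical helper written for this example; in the actual test the equivalent call would presumably be something like `ContractTestUtils.assertPathExists(fs, message, path)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, loosely modelled on the idea behind ContractTestUtils. */
final class ListingAsserts {
  private ListingAsserts() {}

  /** Fails with the parent directory's listing when {@code path} is missing. */
  static void assertPathExists(String message, Path path) throws IOException {
    if (Files.exists(path)) {
      return;
    }
    Path parent = path.toAbsolutePath().getParent();
    String listing = "<parent missing>";
    if (parent != null && Files.isDirectory(parent)) {
      try (Stream<Path> children = Files.list(parent)) {
        listing = children.map(p -> p.getFileName().toString())
            .sorted()
            .collect(Collectors.joining(", ", "[", "]"));
      }
    }
    throw new AssertionError(
        message + ": " + path + " not found; parent contains " + listing);
  }

  public static void main(String[] args) throws IOException {
    Path target = Files.createTempDirectory("target");
    Files.createDirectory(target.resolve("dir2"));

    // Passes silently: dir2 exists.
    assertPathExists("qualified directory should be copied", target.resolve("dir2"));

    // Fails, and the message shows what the parent actually contains.
    try {
      assertPathExists("unexpected lookup", target.resolve("filterDir1"));
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Compared with a bare `assertTrue(dfs.exists(...))`, a failure here tells the reader what was in the directory, which is usually the first thing one checks by hand.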

Member Author

Thanks @steveloughran for the review. I have changed it to use ContractTestUtils.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 18s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 13s trunk passed
+1 💚 compile 0m 34s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 32s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 27s trunk passed
+1 💚 mvnsite 0m 36s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 50s trunk passed
+1 💚 shadedclient 20m 33s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 27s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 25s the patch passed
+1 💚 compile 0m 22s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 22s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 24s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 17s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 51s the patch passed
+1 💚 shadedclient 20m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 38m 7s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
120m 35s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/2/artifact/out/Dockerfile
GITHUB PR #3940
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux fa2df03d1862 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 834ea60
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/2/testReport/
Max. process+thread count 549 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 23s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 22s trunk passed
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 30s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 26s trunk passed
+1 💚 mvnsite 0m 36s trunk passed
+1 💚 javadoc 0m 31s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 28s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 57s trunk passed
+1 💚 shadedclient 27m 2s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 31s the patch passed
+1 💚 compile 0m 30s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 30s the patch passed
+1 💚 compile 0m 25s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 28s the patch passed
+1 💚 javadoc 0m 22s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 21s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 1m 5s the patch passed
+1 💚 shadedclient 26m 9s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 42m 5s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
146m 14s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/3/artifact/out/Dockerfile
GITHUB PR #3940
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 246f98f9a681 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 834ea60
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/3/testReport/
Max. process+thread count 722 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@sunchao
Member

sunchao commented Feb 7, 2022

I'm not very familiar with the HDFS snapshot feature so can't help much with the reviewing here. cc @bshashikant @steveloughran : could you review this?

Contributor

@saintstack saintstack left a comment

(I don't know much about snapshots...)

I see this code in DistCpSync

private void moveToTarget(DiffInfo[] diffs,
    DistributedFileSystem targetFs) throws IOException {
  // sort the diffs based on their target paths to make sure the parent
  // directories are created first.
  Arrays.sort(diffs, DiffInfo.targetComparator);
  for (DiffInfo diff : diffs) {
    if (diff.getTarget() != null) {
      targetFs.mkdirs(diff.getTarget().getParent());
      targetFs.rename(diff.getTmp(), diff.getTarget());
    }
  }
}

When the DiffInfo 'target' was non-null, its parent dir was the home directory?

Thanks.

Contributor

@steveloughran steveloughran left a comment

+1
Going to assume the code does what you intend, as the tests imply all is good.

Added a suggestion about test tuning, but it's minor and you can skip it.

dfs.createSnapshot(target, "s1");

// Now do a rename to a filtered name on source.
dfs.rename(new Path(sourcePath, "dir1"),
Contributor

Check that the rename result indicates success, or that the dest file exists.

Member Author

Done. Will push if the tests are green.

@ayushtkn
Member Author

ayushtkn commented Feb 10, 2022

@saintstack
The path is actually relative:

/**
 * The relative path (related to the snapshot root) of 1) the file/directory
 * where changes have happened, or 2) the source file/dir of a rename op.
 */
private final byte[] sourcePath;
private final byte[] targetPath;

For rename entries it is made absolute here:

for (DiffInfo diff : diffMap.get(SnapshotDiffReport.DiffType.RENAME)) {
  Path source = new Path(targetDir, diff.getSource());
  Path target = new Path(targetDir, diff.getTarget());
  renameAndDeleteDiff.add(new DiffInfo(source, target, diff.getType()));
}

For a normal delete there is no target; it is always null, so in the normal case it is passed through as it is.

for (DiffInfo diff : diffMap.get(SnapshotDiffReport.DiffType.DELETE)) {
  Path source = new Path(targetDir, diff.getSource());
  renameAndDeleteDiff.add(new DiffInfo(source, diff.getTarget(),
      diff.getType()));
}

In this particular case, when filters are in use, the actual entry is a RENAME entry, which has a target (a rename has to have one). So it takes this else block:

} else if (dt == SnapshotDiffReport.DiffType.RENAME) {

And when converting it to a DELETE entry, it carries the target along as well:

list = diffMap.get(SnapshotDiffReport.DiffType.DELETE);
DiffInfo info = new DiffInfo(source, target,
    SnapshotDiffReport.DiffType.DELETE);
list.add(info);

But since it is now a DELETE entry, the target path isn't made absolute with respect to the target directory, so it stays a relative path, like filterDir1. Since it doesn't start with /, the normal resolution logic resolves it against the home directory by default.

Then the code that you shared does the magic: it moves it there.
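
The resolution behaviour described above can be sketched in a few lines. This is only a model: plain strings stand in for Hadoop's `Path`, and `qualify`, the home directory, and the target directory are all hypothetical names chosen for the example.

```java
/** Hypothetical model of path qualification; plain strings stand in for Hadoop's Path. */
final class RelativeTargetDemo {
  private RelativeTargetDemo() {}

  /** Absolute paths are kept; relative ones are resolved against the working directory. */
  static String qualify(String workingDir, String path) {
    return path.startsWith("/") ? path : workingDir + "/" + path;
  }

  public static void main(String[] args) {
    String homeDir = "/user/hypothetical"; // assumed working directory of the client
    String targetDir = "/target";          // assumed sync target directory

    // A DELETE entry that kept the relative rename target ends up under the home directory.
    System.out.println(qualify(homeDir, "filterDir1"));
    // prints /user/hypothetical/filterDir1

    // A RENAME entry is first made absolute against the target directory.
    System.out.println(qualify(homeDir, targetDir + "/" + "filterDir1"));
    // prints /target/filterDir1
  }
}
```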

One example of the target being set to null:

renameAndDeleteDiff.add(new DiffInfo(source, null,
    SnapshotDiffReport.DiffType.DELETE));
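
As a rough, self-contained model of the effect, the sketch below mimics the shape of `moveToTarget`: entries are sorted by target and only those with a non-null target are renamed. The `DiffInfo` class here is a simplified stand-in with string paths, and the `/target/.tmp/dir1` path is made up for the example; it is not the real DistCp class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified model; the real DiffInfo and DistCpSync use Hadoop Path objects. */
final class DeleteDiffDemo {
  private DeleteDiffDemo() {}

  enum DiffType { DELETE, RENAME }

  static final class DiffInfo {
    final String tmp;     // temporary location the entry was moved to
    final String target;  // final location, or null when nothing should be moved back
    final DiffType type;

    DiffInfo(String tmp, String target, DiffType type) {
      this.tmp = tmp;
      this.target = target;
      this.type = type;
    }
  }

  /** Same shape as moveToTarget: sort by target, then rename only entries that have one. */
  static List<String> moveToTarget(DiffInfo[] diffs) {
    Arrays.sort(diffs, Comparator.comparing((DiffInfo d) -> d.target,
        Comparator.nullsFirst(Comparator.<String>naturalOrder())));
    List<String> renames = new ArrayList<>();
    for (DiffInfo diff : diffs) {
      if (diff.target != null) {
        renames.add(diff.tmp + " -> " + diff.target);
      }
    }
    return renames;
  }

  public static void main(String[] args) {
    // Converted DELETE entry that still carries the relative rename target.
    DiffInfo[] withTarget = {
        new DiffInfo("/target/.tmp/dir1", "filterDir1", DiffType.DELETE)};
    // Converted DELETE entry with a null target.
    DiffInfo[] withoutTarget = {
        new DiffInfo("/target/.tmp/dir1", null, DiffType.DELETE)};

    System.out.println(moveToTarget(withTarget));    // one rename is recorded
    System.out.println(moveToTarget(withoutTarget)); // no rename is recorded
  }
}
```

With a null target the entry never reaches the rename step, so nothing is moved to a relative (home-directory) location.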

Maybe there could be a sanity check on the target in the delete diff, but I am not very confident about that part. I will explore at some point whether there is any possible use case where it can be non-null, along with the compat implications.

Further general optimisations are possible as well, such as deleting directly instead of renaming to tmp and then deleting (there is a reason why it is like that). That is on my TODO list and I will chase it in future.

General info: filters are used quite a lot in DR setups, where sometimes we don't want to copy some data to replica clusters. One example could be Trash data; there are many other use cases as well.
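
For readers unfamiliar with filters, here is a minimal sketch of the idea: a list of regular expressions, where any path matching one of them is excluded from the copy. `CopyFilterSketch`, the pattern, and the sample paths are hypothetical and only illustrate the concept; this is not DistCp's own filter implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical regex-based copy filter; not DistCp's own filter class. */
final class CopyFilterSketch {
  private final List<Pattern> excludes = new ArrayList<>();

  CopyFilterSketch(List<String> regexes) {
    for (String regex : regexes) {
      excludes.add(Pattern.compile(regex));
    }
  }

  /** A path is copied unless it fully matches one of the exclude patterns. */
  boolean shouldCopy(String path) {
    for (Pattern exclude : excludes) {
      if (exclude.matcher(path).matches()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Assumed pattern: skip anything under a .Trash directory.
    CopyFilterSketch filter = new CopyFilterSketch(Arrays.asList(".*/\\.Trash/.*"));

    System.out.println(filter.shouldCopy("/user/hypothetical/.Trash/Current/file1")); // false
    System.out.println(filter.shouldCopy("/data/dir2/file1"));                        // true
  }
}
```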

Let me know if it still isn't convincing.

Contributor

@saintstack saintstack left a comment

Thank you for the very helpful write-up (You might want to copy/paste it up onto the JIRA).

There looks to be good test coverage around these parts, so I'd think that if this change were damaging to the general flow, it would have surfaced as test failures.

+1

@sunchao
Member

sunchao commented Feb 10, 2022

Thanks @steveloughran and @saintstack for the review! @ayushtkn could you merge this and backport to branch-3.3.2? I'll kick off 3.3.2 RC4 right after.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 16m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 88m 54s trunk passed
+1 💚 compile 0m 34s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 31s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 27s trunk passed
+1 💚 mvnsite 0m 58s trunk passed
+1 💚 javadoc 0m 30s trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 29s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 50s trunk passed
+1 💚 shadedclient 20m 10s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 27s the patch passed
+1 💚 compile 0m 26s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 26s the patch passed
+1 💚 compile 0m 23s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 17s the patch passed
+1 💚 mvnsite 0m 26s the patch passed
+1 💚 javadoc 0m 21s the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 0m 50s the patch passed
+1 💚 shadedclient 19m 54s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 39m 36s hadoop-distcp in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
195m 16s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/4/artifact/out/Dockerfile
GITHUB PR #3940
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 27a70ca30267 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 25cf63c
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/4/testReport/
Max. process+thread count 674 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3940/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@ayushtkn ayushtkn merged commit fe583c4 into apache:trunk Feb 10, 2022
asfgit pushed a commit that referenced this pull request Feb 10, 2022
…er than deleting. (#3940). Contributed by Ayush Saxena.

Reviewed-by: Steve Loughran <[email protected]>
Reviewed-by: stack <[email protected]>
asfgit pushed a commit that referenced this pull request Feb 10, 2022
…er than deleting. (#3940). Contributed by Ayush Saxena.

Reviewed-by: Steve Loughran <[email protected]>
Reviewed-by: stack <[email protected]>
@ayushtkn
Member Author

Thanks @saintstack, @steveloughran and @sunchao for the help here.

I have posted the comment on the JIRA as well.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…er than deleting. (apache#3940). Contributed by Ayush Saxena.

Reviewed-by: Steve Loughran <[email protected]>
Reviewed-by: stack <[email protected]>