
HDFS-16531. Avoid setReplication writing an edit record if old replication equals the new value #4148

Merged
Hexiaoqiao merged 1 commit into apache:trunk from sodonnel:HDFS-16531
Apr 17, 2022

Conversation

sodonnel (Contributor) commented Apr 7, 2022

Description of PR

I recently came across a NameNode (NN) log where about 800k setRep calls were made, setting the replication from 3 to 3, i.e. leaving it unchanged. Obviously the application should be fixed, but we could add an optimisation for this case.

When the replication is unchanged in a case like this, we currently log an edit record, write an audit log entry, and perform some quota checks etc. I believe we should still write the audit entry in these sorts of cases, but we can skip all the checks and avoid writing an edit.
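
The idea above could be sketched roughly as follows. This is a minimal illustration, not the patch itself: `SetRepSketch`, its in-memory `files` map and the two list-backed logs are hypothetical stand-ins, and treating a missing path as INVALID is an assumption made for the sketch. Only the three status names (UNCHANGED, INVALID, SUCCESS) come from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NameNode. None of these names are Hadoop APIs,
// apart from the three SetRepStatus values, which are quoted from this PR.
class SetRepSketch {
  enum SetRepStatus { UNCHANGED, INVALID, SUCCESS }

  final Map<String, Short> files = new HashMap<>();  // path -> current replication
  final List<String> editLog = new ArrayList<>();    // stands in for the edit log
  final List<String> auditLog = new ArrayList<>();   // stands in for the audit log

  SetRepStatus setReplication(String src, short replication) {
    SetRepStatus status;
    Short old = files.get(src);
    if (old == null) {
      // Assumption for this sketch: a path that is not a known file is INVALID.
      status = SetRepStatus.INVALID;
    } else if (old == replication) {
      // Old value equals new value: skip the checks and write no edit.
      status = SetRepStatus.UNCHANGED;
    } else {
      // Quota checks would happen here before the change is applied.
      files.put(src, replication);
      editLog.add("SET_REPLICATION " + src + " " + replication);
      status = SetRepStatus.SUCCESS;
    }
    // The call is audited in every case, including UNCHANGED.
    auditLog.add("allowed=" + (status != SetRepStatus.INVALID)
        + " cmd=setReplication src=" + src);
    return status;
  }

  public static void main(String[] args) {
    SetRepSketch nn = new SetRepSketch();
    nn.files.put("/data/part-0", (short) 3);
    System.out.println(nn.setReplication("/data/part-0", (short) 3));
    System.out.println(nn.setReplication("/data/part-0", (short) 2));
    System.out.println("edits=" + nn.editLog.size() + " audits=" + nn.auditLog.size());
  }
}
```

In this sketch, 800k calls that set 3 to 3 would add 800k audit entries but no edit records.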

How was this patch tested?

Added a new test and, using some temporary log messages that I have since removed, validated that the code paths write and sync edits only when expected.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 12m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 20s trunk passed
+1 💚 compile 1m 30s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 1m 20s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 8s trunk passed
+1 💚 mvnsite 1m 37s trunk passed
+1 💚 javadoc 1m 10s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 33s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 34s trunk passed
+1 💚 shadedclient 23m 46s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 18s the patch passed
+1 💚 compile 1m 20s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 1m 20s the patch passed
+1 💚 compile 1m 11s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 1m 11s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 51s the patch passed
+1 💚 mvnsite 1m 17s the patch passed
+1 💚 javadoc 0m 52s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 3m 19s the patch passed
+1 💚 shadedclient 22m 53s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 229m 9s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
348m 57s
Reason Tests
Failed junit tests hadoop.hdfs.TestRollingUpgrade
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4148/1/artifact/out/Dockerfile
GITHUB PR #4148
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 0a1a606e95d4 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 31e685f
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4148/1/testReport/
Max. process+thread count 3118 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4148/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Hexiaoqiao (Contributor) left a comment

Great catch here. LGTM, +1 from my side.
The failed unit tests do not seem to be related to this change.

-      logAuditEvent(true, operationName, src);
-    }
-    return success;
+    logAuditEvent(status != FSDirAttrOp.SetRepStatus.INVALID,

Prior to this change, it logged true on success.
Now, if status == FSDirAttrOp.SetRepStatus.SUCCESS, does it log false?


IMO, this does not change the previous audit logging, except that it adds a false entry when setReplication fails.
a. If status == FSDirAttrOp.SetRepStatus.SUCCESS, it logs true, since status != FSDirAttrOp.SetRepStatus.INVALID.
b. If status == FSDirAttrOp.SetRepStatus.UNCHANGED, it also logs true, the same as before.
c. If status == FSDirAttrOp.SetRepStatus.INVALID, it logs false, which is new compared with the previous logic.
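
The three cases can be put side by side in a small sketch. This is a hedged illustration only: `AuditFlagSketch`, `auditBefore` and `auditAfter` are hypothetical names, and the "before" behaviour is modelled from the description in this thread (an audit entry with true on success, no entry otherwise) rather than copied from FSNamesystem.

```java
import java.util.Optional;

// Hypothetical helper class. Only the SetRepStatus value names come from the PR.
class AuditFlagSketch {
  enum SetRepStatus { UNCHANGED, INVALID, SUCCESS }

  // Before the change, as described in this thread: an audit entry was written
  // only when the call succeeded, and its flag was always true.
  // Optional.empty() models "no audit entry written".
  static Optional<Boolean> auditBefore(SetRepStatus status) {
    boolean success = status != SetRepStatus.INVALID;
    return success ? Optional.of(true) : Optional.empty();
  }

  // After the change: an audit entry is always written, and the flag is
  // false only for INVALID.
  static boolean auditAfter(SetRepStatus status) {
    return status != SetRepStatus.INVALID;
  }

  public static void main(String[] args) {
    for (SetRepStatus s : SetRepStatus.values()) {
      System.out.println(s + " before=" + auditBefore(s) + " after=" + auditAfter(s));
    }
  }
}
```

SUCCESS and UNCHANGED produce a true entry under both versions; INVALID is the only case that differs.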

sodonnel (Contributor, Author)

Yes, it should only log false for INVALID, since INVALID != SetRepStatus.INVALID evaluates to false. SUCCESS or UNCHANGED would log true. The change here is that failures (INVALID) were previously not logged in the audit log, which was a bug IMO. Other operations seem to audit failures in general.


IMO, in most operations, whenever there is an AccessControlException (ACE) we log the audit as false and then throw the exception; for other exceptions, such as IOException, the exception is thrown directly without the operation being audited.

That said, I agree with @sodonnel and @Hexiaoqiao here: we can log the audit as false when status == FSDirAttrOp.SetRepStatus.INVALID.
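
The general pattern described in this comment might look something like the sketch below. It is an assumption-laden illustration, not Hadoop code: `AuditOnDeniedSketch`, `checkedOperation` and `SketchAccessControlException` are hypothetical, with the exception class standing in for Hadoop's AccessControlException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "audit false on ACE, then rethrow".
class AuditOnDeniedSketch {
  // Stand-in for Hadoop's AccessControlException; not the real class.
  static class SketchAccessControlException extends IOException {
    SketchAccessControlException(String msg) { super(msg); }
  }

  final List<String> auditLog = new ArrayList<>();  // stands in for the audit log

  void checkedOperation(String op, String src, boolean permitted, boolean ioFails)
      throws IOException {
    try {
      if (!permitted) {
        throw new SketchAccessControlException("Permission denied: " + src);
      }
      if (ioFails) {
        throw new IOException("I/O failure on " + src);
      }
    } catch (SketchAccessControlException e) {
      // Access denied: record the failed attempt, then rethrow.
      auditLog.add("allowed=false cmd=" + op + " src=" + src);
      throw e;
    }
    // Any other IOException has already propagated without an audit entry.
    auditLog.add("allowed=true cmd=" + op + " src=" + src);
  }

  public static void main(String[] args) {
    AuditOnDeniedSketch nn = new AuditOnDeniedSketch();
    try {
      nn.checkedOperation("setReplication", "/data/part-0", false, false);
    } catch (IOException e) {
      System.out.println("denied: " + e.getMessage());
    }
    System.out.println(nn.auditLog);
  }
}
```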

Hexiaoqiao merged commit dbeeee0 into apache:trunk Apr 17, 2022

Hexiaoqiao (Contributor) commented

Committed to trunk. Thanks @sodonnel for your contribution, and thanks @jojochuang and @hemanthboyina for your comments.
@sodonnel Please feel free to cherry-pick to other active branches, or let me know if we need a backport. Thanks again.

asfgit pushed a commit that referenced this pull request Apr 19, 2022
…ation equals the new value (#4148). Contributed by Stephen O'Donnell.

(cherry picked from commit dbeeee0)
asfgit pushed a commit that referenced this pull request Apr 19, 2022
…ation equals the new value (#4148). Contributed by Stephen O'Donnell.

(cherry picked from commit dbeeee0)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
asfgit pushed a commit that referenced this pull request Apr 20, 2022
…d replication equals the new value (#4148). Contributed by Stephen O'Donnell."

This reverts commit dbeeee0.
asfgit pushed a commit that referenced this pull request Apr 20, 2022
…d replication equals the new value (#4148). Contributed by Stephen O'Donnell."

This reverts commit 8ae033d.
asfgit pushed a commit that referenced this pull request Apr 20, 2022
…d replication equals the new value (#4148). Contributed by Stephen O'Donnell."

This reverts commit 045d485.
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…ation equals the new value (apache#4148). Contributed by Stephen O'Donnell.
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…d replication equals the new value (apache#4148). Contributed by Stephen O'Donnell."

This reverts commit dbeeee0.