HDFS-16531. Avoid setReplication writing an edit record if old replication equals the new value#4148
Conversation
…ation equals the new value
|
💔 -1 overall
This message was automatically generated. |
Hexiaoqiao
left a comment
There was a problem hiding this comment.
Great catch here. LGTM, +1 from my side.
The failed unit tests seem not related to this changes.
| logAuditEvent(true, operationName, src); | ||
| } | ||
| return success; | ||
| logAuditEvent(status != FSDirAttrOp.SetRepStatus.INVALID, |
There was a problem hiding this comment.
Prior to this change, if success, it logs a true.
Now if status == FSDirAttrOp.SetRepStatus.SUCCESS, it logs a false?
There was a problem hiding this comment.
IMO, this does not changes the prior log except additional false log when setReplication failed.
a. if status == FSDirAttrOp.SetRepStatus.SUCCESS, it logs a true since status != FSDirAttrOp.SetRepStatus.INVALID;
b. if status == FSDirAttrOp.SetRepStatus.UNCHANGED, it also logs a true which is same as before.
c. if status == FSDirAttrOp.SetRepStatus.INVALID, it logs a false which is additional compare to before logic.
There was a problem hiding this comment.
Yea, it should only log false if INVALID != SetRepStatus.INVALID -> this would return false. SUCCESS or UNCHANGED would log true. The change here, is that failures (INVALID) were previously not logged in the audits, which was a bug IMO. Other operations seem to audit failures in general.
There was a problem hiding this comment.
IMO, in most of the operations whenever there is an ACE we log audit as false and then we throw the exception, for the other exceptions like IO the exception was thrown directly without the operation getting logged
though i agree with @sodonnel and @Hexiaoqiao here, we can log the audit as false when the status == FSDirAttrOp.SetRepStatus.INVALID
|
Committed to trunk. Thanks @sodonnel for your contributions and thanks @jojochuang, @hemanthboyina for your coments. |
…ation equals the new value (apache#4148). Contributed by Stephen O'Donnell.
…d replication equals the new value (apache#4148). Contributed by Stephen O'Donnell." This reverts commit dbeeee0.
Description of PR
I recently came across a NN log where about 800k setRep calls were made, setting the replication from 3 to 3 - ie leaving it unchanged. Obviously the application should be fixed, but we could have an optimisation for this.
When the replication is unchanged in a case like this, we log an edit record, an audit log, and perform some quota checks etc. I believe we should still log an audit in these sort of cases, but we can skip all the checks and avoid writing an edit.
How was this patch tested?
Added a new test and validated the code paths were correct around writing and syncing edits or not with some log messages I then removed.