Skip to content

Conversation

@kuenishi
Copy link
Contributor

@kuenishi kuenishi commented Jan 19, 2022

What changes were proposed in this pull request?

Frequent errors such as NoSuchBucket or NoSuchKey must not be written as errors, while they are. They should be in audit logs in principle, because the system is not wrong, not knowing how to handle them. Otherwise, we may miss important errors like NullPointerException that does not appear in healthy system. To make sure that we can keep eyes on system health than access by applications, errors that stem from applications must not be printed in normal log files.

Update: Internal Errors should be written as ERROR to system log. So, the log in S3ErrorTable will be:

  • if it's internal error, it's unexpected, then write as ERROR.
  • otherwise, it's known error in S3 API specification, then write as DEBUG.

For auditing purpose, I files another issue as HDDS-6207.

What is the link to the Apache JIRA

HDDS-6206

How was this patch tested?

Unit tests by CI

@swagle swagle requested a review from errose28 January 24, 2022 18:54
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kuenishi for the patch. I think it's a good change to have. Can you please make the debug-level log call conditional?

…3/exception/S3ErrorTable.java

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
@kuenishi
Copy link
Contributor Author

kuenishi commented Feb 3, 2022

@adoroszlai Thanks for review. I committed your proposal.

@adoroszlai adoroszlai merged commit e47b6f0 into apache:master Feb 3, 2022
@adoroszlai
Copy link
Contributor

Thanks @kuenishi for updating the patch.

@kuenishi kuenishi deleted the debug-for-app-error branch February 3, 2022 07:43
errose28 added a commit to errose28/ozone that referenced this pull request Feb 4, 2022
* master: (27 commits)
  HDDS-6206. Application errors must not flood system log (apache#3001)
  HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (apache#2992)
  HDDS-5529. For any IOexception from @REPLicated method we should throw it (apache#2788)
  HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (apache#3005)
  HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (apache#3029)
  HDDS-6227. Test helpers should observe naming conditions (apache#3020)
  HDDS-6205. Add CLI command to display the latest Replication Manager report (apache#3013)
  HDDS-6192. feature/Observability.md translated to Chinese (apache#2994)
  HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (apache#3014)
  HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (apache#3007)
  HDDS-6177. Extend container info command to include replica details  (apache#2995)
  HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (apache#2987)
  HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (apache#3015)
  HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (apache#3011)
  HDDS-6203. CleanUp incomplete gz files during Container move (apache#3000)
  HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (apache#2983)
  HDDS-6109. Preserve the underlying exception raised in client lib. (apache#2989)
  HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (apache#2972)
  HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (apache#2998)
  HDDS-6163. Fix PATH in docker image (apache#2967)
  ...

Conflicts:
    hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
errose28 added a commit to errose28/ozone that referenced this pull request Feb 7, 2022
* master: (27 commits)
  HDDS-6206. Application errors must not flood system log (apache#3001)
  HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (apache#2992)
  HDDS-5529. For any IOexception from @REPLicated method we should throw it (apache#2788)
  HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (apache#3005)
  HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (apache#3029)
  HDDS-6227. Test helpers should observe naming conditions (apache#3020)
  HDDS-6205. Add CLI command to display the latest Replication Manager report (apache#3013)
  HDDS-6192. feature/Observability.md translated to Chinese (apache#2994)
  HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (apache#3014)
  HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (apache#3007)
  HDDS-6177. Extend container info command to include replica details  (apache#2995)
  HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (apache#2987)
  HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (apache#3015)
  HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (apache#3011)
  HDDS-6203. CleanUp incomplete gz files during Container move (apache#3000)
  HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (apache#2983)
  HDDS-6109. Preserve the underlying exception raised in client lib. (apache#2989)
  HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (apache#2972)
  HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (apache#2998)
  HDDS-6163. Fix PATH in docker image (apache#2967)
  ...
arp7 pushed a commit that referenced this pull request Feb 17, 2022
…ansport support (#3074)

* HDDS-6149. Remove unused keytabs (#2960)

* HDDS-6094. Some unit tests are skipped due to using JUnit4 runner (#2909)

* HDDS-6075. OzoneConfiguration constructor overrides input configuration keys. (#2921)

* HDDS-4177. SCM Container DB bootstrap on Recon startup (#2942)

* HDDS-6086. Compute MD5MD5CRC file checksum using chunk checksums from DataNodes (#2919)

* HDDS-6148. Validate ContainerBalancerConfiguration before start ContainerBalancer (#2957)

* HDDS-6161. SCM StateMachine failing to reinitialize doesn't terminate the process. (#2971)

* HDDS-6134. Move replication-specific config to ReplicationServer (#2943)

* HDDS-4010. S3G startup fails when multiple service ids are configured. (#2976)

* HDDS-6170. Add metrics to replication manager to track container health states (#2975)

* HDDS-3231. Cleanup KeyManagerImpl (#2961)

* HDDS-5927. Improve defaults in ContainerBalancerConfiguration (#2892)

* HDDS-6157. More consistent synchronization in InputStreams (#2965)

* HDDS-4743. [FSO] Add FSO variant of ITestOzoneContractDistcp. (#2980)

* HDDS-6114. Intermittent error due to Failed to init RocksDB (#2947)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. (#2981)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient.

* HDDS-5740. Enable ratis by default for SCM. (#2637)

* HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM. (#2991)

* HDDS-4190. Intermittent failure in TestOMVolumeSetOwnerRequest and TestOMVolumeSetQuotaRequest. (#2982)

* HDDS-6120. Compute block checksum using chunk checksums (#2930)

* HDDS-6147. Add ability in OM to get limited delta updates (#2956)

* HDDS-6195. Remove unused jmh-core dependency. (#2997)

* HDDS-6167. Update ozone-runner version to 20211202-1 (#2969)

* HDDS-6171. Create an API to change Bucket Owner (#2988)

* HDDS-6163. Fix PATH in docker image (#2967)

* HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (#2998)

* HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (#2972)

* HDDS-6109. Preserve the underlying exception raised in client lib. (#2989)

* HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (#2983)

* HDDS-6203. CleanUp incomplete gz files during Container move (#3000)

* HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (#3011)

* HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (#3015)

* HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (#2987)

* HDDS-6177. Extend container info command to include replica details  (#2995)

* HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (#3007)

* HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (#3014)

* HDDS-6192. feature/Observability.md translated to Chinese (#2994)

* HDDS-6205. Add CLI command to display the latest Replication Manager report (#3013)

* HDDS-6227. Test helpers should observe naming conditions (#3020)

* HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (#3029)

* HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (#3005)

* HDDS-5529. For any IOexception from @REPLicated method we should throw it (#2788)

* HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (#2992)

* HDDS-6206. Application errors must not flood system log (#3001)

* HDDS-6245. Add BucketLayout logging to Audit Logs (#3040)

* HDDS-6238 Reduce memory requirements for list keys. (#3032)

* HDDS-2919. Intermittent failure in TestRDBStore (#3028)

* HDDS-6253. Unnecessary duplicate smoketest after defaulting to FSO (#3036)

* HDDS-6204. Cleanup handling malformed authorization header (#2999)

* HDDS-6169. Selective checks: skip junit tests on ozone-runner image update (#2974)

* HDDS-6270. Use a dedicated file instead of /etc/passwd for xcompat acceptance test (#3050)

* HDDS-6273. Amend doc SecuringTDE.md (#3047)

* HDDS-6140. Selective checks: skip unit check for integration-test changes (#2948)

* HDDS-6215. Recon get limited delta updates from OM (#3009)

* HDDS-6125. Recon get limited delta updates from OM

* HDDS-6215. Fix unit test

* trigger new CI check

* HDDS-6215. Fix typo

* trigger new CI check

Co-authored-by: Symious <yiyang0203@gmail.com>

* HDDS-6226. Run tests for selective CI checks in CI (#3019)

* HDDS-6247. Avoid logging stack trace for user input problems (#3039)

* HDDS-6208. New checkstyle: WhitespaceAround (#3003)

* HDDS-6289. Upgrade acceptance test log flooded with parse error (#3063)

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>

* Trigger Build

* Fix integration test for added configuation field for selecting OmTransport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml).

Co-authored-by: Doroszlai, Attila <6454655+adoroszlai@users.noreply.github.com>
Co-authored-by: Lokesh Jain <ljain@apache.org>
Co-authored-by: Aswin Shakil Balasubramanian <aswinshakilbalu@gmail.com>
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Co-authored-by: Symious <yiyang0203@foxmail.com>
Co-authored-by: Bharat Viswanadham <bharat@apache.org>
Co-authored-by: Stephen O'Donnell <stephen.odonnell@gmail.com>
Co-authored-by: GeorgeJahad <github@blackbirdsystems.net>
Co-authored-by: Siddhant Sangwan <siddhantsangwan027@gmail.com>
Co-authored-by: Jyotinder Singh <jyotindrsingh@gmail.com>
Co-authored-by: Shashikant Banerjee <shashikant@apache.org>
Co-authored-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>
Co-authored-by: Kiyoshi Mizumaru <kiyoshi.mizumaru@gmail.com>
Co-authored-by: Ritesh H Shukla <kerneltime@gmail.com>
Co-authored-by: Nibiru <axcsd3692@qq.com>
Co-authored-by: Kaijie Chen <chen@kaijie.org>
Co-authored-by: Zita Dombi <50611074+dombizita@users.noreply.github.com>
Co-authored-by: Istvan Fajth <pifta@cloudera.com>
Co-authored-by: steinsgateted <71066027+steinsgateted@users.noreply.github.com>
Co-authored-by: Gui Hecheng <markgui@tencent.com>
Co-authored-by: Jackson Yao <jacksonyao@tencent.com>
Co-authored-by: Keyi Song <72794035+sky76093016@users.noreply.github.com>
Co-authored-by: UENISHI Kota <kuenishi@users.noreply.github.com>
Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
Co-authored-by: Symious <yiyang0203@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants