Skip to content

Conversation

@Tejaskriya
Copy link
Contributor

@Tejaskriya Tejaskriya commented Aug 4, 2025

What changes were proposed in this pull request?

This PR adds container scans in the following cases:

  • EC reconstruction: for success
  • Reconciler: for success and failure
  • Export container: for failure

And volume scan is triggered during the following cases:

  • Import container : for failure

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13238

How was this patch tested?

Added unit tests

@Tejaskriya Tejaskriya added the scanners Changes related to datanode container and volume scanners label Aug 4, 2025
@Tejaskriya Tejaskriya requested a review from errose28 August 5, 2025 08:12
@Tejaskriya Tejaskriya marked this pull request as draft August 5, 2025 10:03
@Tejaskriya Tejaskriya marked this pull request as ready for review August 5, 2025 12:50
@errose28 errose28 requested a review from aswinshakil August 7, 2025 17:34
@errose28
Copy link
Contributor

errose28 commented Aug 7, 2025

In this PR, let's also update this location which gets triggered when the container file cannot be updated. We should do an on-demand container scan first, and only a volume scan if that fails. However, we don't want to trigger the on-demand container scan in this exact spot in case this is being called to mark the container unhealthy, or it will get stuck in a loop. Instead we probably want to trigger it here.

Note that if a container scan fails due to permission errors, the result will only be kept in memory. Persistence will fail since we don't have permission to update the file, but that is desirable since permissions can be restored on restart. Let's add a comment explaining that in this scan location as well.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @Tejaskriya. Can you address the merge conflict? I think the reconciliation test will need to be updated after #8908 because the failure being injected is in peer communication, which no longer causes a complete failure of reconciliation. Instead you could introduce a local issue like removing the container's directory from the disk before trying to reconcile, which should cause early calls like this to fail.

@Tejaskriya
Copy link
Contributor Author

@errose28 I have made the changes as described in your review comments above, please take a look again, thanks!

@Tejaskriya Tejaskriya requested a review from errose28 August 18, 2025 07:34
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prod code LGTM, just minor comments on tests.

@Tejaskriya Tejaskriya requested a review from errose28 August 21, 2025 14:01
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this @Tejaskriya.

@errose28 errose28 merged commit 3073a56 into apache:master Aug 21, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

scanners Changes related to datanode container and volume scanners

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants