@ArafatKhan2198 ArafatKhan2198 commented Aug 12, 2025

What changes were proposed in this pull request?

Add an upgrade action that triggers an asynchronous NSSummary tree rebuild during Recon layout feature finalization. This ensures the NSSummary tree is rebuilt to support materialized totals after an upgrade, without blocking Recon startup.

What this PR includes:

  • Upgrade action: NSSummaryAggregatedTotalsUpgrade implementing ReconUpgradeAction and annotated to run at FINALIZE.
  • Feature registration: Added NSSUMMARY_AGGREGATED_TOTALS entry to ReconLayoutFeature (feature version 3) and registration of the upgrade action via the annotation-based scanner.
  • Async rebuild trigger: ReconUtils.triggerAsyncNSSummaryRebuild(...) invoked by the upgrade action to schedule the rebuild without blocking startup.
  • Guard rails and hardening:
    • Prevent duplicate rebuilds using NSSummaryTask's unified rebuild state check.
    • Wait (up to 5 minutes, 1s poll) for OM tables to be initialized via omMetadataManager.isOmTablesInitialized() before running the rebuild, with graceful timeout and interruption handling.
    • Improved logging around scheduling, waiting, timeouts, and failures.
    • Hardening in NSSummaryTaskWithFSO.reprocessWithFSO(...) to skip processing when OM tables (e.g., directory table) are not yet initialized to avoid NPEs.
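The guard rails above (duplicate prevention, a bounded wait with polling, and interruption handling) can be sketched with plain JDK classes. This is only an illustration under stated assumptions: `AsyncRebuildSketch`, `waitUntilReady`, and `triggerAsync` are hypothetical names, not the Recon classes or the actual `ReconUtils.triggerAsyncNSSummaryRebuild(...)` implementation, and the readiness check stands in for `omMetadataManager.isOmTablesInitialized()`.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names are Recon classes or methods.
class AsyncRebuildSketch {

  // True while a rebuild is scheduled or running; used to reject duplicates.
  private final AtomicBoolean rebuildInFlight = new AtomicBoolean(false);

  // Polls `ready` every pollMillis until it returns true or timeoutMillis elapses.
  // Returns false on timeout or interruption (the interrupt flag is restored).
  static boolean waitUntilReady(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!ready.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  // Schedules the rebuild on a background thread and returns immediately.
  // Returns Optional.empty() if a rebuild is already in flight.
  // The future completes with true if the rebuild ran, false if the wait gave up.
  Optional<CompletableFuture<Boolean>> triggerAsync(
      BooleanSupplier tablesReady, Runnable rebuild, long timeoutMillis, long pollMillis) {
    if (!rebuildInFlight.compareAndSet(false, true)) {
      return Optional.empty();
    }
    return Optional.of(CompletableFuture.supplyAsync(() -> {
      try {
        if (!waitUntilReady(tablesReady, timeoutMillis, pollMillis)) {
          return false;
        }
        rebuild.run();
        return true;
      } finally {
        // Always clear the flag so a later request can be accepted.
        rebuildInFlight.set(false);
      }
    }));
  }

  public static void main(String[] args) {
    AsyncRebuildSketch sketch = new AsyncRebuildSketch();
    AtomicBoolean tablesReady = new AtomicBoolean(false);

    Optional<CompletableFuture<Boolean>> first = sketch.triggerAsync(
        tablesReady::get, () -> System.out.println("rebuilding"), 5_000, 10);
    // A second request while the first is still waiting is rejected.
    Optional<CompletableFuture<Boolean>> second = sketch.triggerAsync(
        tablesReady::get, () -> System.out.println("rebuilding"), 5_000, 10);
    System.out.println("second request accepted: " + second.isPresent());

    tablesReady.set(true); // simulate the tables becoming initialized
    System.out.println("first rebuild ran: " + first.get().join());
  }
}
```

With the values from this PR, the call would use a 5-minute timeout and a 1-second poll interval; the short values in `main` only keep the demo fast.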

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13571

How was this patch tested?

Tested manually

@ArafatKhan2198

@devmadhuu @sumitagrawl Please take a look

@devmadhuu devmadhuu left a comment

Thanks @ArafatKhan2198 for the patch. Changes LGTM +1

ArafatKhan2198 commented Aug 13, 2025

Tested on a cluster. The log is attached below with an explanation.

As shown in the log below, the NSSummaryTask was triggered by the upgrade handler, but the tree was not built immediately because the required tables and NSSummary tasks had not yet been registered.

The registration and initialization steps must happen first. Only after those are complete can the NSSummaryTask run successfully.

You can see in the sequence below:

  1. Upgrade handler triggers the NSSummary tree rebuild.
  2. Tasks and tables are registered.
  3. NSSummaryTask executes successfully.

2025-08-13 14:07:07,023 [main] INFO upgrade.NSSummaryAggregatedTotalsUpgrade: Triggering asynchronous NSSummary tree rebuild for materialized totals (upgrade action).

2025-08-13 14:07:07,025 [main] INFO recon.ReconUtils: Async NSSummary tree rebuild scheduled successfully.

--- At this point, NSSummaryTask cannot run yet because tasks and tables are not registered ---

2025-08-13 14:08:11,253 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: OmDeltaRequest
2025-08-13 14:08:11,258 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: ContainerKeyMapperTaskFSO
2025-08-13 14:08:11,264 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: OmTableInsightTask
2025-08-13 14:08:11,268 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: NSSummaryTask
2025-08-13 14:08:11,272 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: ContainerKeyMapperTaskOBS
2025-08-13 14:08:11,275 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: FileSizeCountTaskOBS
2025-08-13 14:08:11,278 [Recon-SyncOM-0] INFO updater.ReconTaskStatusUpdater: Registered Task: FileSizeCountTaskFSO

--- Now that tasks and tables are registered, NSSummaryTask can execute ---

2025-08-13 14:08:11,443 [ReconTaskThread-0] INFO tasks.NSSummaryTask: Starting NSSummary tree reprocess with unified control...
2025-08-13 14:08:11,445 [Recon-NSSummaryTask-0] INFO tasks.NSSummaryTaskWithFSO: Completed a reprocess run of NSSummaryTaskWithFSO
2025-08-13 14:08:11,445 [ReconTaskThread-0] INFO tasks.NSSummaryTask: NSSummary reprocess execution time: 2 milliseconds
2025-08-13 14:08:11,445 [ReconTaskThread-0] INFO tasks.NSSummaryTask: NSSummary tree reprocess completed successfully with unified control.

cc: @devmadhuu @sumitagrawl

@sumitagrawl sumitagrawl left a comment

@ArafatKhan2198 I have left a few comments.

@ArafatKhan2198

Thanks for the comments, @sumitagrawl. Could you please take another look?

The upgrade action now works properly:

2025-08-18 02:37:16 2025-08-17 21:07:16,638 [main] INFO upgrade.ReconLayoutVersionManager: Current MLV: 2. SLV: 3. Checking features for registration...
2025-08-18 02:37:16 2025-08-17 21:07:16,641 [PipelineSyncTask] WARN scm.ReconPipelineManager: Pipeline Pipeline-ea9c8eb3-0348-43b8-b3a9-4f8ef8427380 already exists in Recon pipeline metadata.
2025-08-18 02:37:16 2025-08-17 21:07:16,641 [PipelineSyncTask] WARN scm.ReconPipelineManager: Pipeline Pipeline-5da9e7bf-e8ee-4508-a317-654af71783a1 already exists in Recon pipeline metadata.
2025-08-18 02:37:16 2025-08-17 21:07:16,663 [main] INFO recon.ReconSchemaVersionTableManager: Updated schema version to '3'.
2025-08-18 02:37:16 2025-08-17 21:07:16,663 [main] INFO upgrade.ReconLayoutVersionManager: MLV updated to: 3
2025-08-18 02:37:16 2025-08-17 21:07:16,663 [main] INFO upgrade.NSSummaryAggregatedTotalsUpgrade: Triggering asynchronous NSSummary tree rebuild for materialized totals (upgrade action).
2025-08-18 02:37:16 2025-08-17 21:07:16,664 [RebuildNSSummaryThread] INFO tasks.NSSummaryTask: Starting NSSummary tree reprocess with unified control...
2025-08-18 02:37:16 2025-08-17 21:07:16,665 [main] INFO upgrade.ReconLayoutVersionManager: Feature versioned 3 finalized successfully.
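
The log above suggests a finalization pass that compares the metadata layout version (MLV) with the software layout version (SLV) and finalizes every feature in between. A rough sketch of that idea follows. All names are hypothetical, and the order of the steps simply mirrors the order of the log lines; it is not taken from the ReconLayoutVersionManager source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of layout feature finalization; not Recon's actual code.
class LayoutFinalizeSketch {
  private int mlv;       // metadata layout version currently persisted
  private final int slv; // software layout version supported by this build
  // Finalize actions keyed by feature version, kept in ascending order.
  private final TreeMap<Integer, Runnable> actions = new TreeMap<>();

  LayoutFinalizeSketch(int mlv, int slv) {
    this.mlv = mlv;
    this.slv = slv;
  }

  void register(int featureVersion, Runnable finalizeAction) {
    actions.put(featureVersion, finalizeAction);
  }

  int getMlv() {
    return mlv;
  }

  // Finalizes every registered feature with mlv < version <= slv, in order,
  // and returns the versions that were finalized.
  List<Integer> finalizeFeatures() {
    List<Integer> finalized = new ArrayList<>();
    if (mlv >= slv) {
      return finalized; // nothing to do, metadata is already current
    }
    for (Map.Entry<Integer, Runnable> e : actions.subMap(mlv, false, slv, true).entrySet()) {
      mlv = e.getKey();    // assumption: version is recorded first, as in the log
      e.getValue().run();  // e.g. trigger the asynchronous rebuild
      finalized.add(e.getKey());
    }
    return finalized;
  }

  public static void main(String[] args) {
    LayoutFinalizeSketch manager = new LayoutFinalizeSketch(2, 3);
    manager.register(3, () -> System.out.println("triggering async rebuild"));
    System.out.println("finalized: " + manager.finalizeFeatures());
    System.out.println("MLV is now: " + manager.getMlv());
  }
}
```

Because the action only schedules work on another thread, the pass can report the feature as finalized while the rebuild is still starting, which matches the interleaving of the `main` and `RebuildNSSummaryThread` lines in the log.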

@ArafatKhan2198 ArafatKhan2198 marked this pull request as ready for review August 18, 2025 04:35
@devmadhuu devmadhuu left a comment

Thanks @ArafatKhan2198 for the patch. LGTM +1

@sumitagrawl sumitagrawl left a comment

LGTM

@ArafatKhan2198 ArafatKhan2198 merged commit f191492 into apache:master Aug 18, 2025
53 checks passed