Skip to content

Conversation

@ivandika3
Copy link
Contributor

@ivandika3 ivandika3 commented Jan 17, 2024

What changes were proposed in this pull request?

Fix NPE on OMDBCheckpointServlet caused by possible null SstFilteringService. The null SstFilteringService is due to configuration ozone.filesystem.snapshot.enabled=false or ozone.snapshot.filtering.service.interval=-1.

When NPE is thrown in the call of OMDBCheckpointServlet.Lock, this cause the subsequent call to OMDBCheckpointServlet.Lock.lock to block indefinitely since the previous call that triggered the NPE did not release the lock held on keyDeletingService.

The current solution is a simple nullity check. There might be better solution to prevent the NPE than just a nullity check.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10138

How was this patch tested?

Manual test.

Clean CI: https://github.com/ivandika3/ozone/actions/runs/7550769000

@ivandika3 ivandika3 marked this pull request as ready for review January 17, 2024 06:16
@ivandika3
Copy link
Contributor Author

@GeorgeJahad @ChenSammi @symious Could you help to review this?

@ivandika3
Copy link
Contributor Author

We might also need to synchronize the lock and unlock to prevent possible deadlock.

@symious
Copy link
Contributor

symious commented Jan 17, 2024

If it's for null check, can we change to use Optional?

@ivandika3
Copy link
Contributor Author

@symious Updated. PTAL.

sstFilteringService.getBootstrapStateLock().lock();
rocksDbCheckpointDiffer.getBootstrapStateLock().lock();
snapshotDeletingService.getBootstrapStateLock().lock();
if (keyDeletingService.isPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can changed to ifPresent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since lock throws InterruptedException (unlike unlock), it might not be possible to use ifPresent.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivandika3 for the patch. I'd like to suggest some changes to simplify the logic.

Comment on lines 649 to 652
private final Optional<BootstrapStateHandler> keyDeletingService;
private final Optional<BootstrapStateHandler> sstFilteringService;
private final Optional<BootstrapStateHandler> rocksDbCheckpointDiffer;
private final Optional<BootstrapStateHandler> snapshotDeletingService;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Optional<BootstrapStateHandler> keyDeletingService;
private final Optional<BootstrapStateHandler> sstFilteringService;
private final Optional<BootstrapStateHandler> rocksDbCheckpointDiffer;
private final Optional<BootstrapStateHandler> snapshotDeletingService;
private final List<BootstrapStateHandler.Lock> locks;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Updated.

Comment on lines 662 to 666
keyDeletingService = Optional.ofNullable(om.getKeyManager().getDeletingService());
sstFilteringService = Optional.ofNullable(om.getKeyManager().getSnapshotSstFilteringService());
rocksDbCheckpointDiffer = Optional.ofNullable(om.getMetadataManager().getStore()
.getRocksDBCheckpointDiffer());
snapshotDeletingService = Optional.ofNullable(om.getKeyManager().getSnapshotDeletingService());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
keyDeletingService = Optional.ofNullable(om.getKeyManager().getDeletingService());
sstFilteringService = Optional.ofNullable(om.getKeyManager().getSnapshotSstFilteringService());
rocksDbCheckpointDiffer = Optional.ofNullable(om.getMetadataManager().getStore()
.getRocksDBCheckpointDiffer());
snapshotDeletingService = Optional.ofNullable(om.getKeyManager().getSnapshotDeletingService());
locks = Stream.of(
om.getKeyManager().getDeletingService(),
om.getKeyManager().getSnapshotSstFilteringService(),
om.getMetadataManager().getStore().getRocksDBCheckpointDiffer(),
om.getKeyManager().getSnapshotDeletingService()
)
.filter(Objects::nonNull)
.map(BootstrapStateHandler::getBootstrapStateLock)
.collect(Collectors.toList());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL.

Comment on lines 673 to 684
if (keyDeletingService.isPresent()) {
keyDeletingService.get().getBootstrapStateLock().lock();
}
if (sstFilteringService.isPresent()) {
sstFilteringService.get().getBootstrapStateLock().lock();
}
if (rocksDbCheckpointDiffer.isPresent()) {
rocksDbCheckpointDiffer.get().getBootstrapStateLock().lock();
}
if (snapshotDeletingService.isPresent()) {
snapshotDeletingService.get().getBootstrapStateLock().lock();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (keyDeletingService.isPresent()) {
keyDeletingService.get().getBootstrapStateLock().lock();
}
if (sstFilteringService.isPresent()) {
sstFilteringService.get().getBootstrapStateLock().lock();
}
if (rocksDbCheckpointDiffer.isPresent()) {
rocksDbCheckpointDiffer.get().getBootstrapStateLock().lock();
}
if (snapshotDeletingService.isPresent()) {
snapshotDeletingService.get().getBootstrapStateLock().lock();
}
for (BootstrapStateHandler.Lock lock : locks) {
lock.lock();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL.

Comment on lines 693 to 697
snapshotDeletingService.ifPresent(deletingService -> deletingService.getBootstrapStateLock().unlock());
rocksDbCheckpointDiffer.ifPresent(
rocksDBCheckpointDiffer -> rocksDBCheckpointDiffer.getBootstrapStateLock().unlock());
sstFilteringService.ifPresent(filteringService -> filteringService.getBootstrapStateLock().unlock());
keyDeletingService.ifPresent(deletingService -> deletingService.getBootstrapStateLock().unlock());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
snapshotDeletingService.ifPresent(deletingService -> deletingService.getBootstrapStateLock().unlock());
rocksDbCheckpointDiffer.ifPresent(
rocksDBCheckpointDiffer -> rocksDBCheckpointDiffer.getBootstrapStateLock().unlock());
sstFilteringService.ifPresent(filteringService -> filteringService.getBootstrapStateLock().unlock());
keyDeletingService.ifPresent(deletingService -> deletingService.getBootstrapStateLock().unlock());
locks.forEach(BootstrapStateHandler.Lock::unlock);

Note: this will release locks in the same order as they were acquired, instead of in reverse order. I think both are fine as long as the order is fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PTAL.

Note: this will release locks in the same order as they were acquired, instead of in reverse order. I think both are fine as long as the order is fixed.

I think it should be fine.

However, seems like lock and unlock call are not synchronized properly on the Lock object. This might cause possible deadlocks. Shall we synchronize on the lock object (synhronized(this)) inside the lock and unlock call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the lock/unlock methods of OMDBCheckpointServlet.Lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was thinking whether it's necessary to add synchronized block inside the lock and unlock. However, after I think about it again, it should be fine as it is.

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If suggestion to use list is accepted, this becomes unused.

Suggested change
import java.util.Optional;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivandika3 for updating the patch, LGTM.

Copy link
Contributor

@symious symious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@adoroszlai adoroszlai requested a review from hemantk-12 January 18, 2024 19:29
Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivandika3 for the patch.

LGTM.

@adoroszlai adoroszlai merged commit abc3e1f into apache:master Jan 19, 2024
@adoroszlai
Copy link
Contributor

Thanks @ivandika3 for the patch, @hemantk-12, @symious for the review.

@ivandika3
Copy link
Contributor Author

Thank you for the reviews @adoroszlai @symious @hemantk-12

@ivandika3 ivandika3 deleted the HDDS-10138 branch January 25, 2024 04:24
adoroszlai pushed a commit to adoroszlai/ozone that referenced this pull request Jan 25, 2024
@ivandika3 ivandika3 self-assigned this Apr 23, 2024
ptlrs pushed a commit to ptlrs/ozone that referenced this pull request Mar 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants