Skip to content

Conversation

@ashishkumar50
Copy link
Contributor

What changes were proposed in this pull request?

A new property ozone.om.lease.hard.limit(default 7d) aded for hsync open files.
If file lifetime is already beyond the ozone.om.lease.hard.limit threshold, there are two cases, OpenKeyCleanupService will
1. If it has the leaseRecovery flag set, which means it is under an explicitly lease recovery, then OpenKeyCleanupService will keep it in OpenFileTable.
2. If it doesn’t has the leaseRecovery flag set, then OpenKeyCleanupService will auto commit it with the KeyInfo in openFileTable. In this case, the final block might lose some data at the block end which is hsynced.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10141

How was this patch tested?

New and exiting unit test.

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly makes sense. Just one comment:

Comment on lines +1892 to +1893
} else if (isHsync && openKeyInfo.getModificationTime() <= expiredLeaseTimestamp &&
!openKeyInfo.getMetadata().containsKey(OzoneConsts.LEASE_RECOVERY)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (isHsync && openKeyInfo.getModificationTime() <= expiredLeaseTimestamp &&
!openKeyInfo.getMetadata().containsKey(OzoneConsts.LEASE_RECOVERY)) {
} else if (isHsync && openKeyInfo.getModificationTime() <= expiredLeaseTimestamp) {

What if the client performing lease recovery crashes before committing the final change? The file would become recovery-in-progress forever and can't be closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When manual recovery is performed on a file we are skipping those file for auto recovery. These file can be manually recovered any time. We are skipping this to avoid any data loss in this case as the exact length may not get updated during auto recovery.
@ChenSammi can you please help to confirm if we can remove this check?

Copy link
Contributor

@ChenSammi ChenSammi Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If the manual recovery client crashes before the final file commit, then this file will be kept in the openKeyTable. User can rerun the manual recovery through CLI again to recover the file later.
The reason we let this file in openKeyTable is auto-recovery will inevitably lost some data in file's last block. So if user showed the intent to recover the file manually, we need to keep the chance for the user.

@smengcl smengcl changed the title HDDS-10141. [hsync]Support hard limit and auto recovery for hsync file. HDDS-10141. [hsync] Support hard limit and auto recovery for hsync file. Jan 23, 2024
<description>
Controls how long an open hsync key is considered as active. Specifically, if a hsync key
has been open longer than the value of this config entry, that open hsync key is considered as
expired (e.g. due to client crash). Unit could be defined with postfix (ns,ms,s,m,h,d)
Copy link
Contributor

@smengcl smengcl Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an admin misconfigures this to 7ms? A key will almost immediately get committed after a client calls hsync() if OpenKeyCleanupService is triggered at that moment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is, should we have a lower limit on this config?

Copy link
Contributor

@ChenSammi ChenSammi Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add one check, the hard limit should not be less than the soft limit.

For most of duration properties in Ozone, if some of them are related, then we will check whether one is smaller than another. Talking about misconfiguration, it's hard to draw a bar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi I agree that it is generally hard to prevent all misconfigurations.

I was prompted when I saw these ns,ms units in the description. :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check for hard limit should not be less than the soft limit, and made it equal if it is so.

@jojochuang jojochuang added the hbase HBase on Ozone support label Jan 23, 2024
Copy link
Contributor

@ChenSammi ChenSammi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last patch looks good to me.

@jojochuang , @smengcl , would you like to another look?

@ChenSammi ChenSammi merged commit 2e2d08e into apache:HDDS-7593 Jan 25, 2024
chungen0126 pushed a commit to chungen0126/ozone that referenced this pull request May 3, 2024
chungen0126 pushed a commit to chungen0126/ozone that referenced this pull request May 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hbase HBase on Ozone support

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants