Configurable hash size for segment file metadata #19499

Closed
xuxiong1 wants to merge 2 commits into opensearch-project:main from xuxiong1:hashsize
@xuxiong1
Contributor

@xuxiong1 xuxiong1 commented Oct 1, 2025

Description

This change makes the metadata hash size configurable to prevent CorruptIndexException on large segment info files.

During recovery, OpenSearch performs an additional safety check on these small metadata files: it computes a strong hash of the file but limits the bytes read to 1MB. For a file larger than 1MB, only the first 1MB is verified, so the read never reaches the 8-byte checksum at the end of the file, which could cause the checksum to be 0.
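To illustrate the failure mode, here is a minimal, simplified model of a capped read. It is not the OpenSearch or Lucene code path: the class and method names are hypothetical, and the file layout is reduced to "payload followed by an 8-byte CRC32". It only shows why a read capped below the file length never sees the trailing checksum.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical model: a file is a payload followed by an 8-byte CRC32 of that payload. */
final class TruncatedReadSketch {

    /** Appends an 8-byte CRC32 of the payload, mimicking a trailing checksum. */
    static byte[] withTrailingChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
            .put(payload)
            .putLong(crc.getValue())
            .array();
    }

    /**
     * Reads at most readLimit bytes. Returns the stored checksum only if the
     * read covered the whole file; otherwise returns 0, because the last
     * 8 bytes were never reached.
     */
    static long storedChecksumSeen(byte[] file, int readLimit) {
        int read = Math.min(file.length, readLimit);
        if (read < file.length) {
            return 0L; // checksum position not reached
        }
        return ByteBuffer.wrap(file, file.length - Long.BYTES, Long.BYTES).getLong();
    }

    /** Recomputes the CRC32 over everything except the trailing 8 bytes. */
    static long actualChecksum(byte[] file) {
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - Long.BYTES);
        return crc.getValue();
    }
}
```

In this model, a file that fits within the limit yields a stored checksum equal to the recomputed one, while a file larger than the limit yields 0, and the comparison fails.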

The new index.store.metadata_hash.size setting lets users configure an appropriate hash size limit (default: 1MB) for these smaller files. Files exceeding the configured limit skip hash computation, and a warning is logged.
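The intended behavior can be sketched roughly as follows. This is an illustrative stand-in rather than the code in this PR: the class, method, and constant names are hypothetical, SHA-256 is only a placeholder for whatever hash the store actually computes, and java.util.logging stands in for the project's logger.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Optional;
import java.util.logging.Logger;

/** Hypothetical sketch of "hash only when the file fits within a configurable limit". */
final class MetadataHashSketch {
    private static final Logger LOG = Logger.getLogger(MetadataHashSketch.class.getName());

    /** Default limit of 1MB, matching the default stated in the description. */
    static final long DEFAULT_MAX_HASH_BYTES = 1024L * 1024L;

    /**
     * Returns a hash of the contents when they fit within maxHashBytes.
     * Otherwise logs a warning and returns empty, so no hash is computed.
     */
    static Optional<byte[]> hashIfWithinLimit(String fileName, byte[] contents, long maxHashBytes) {
        if (contents.length > maxHashBytes) {
            LOG.warning("Skipping hash for " + fileName + ": " + contents.length
                + " bytes exceeds configured limit of " + maxHashBytes + " bytes");
            return Optional.empty();
        }
        try {
            // SHA-256 is an assumption used for illustration only.
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return Optional.of(digest.digest(contents));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A caller would pass the value read from index.store.metadata_hash.size as maxHashBytes and fall back to DEFAULT_MAX_HASH_BYTES when the setting is absent.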

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
@github-actions
Contributor

github-actions bot commented Oct 1, 2025

❌ Gradle check result for 891a35f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@msfroh
Contributor

msfroh commented Oct 1, 2025

I'm not convinced that this is a good setting to add or expose from OpenSearch.

Under normal circumstances, a SegmentInfos file (or a .si file) should never be 1MB. You need to shove a lot of extra stuff in your SegmentInfos file in order to hit that limit.

I don't think this is a generally useful setting.

@xuxiong1
Contributor Author

xuxiong1 commented Oct 2, 2025

> I'm not convinced that this is a good setting to add or expose from OpenSearch.
>
> Under normal circumstances, a SegmentInfos file (or a .si file) should never be 1MB. You need to shove a lot of extra stuff in your SegmentInfos file in order to hit that limit.
>
> I don't think this is a generally useful setting.

Since the segments_N file serves as the metadata for the index, users could add custom information to the userData field. As long as the file is not excessively large, shouldn't we give users the flexibility to do so? @yupeng9 @itschrispeck wdyt?
