Skip to content

Comments

Fixes for remote store integration#20325

Merged
Bukhtawar merged 1 commit intoopensearch-project:feature/datafusionfrom
mgodwan:remote-store-validation
Dec 29, 2025
Merged

Fixes for remote store integration#20325
Bukhtawar merged 1 commit intoopensearch-project:feature/datafusionfrom
mgodwan:remote-store-validation

Conversation

@mgodwan
Copy link
Member

@mgodwan mgodwan commented Dec 26, 2025

Fixes in upload flow

  1. Close acquired snapshot
  2. Clone should not acquire
  3. Translog interaction should be on indexer, rather than engine.
  4. Seq No management for remote store verification

Pending: Tests

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 26, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

❌ Gradle check result for 3294fec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


final class LastRefreshedCheckpointListener implements ReferenceManager.RefreshListener {
final AtomicLong refreshedCheckpoint;
volatile AtomicLong pendingCheckpoint;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AtomicLong is implicitly volatile in terms of memory visibility

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.


import java.util.concurrent.atomic.AtomicLong;

final class LastRefreshedCheckpointListener implements ReferenceManager.RefreshListener {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this as an inner class for CompositeEngine?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept this outside so that is can shared between InternalEngine and CompositeEngine. Made changes in InternalEngine as well.

Comment on lines +242 to 245
public CatalogSnapshot cloneNoAcquire() {
// Still using the clone call since Lucene call requires clone. This will allow a SegmentsInfos backed CatalogSnapshot to use the same method in calls.
return this;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this isn't clone right?

Copy link
Member Author

@mgodwan mgodwan Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
We need a clone for SegmentsInfo but it is not required for default CatalogSnaphot. Once we have a SegmentInfos backed CatalogSnapshot for Lucene changes to work with same code, this would make more sense.
Hence, kept it like this for now.

Comment on lines 69 to 73
try {
catalogSnapshot.close();
} catch (Exception ex) {

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is the older version of the snapshot that would be dec referenced?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the latest one, but the RefreshListener relies on fetching the latest catalog snapshot from engine later. I've modified this to be passed a supplier so that it can acquire and release both in the same context.

@mgodwan mgodwan force-pushed the remote-store-validation branch from 3294fec to 0c32359 Compare December 29, 2025 07:08
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
@mgodwan mgodwan force-pushed the remote-store-validation branch from 0c32359 to 1854fe5 Compare December 29, 2025 07:11
@github-actions
Copy link
Contributor

❌ Gradle check result for 1854fe5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Bukhtawar Bukhtawar merged commit 283b9f1 into opensearch-project:feature/datafusion Dec 29, 2025
8 of 31 checks passed
bharath-techie pushed a commit to bharath-techie/OpenSearch that referenced this pull request Dec 30, 2025
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
raghuvanshraj pushed a commit to raghuvanshraj/OpenSearch-Fork that referenced this pull request Jan 13, 2026
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
raghuvanshraj pushed a commit to raghuvanshraj/OpenSearch-Fork that referenced this pull request Jan 19, 2026
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
bharath-techie pushed a commit that referenced this pull request Jan 20, 2026
* Changes to get remote upload and replication to work for lucene indices

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Replication changes

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixes for remote store integration (#20325)

Signed-off-by: Mohit Godwani <mgodwan@amazon.com>

* Updated and Refactored the code to have only required changes for recovery

* Removed extra logging lines

* Resolving PR comments

* Updated the VSRManagerTests to use ParquetFileMetadata as flushResult

* Fixing RemoteSegmentStoreDirectoryTests and RemoteSegmentStoreDirectoryWithPinnedTimestampTests

* Fix for merged files not getting uploaded/deleted

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Added support for SSE KMS testing via run.gradle

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* [RemoteStore] Add support for repository with server side encryption enabled to avoid client side encryption  (#19630)

Signed-off-by: Pranit Kumar <pranikum@amazon.com>

* Adding support for SSE KMS in RemoteSegmentStoreDirectoryFactory

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixing lucene remote recovery and files should not get written in replicas

* Removing extra logging statements

* Revert "Fixing lucene remote recovery and files should not get written in replicas"

This reverts commit 1f3aa80.

* Using NIOFSDirectory in GenericStoreDirectory

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Adding NRTReplicationCompositeEngine for checkpoint tracking during recovery

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Using index input for checksum calculation instead of input stream

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixing lucene remote recovery and files should not get written in replicas

* Fixed RemoteIndexRecoveryIT::testRerouteRecovery

* Removed the isReadonly check from compositeEngine as we have the NRTEngine now

* Removing override for getHistoryUUID in NRTReplicationCompositeEngine

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Updated the IndexFileDeleterTests to use absolute file path

* Refactoring syncSegmentsFromGivenRemoteSegmentStore and syncSegmentsFromRemoteSegmentStore APIs

* Made deleteUnrefrencedFiles api more generic

* Modifying UUID to be added before file name extension for catalog support on parquet files

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

---------

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
Signed-off-by: Pranit Kumar <pranikum@amazon.com>
Co-authored-by: Mohit Godwani <81609427+mgodwan@users.noreply.github.com>
Co-authored-by: Kamal Nayan <askkamal@amazon.com>
Co-authored-by: pranikum <kpranit81@gmail.com>
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jan 26, 2026
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jan 26, 2026
* Changes to get remote upload and replication to work for lucene indices

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Replication changes

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixes for remote store integration (opensearch-project#20325)

Signed-off-by: Mohit Godwani <mgodwan@amazon.com>

* Updated and Refactored the code to have only required changes for recovery

* Removed extra logging lines

* Resolving PR comments

* Updated the VSRManagerTests to use ParquetFileMetadata as flushResult

* Fixing RemoteSegmentStoreDirectoryTests and RemoteSegmentStoreDirectoryWithPinnedTimestampTests

* Fix for merged files not getting uploaded/deleted

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Added support for SSE KMS testing via run.gradle

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* [RemoteStore] Add support for repository with server side encryption enabled to avoid client side encryption  (opensearch-project#19630)

Signed-off-by: Pranit Kumar <pranikum@amazon.com>

* Adding support for SSE KMS in RemoteSegmentStoreDirectoryFactory

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixing lucene remote recovery and files should not get written in replicas

* Removing extra logging statements

* Revert "Fixing lucene remote recovery and files should not get written in replicas"

This reverts commit 1f3aa80.

* Using NIOFSDirectory in GenericStoreDirectory

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Adding NRTReplicationCompositeEngine for checkpoint tracking during recovery

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Using index input for checksum calculation instead of input stream

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Fixing lucene remote recovery and files should not get written in replicas

* Fixed RemoteIndexRecoveryIT::testRerouteRecovery

* Removed the isReadonly check from compositeEngine as we have the NRTEngine now

* Removing override for getHistoryUUID in NRTReplicationCompositeEngine

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

* Updated the IndexFileDeleterTests to use absolute file path

* Refactoring syncSegmentsFromGivenRemoteSegmentStore and syncSegmentsFromRemoteSegmentStore APIs

* Made deleteUnrefrencedFiles api more generic

* Modifying UUID to be added before file name extension for catalog support on parquet files

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>

---------

Signed-off-by: Raghuvansh Raj <raghraaj@amazon.com>
Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
Signed-off-by: Pranit Kumar <pranikum@amazon.com>
Co-authored-by: Mohit Godwani <81609427+mgodwan@users.noreply.github.com>
Co-authored-by: Kamal Nayan <askkamal@amazon.com>
Co-authored-by: pranikum <kpranit81@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants