Add support for forward translog reading#20163
Add support for forward translog reading#20163msfroh merged 11 commits intoopensearch-project:mainfrom
Conversation
WalkthroughAdds an index-level boolean to control translog read direction, threads it into Translog and MultiSnapshot to allow forward or backward snapshot iteration, changes translog generation handling before resync trimming, adjusts initial sync boundary computation, and adds tests exercising forward-read behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Config as Index Config
participant Settings as IndexSettings
participant Translog as Translog
participant Multi as MultiSnapshot
participant Files as Translog Files
Config->>Settings: load index settings
Settings->>Settings: read INDEX_TRANSLOG_READ_FORWARD_SETTING
Translog->>Settings: isTranslogReadForward()
Translog->>Multi: newMultiSnapshot(snapshots, onClose, readForward)
alt readForward == true
note right of Multi: iterator yields 0..N-1 (forward)
else
note right of Multi: iterator yields N-1..0 (backward)
end
loop replay snapshots
Multi->>Files: read snapshot at iterator index
Files-->>Multi: operations (per-file order preserved)
Multi->>Multi: deduplicate by seqNo and yield operations
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
🔇 Additional comments (1)
Comment |
|
❌ Gradle check result for c968213: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
c968213 to
234b0c9
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
638-704: Consider extracting common test logic.The test duplicates 66 lines from
testSeqNoCollision()(lines 571-636), differing only in the setting on line 646. While explicit duplication in tests aids clarity, you might consider a parameterized test or helper method to reduce maintenance overhead if the test logic evolves.Example approach using a helper method:
public void testSeqNoCollision() throws Exception { testSeqNoCollisionWithSettings(Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1") .build()); } public void testSeqNoCollisionWithReadForward() throws Exception { testSeqNoCollisionWithSettings(Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true) .build()); } private void testSeqNoCollisionWithSettings(Settings settings) throws Exception { // Common test logic here }server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)
3876-3934: Forward-read snapshot test wiring looks correctThe test correctly:
- Constructs a translog with
INDEX_TRANSLOG_READ_FORWARD_SETTINGenabled viagetTranslogConfig(tempDir, settings).- Uses a separate
LocalTransloginstance so it doesn’t interfere with the class-leveltranslog.- Populates
viewsin generation order and asserts thatnewSnapshot()yields the concatenated operations in forward (oldest→newest) order.This aligns with the new forward-reading
MultiSnapshotsemantics and gives good coverage for multi-generation snapshots.As a minor optional clean-up, you could factor the common setup logic between this test and
testSnapshotReadOperationInReverseinto a small helper to reduce duplication, but it’s not strictly necessary here.server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)
533-572: Read-forward recovery test mirrors baseline behavior appropriatelyThis test cleanly mirrors
testRecoveryTrimsLocalTranslogwhile:
- Enabling
INDEX_TRANSLOG_READ_FORWARD_SETTINGon the index.- Creating the replication group with an
InternalEngineFactory, keeping the execution model aligned with the baseline test.- Reusing the same flow (in-flight docs, replica promotion/demotion, recovery, and consistency assertions), which is exactly what we need to validate trimming semantics under forward translog reading.
The coverage looks solid and should catch regressions specific to read-forward mode.
If you want to reduce maintenance overhead later, you could extract the common body of this test and
testRecoveryTrimsLocalTransloginto a shared helper that takes theSettings(or aboolean readForward) as a parameter, but this duplication is acceptable as-is.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(2 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
🔇 Additional comments (4)
server/src/main/java/org/opensearch/index/translog/Translog.java (1)
764-765: LGTM!The implementation correctly reads the new
isTranslogReadForward()setting from IndexSettings and passes it to the MultiSnapshot constructor, enabling bidirectional translog reading based on index configuration.server/src/main/java/org/opensearch/index/IndexSettings.java (1)
914-914: Translog read‑forward flag wiring looks correct; confirm non‑dynamic lifecycle is intentionalThe new
translogReadForwardflag is:
- Backed by
INDEX_TRANSLOG_READ_FORWARD_SETTINGwith defaultfalse.- Read once at construction time from
settings.- Stored in a
finalfield and exposed viaisTranslogReadForward().This means the read direction is effectively fixed for the lifetime of the
IndexSettingsinstance and cannot be changed via a dynamic settings update (the setting also lacksProperty.Dynamicand there is no update consumer).If the intent is “configure at index creation (or close/open) only”, this is perfectly fine and keeps behavior simple. If you expect operators to toggle read‑forward on an already‑open index, you’d need to:
- Mark the setting as
Property.Dynamic.- Store it in a
volatilefield instead offinal.- Register an update consumer on
scopedSettingssimilar to other translog settings.Given the sensitivity of recovery semantics, the non‑dynamic approach is probably safer, but it’s worth double‑checking that this matches your operational expectations.
Also applies to: 1135-1135, 2102-2107
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (2)
57-70: Constructor wiring forreadForwardflag and index initialization looks soundThe added
readForwardflag and constructor wiring are consistent:
readForwardisfinal, so direction is immutable per snapshot.index = readForward ? 0 : translogs.length - 1;correctly handles both directions and the empty‑array case (0vs-1with loops guarding against out‑of‑bounds).No correctness issues here.
84-111: Forward vs backward traversal shares correct dedupe semantics; relies on trim preconditionThe new
next()implementation cleanly bifurcates behavior:
- Forward path (
readForward == true): iteratesindexfrom0totranslogs.length - 1, consuming eachTranslogSnapshotin order and usingseenSeqNo.getAndSet+overriddenOperationsin the same way as before.- Backward path keeps the original behavior (from
translogs.length - 1down to0) with identical per‑operation logic.A few points worth noting:
- The reuse of the same inner loop and
SeqNoSetlogic in both branches preserves the existing semantics of “first occurrence wins with respect to the chosen traversal order” and keepsoverriddenOperationsaccounting correct.- With backward reading, “first occurrence” means “latest generation wins”, which avoids stale operations from older primary terms.
- With forward reading, “first occurrence” means “oldest generation wins”. This matches the risk described in the new index setting Javadoc: if trimming of stale operations (
trimOperationOfPreviousPrimaryTerms(...)) hasn’t happened yet, forward traversal can surface outdated ops from older primary terms.Given that:
- As long as forward reading is only enabled in flows where stale‑term trimming is guaranteed to have run before constructing this
MultiSnapshot, this implementation is consistent with the documented behavior.- If you want extra safety, you could consider adding assertions or tighter invariants at the call site (e.g., around when forward snapshots are created) to ensure we don’t accidentally use forward traversal in a pre‑trim state, but that’s optional and outside this class.
Overall, the bidirectional iteration logic here is correct and symmetric.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/IndexSettings.java (3)
915-915: LGTM! Field declaration is correct.The field is appropriately declared as
finalsince the setting is not dynamic.
1136-1136: LGTM! Field initialization is correct.The initialization follows the standard pattern for non-dynamic IndexScope settings.
2103-2108: LGTM! Getter method is correctly implemented.The method follows standard naming conventions and provides appropriate access to the setting value.
|
❌ Gradle check result for fe0811d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java
Outdated
Show resolved
Hide resolved
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Signed-off-by: xuxiong1 <xiongxug@outlook.com>
7ca112a to
79c1e0a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
CHANGELOG.md (1)
41-42: Remove duplicated/malformed changelog entry for forward translog readingYou currently have two entries for the same feature, and the first is still missing the closing parenthesis. Keep a single, well-formed line.
- - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163) - - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163)) + - Add support for forward translog reading ([#20163](https://github.com/opensearch-project/OpenSearch/pull/20163))
🧹 Nitpick comments (1)
server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java (1)
638-704: Consider refactoring to reduce duplication.The
testSeqNoCollisionWithReadForward()test is nearly identical totestSeqNoCollision()(lines 571-636), differing only in enabling theINDEX_TRANSLOG_READ_FORWARD_SETTING. While duplicating tests to cover feature flag variations is a common and acceptable pattern, you could reduce maintenance burden by extracting the shared logic into a parameterized test or helper method.Example refactor:
@ParameterizedTest @ValueSource(booleans = {false, true}) public void testSeqNoCollision(boolean readForward) throws Exception { Settings.Builder settingsBuilder = Settings.builder() .put(IndexSettings.INDEX_SOFT_DELETES_SETTING.getKey(), true) .put(IndexSettings.INDEX_TRANSLOG_RETENTION_AGE_SETTING.getKey(), "-1") .put(IndexSettings.INDEX_TRANSLOG_RETENTION_SIZE_SETTING.getKey(), "-1"); if (readForward) { settingsBuilder.put(IndexSettings.INDEX_TRANSLOG_READ_FORWARD_SETTING.getKey(), true); } try (ReplicationGroup shards = createGroup(2, settingsBuilder.build())) { // ... shared test logic ... } }That said, the current approach is perfectly acceptable and may be preferred if you want each scenario to be independently debuggable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(4 hunks)server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java(1 hunks)server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java(4 hunks)server/src/main/java/org/opensearch/index/translog/Translog.java(1 hunks)server/src/test/java/org/opensearch/index/replication/IndexLevelReplicationTests.java(1 hunks)server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java(1 hunks)server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java(1 hunks)server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- server/src/main/java/org/opensearch/index/translog/Translog.java
- server/src/test/java/org/opensearch/index/shard/PrimaryReplicaSyncerTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (6)
server/src/main/java/org/opensearch/action/resync/TransportResyncReplicationAction.java (1)
184-205: Rolling translog generation before trimming looks correct and matches trim invariantsCalling
replica.rollTranslogGeneration()beforetrimOperationOfPreviousPrimaryTermsensures no mixed-term operations remain in the current writer when trimming runs, which matches theLocalTranslogtrim invariants and avoids stale previous-primary-term ops when forward-reading the translog. The sequencing here (apply ops → roll → trim) looks sound.server/src/main/java/org/opensearch/index/translog/MultiSnapshot.java (1)
43-45: Iterator-based bidirectional traversal correctly preserves snapshot and dedup semanticsThe new
PrimitiveIterator+indexstate cleanly iterates snapshots in either 0..n−1 or n−1..0 order, fully exhausting eachTranslogSnapshotbefore advancing, while reusing the existingseenSeqNodedup logic. This fixes the earlier “one op per snapshot” issue and looks correct for both forward and backward reads.Also applies to: 57-76, 88-101
server/src/main/java/org/opensearch/index/shard/PrimaryReplicaSyncer.java (1)
319-359: UsingstartingSeqNo - 1astrimAboveSeqNocorrectly aligns trimming with the global checkpointDeriving
trimAboveSeqNofromstartingSeqNo - 1(i.e., the last known global checkpoint) and always sending the first request even when there are no ops ensures replicas trim previous-primary-term operations for all seqNos above the global checkpoint, fixing the earlier “trim from max seqNo only” gap without affecting later batches.server/src/test/java/org/opensearch/indices/recovery/RecoveryTests.java (1)
533-572: Forward-read recovery trimming test mirrors the existing reverse-read coverage
testRecoveryTrimsLocalTranslogWithReadForwardis a straightforward forward-read clone oftestRecoveryTrimsLocalTranslog, correctly wiringINDEX_TRANSLOG_READ_FORWARD_SETTINGand asserting Lucene/translog consistency and history equality after promotion/demotion cycles. Looks good.server/src/test/java/org/opensearch/index/translog/LocalTranslogTests.java (1)
3876-3934: Forward snapshot test correctly validates generation-order iteration
testSnapshotReadOperationForwardcreates a dedicated translog withINDEX_TRANSLOG_READ_FORWARD_SETTINGenabled, writes multiple generations while tracking per-generation ops, and then asserts thatnewSnapshot()yields the concatenated gen0→genN sequence. This is the right assertion for the forward-read path and cleanly exercises the new iteration logic.server/src/main/java/org/opensearch/index/IndexSettings.java (1)
199-218: Excellent documentation for the new setting.The comprehensive Javadoc clearly explains the behavior, default semantics, and edge case risks associated with forward reading. The choice to make this a non-dynamic index-scoped setting (cannot be changed after index creation) is appropriate given the complexity of the feature and the edge cases mentioned.
|
❕ Gradle check result for 78bc921: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
--------- Signed-off-by: xuxiong1 <xiongxug@outlook.com>
--------- Signed-off-by: xuxiong1 <xiongxug@outlook.com>
--------- Signed-off-by: xuxiong1 <xiongxug@outlook.com>
--------- Signed-off-by: xuxiong1 <xiongxug@outlook.com>
Description
This commit introduces a feature to read translog operations in forward order
(oldest to newest) instead of the default backward order (newest to oldest).
Changes:
index.translog.read_forwardsetting (default: false) inIndexSettingsTests:
Related Issues
Resolves #20094
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Improvements
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.