Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix failure while creating non-durable cursor with inactive managed-ledger #21508

Merged
merged 1 commit into from
Nov 4, 2023

Conversation

rdhabalia
Copy link
Contributor

@rdhabalia rdhabalia commented Nov 2, 2023

Motivation

Right now, the broker is failing to create a reader for a topic that is inactive and has deleted the inactive ledger as it has reached the inactive-ledger-timeout threshold. inactive-ledger-deletion task deletes inactive ledger but it doesn't create a new ledger because of that such topic stays with LedgerClosed state and without active ledger, and meanwhile if any reader tries to connect on that topic then reader creation fails with below error

09:55:28.217 [pulsar-io-4-26] WARN  org.apache.pulsar.broker.service.ServerCnx - [/67.195.180.205:57218][persistent://tenant1/us-east1/ns1/t1][rea
der-8a3095b8c5] Failed to create consumer: consumerId=15, null
java.util.concurrent.CompletionException: java.util.NoSuchElementException
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:1113) ~[?:?]
        at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2235) ~[?:?]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.internalSubscribe(PersistentTopic.java:717) ~[pulsar-broker.jar]
        at org.apache.pulsar.broker.service.persistent.PersistentTopic.subscribe(PersistentTopic.java:693) ~[pulsar-broker.jar]
        at org.apache.pulsar.broker.service.ServerCnx.lambda$handleSubscribe$12(ServerCnx.java:1095) ~[pulsar-broker.jar]
:
java.util.NoSuchElementException
	at java.base/java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:1859)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.getFirstPosition(ManagedLedgerImpl.java:3715)
	at org.apache.bookkeeper.mledger.impl.NonDurableCursorImpl.<init>(NonDurableCursorImpl.java:59)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.newNonDurableCursor(ManagedLedgerImpl.java:1127)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.newNonDurableCursor(ManagedLedgerImpl.java:1108)
	at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl.newNonDurableCursor(ManagedLedgerImpl.java:1100)


Stats-internal
{
  "entriesAddedCounter" : 0,
  "numberOfEntries" : 0,
  "totalSize" : 0,
  "currentLedgerEntries" : 0,
  "currentLedgerSize" : 0,
  "lastLedgerCreatedTimestamp" : "2023-10-26T15:24:57.886Z",
  "waitingCursorsCount" : 26,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "1000:-1",
  "state" : "ClosedLedger",
  "ledgers" : [ ],

Modifications

Make sure, inactive-ledger-timeout task creates a new ledger after deleting inactive ledger to avoid above such failures while creating reader and subscription cursor.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@rdhabalia rdhabalia added area/broker doc-not-needed Your PR changes do not impact docs ready-to-test labels Nov 2, 2023
@rdhabalia rdhabalia added this to the 3.2.0 milestone Nov 2, 2023
@rdhabalia rdhabalia self-assigned this Nov 2, 2023
@github-actions github-actions bot added doc-label-missing and removed doc-not-needed Your PR changes do not impact docs labels Nov 2, 2023
Copy link

github-actions bot commented Nov 2, 2023

@rdhabalia Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Nov 2, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #21508 (1744b2d) into master (78fc853) will increase coverage by 39.86%.
Report is 1 commits behind head on master.
The diff coverage is 100.00%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #21508       +/-   ##
=============================================
+ Coverage     33.39%   73.26%   +39.86%     
- Complexity    12194    32625    +20431     
=============================================
  Files          1638     1890      +252     
  Lines        127949   140442    +12493     
  Branches      13952    15436     +1484     
=============================================
+ Hits          42733   102897    +60164     
+ Misses        79564    29459    -50105     
- Partials       5652     8086     +2434     
Flag Coverage Δ
inttests 24.11% <33.33%> (-0.05%) ⬇️
systests 24.76% <33.33%> (?)
unittests 72.56% <100.00%> (+40.74%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...a/org/apache/bookkeeper/mledger/ManagedLedger.java 0.00% <ø> (ø)
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 80.90% <100.00%> (+41.65%) ⬆️

... and 1539 files with indirect coverage changes

Copy link
Contributor

@vineeth1995 vineeth1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@rdhabalia rdhabalia merged commit 720d6d2 into apache:master Nov 4, 2023
46 of 47 checks passed
@rdhabalia rdhabalia deleted the read_nosuch branch November 4, 2023 16:40
Technoboy- pushed a commit that referenced this pull request Nov 11, 2023
Technoboy- pushed a commit that referenced this pull request Nov 11, 2023
nborisov pushed a commit to nborisov/pulsar that referenced this pull request Nov 13, 2023
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
@@ -4476,10 +4476,13 @@ public void checkInactiveLedgerAndRollOver() {
}

ledgerClosed(lh);
createLedgerAfterClosed();
Copy link
Contributor

@poorbarcode poorbarcode Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdhabalia @vineeth1995 @lhotari

I removed this line (createLedgerAfterClosed();), but the test testNonDurableCursorCreateForInactiveLedger still passes.

I think the root cause of the issue you encountered is the ManagedLedegrImpl.ledgers is empty due to other bugs. So, I want to revert this PR.

Screenshot 2024-02-22 at 11 24 32

@codelipenghui
Copy link
Contributor

And it's a break change to the ManagedLedger.java since we already have @InterfaceStability.Stable

  /**
   * Can evolve while retaining compatibility for minor release boundaries.;
   * can break compatibility only at major release (ie. at m.0).
   */
  @Documented
  public @interface Stable {}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants