Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][ml] Topic load timeout due to ml data ledger future never finishes #23772

Merged
merged 3 commits into from
Dec 25, 2024

Conversation

poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Dec 23, 2024

Motivation

Background
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:

  • Start a scheduled task to check whether the creation will be timeout.
  • Received a callback
    • Check whether the future(@param ctx of BK.createAsync) has been done or not.
    • If done: it means the creation has timeout before the creation is completed
    • Otherwise: it is a real callback from BK.

Issue:
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test testCreateDataLedgerTimeout:

  • Start creating a data ledger
    • Call BK.createAsync
  • Timeout
    • Mark the future(@param ctx of BK.createAsync) as completed exceptionally.
    • Trigger the callback related to ledger creation.
      • Check whether the future(@param ctx of BK.createAsync) has been done or not.
      • If done: do nothing.
  • Creation is compelled.
    • Trigger the callback related to ledger creation.
      • Check whether the future(@param ctx of BK.createAsync) has been done or not.
      • If done: do nothing.
  • Issue: The callback for ledger creation will never be called.

Screenshot 2024-12-24 at 00 14 38
Screenshot 2024-12-24 at 00 14 08

Modifications

Fix the issue

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 4.1.0 milestone Dec 23, 2024
@poorbarcode poorbarcode self-assigned this Dec 23, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Dec 23, 2024
@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Dec 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.19%. Comparing base (bbc6224) to head (37d066c).
Report is 813 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23772      +/-   ##
============================================
+ Coverage     73.57%   74.19%   +0.62%     
+ Complexity    32624    32159     -465     
============================================
  Files          1877     1853      -24     
  Lines        139502   143373    +3871     
  Branches      15299    16276     +977     
============================================
+ Hits         102638   106382    +3744     
+ Misses        28908    28624     -284     
- Partials       7956     8367     +411     
Flag Coverage Δ
inttests 26.71% <50.00%> (+2.13%) ⬆️
systests 23.10% <0.00%> (-1.22%) ⬇️
unittests 73.73% <100.00%> (+0.88%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.85% <100.00%> (+1.19%) ⬆️

... and 1021 files with indirect coverage changes

Copy link
Member

@nodece nodece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks you should fix the checkAndCompleteLedgerOpTask():

    protected boolean checkAndCompleteLedgerOpTask(int rc, LedgerHandle lh, Object ctx) {
        if (ctx instanceof CompletableFuture) {
            // ledger-creation is already timed out and callback is already completed so, delete this ledger and return.
            if (((CompletableFuture) ctx).complete(lh)) {
                return false;
            } else {
                if (rc == BKException.Code.OK) {
                    log.warn("[{}]-{} ledger creation timed-out, deleting ledger", this.name, lh.getId());
                    asyncDeleteLedger(lh.getId(), DEFAULT_LEDGER_DELETE_RETRIES);
                    return true; // changed
                }
                return false; // changed
            }
        }
        return false;
    }

@shibd
Copy link
Member

shibd commented Dec 24, 2024

It looks you should fix the checkAndCompleteLedgerOpTask():

+1

@poorbarcode
Copy link
Contributor Author

@shibd @nodece

It looks you should fix the checkAndCompleteLedgerOpTask():

Good suggestion. Modified.

@poorbarcode poorbarcode requested review from nodece and shibd December 25, 2024 02:05
@dao-jun
Copy link
Member

dao-jun commented Dec 25, 2024

It looks you should fix the checkAndCompleteLedgerOpTask():

    protected boolean checkAndCompleteLedgerOpTask(int rc, LedgerHandle lh, Object ctx) {
        if (ctx instanceof CompletableFuture) {
            // ledger-creation is already timed out and callback is already completed so, delete this ledger and return.
            if (((CompletableFuture) ctx).complete(lh)) {
                return false;
            } else {
                if (rc == BKException.Code.OK) {
                    log.warn("[{}]-{} ledger creation timed-out, deleting ledger", this.name, lh.getId());
                    asyncDeleteLedger(lh.getId(), DEFAULT_LEDGER_DELETE_RETRIES);
                    return true; // changed
                }
                return false; // changed
            }
        }
        return false;
    }

It seems no need to change here, the following steps will handle the case of rc != OK

@nodece nodece changed the title [fix] [ml] Topic load timeout due to ml data ledger future never finishes [fix][ml] Topic load timeout due to ml data ledger future never finishes Dec 25, 2024
@congbobo184 congbobo184 merged commit 9699dc2 into apache:master Dec 25, 2024
54 checks passed
poorbarcode added a commit that referenced this pull request Dec 25, 2024
…hes (#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
poorbarcode added a commit that referenced this pull request Dec 25, 2024
…hes (#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
poorbarcode added a commit that referenced this pull request Dec 25, 2024
…hes (#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 26, 2024
…hes (apache#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
(cherry picked from commit 2189b60)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 26, 2024
…hes (apache#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
(cherry picked from commit 2189b60)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 26, 2024
…hes (apache#23772)

### Motivation

**Background**
There is a mechanism that repeatedly prevents the callback of ML data ledger creation:
- Start a scheduled task to check whether the creation will be timeout.
- Received a callback
  - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
  - If done: it means the creation has timeout before the creation is completed
  - Otherwise: it is a real callback from BK.

**Issue:**
But the timeout event will call the same callback as above, then the steps are as follows, which you ca reproduce by the test `testCreateDataLedgerTimeout`:
- Start creating a data ledger
  - Call `BK.createAsync`
- Timeout
  - Mark the future(`@param ctx` of `BK.createAsync`) as completed exceptionally.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Creation is compelled.
  - Trigger the callback related to ledger creation.
    - Check whether the future(`@param ctx` of `BK.createAsync`) has been done or not.
    - If done: do nothing.
- Issue: The callback for ledger creation will never be called.

![Screenshot 2024-12-24 at 00 14 38](https://github.com/user-attachments/assets/44ed19d2-7238-45a4-9186-c127f6ed14f7)
![Screenshot 2024-12-24 at 00 14 08](https://github.com/user-attachments/assets/349f39ff-7e98-4a09-9af2-f80082339592)

### Modifications

Fix the issue

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: x

(cherry picked from commit 9699dc2)
(cherry picked from commit 2189b60)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants