forked from apache/pulsar
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lh github actions workflow refactoring 2022 artifacts #60
Closed
lhotari
wants to merge
45
commits into
master
from
lh-github-actions-workflow-refactoring-2022-artifacts
Closed
lh github actions workflow refactoring 2022 artifacts #60
lhotari
wants to merge
45
commits into
master
from
lh-github-actions-workflow-refactoring-2022-artifacts
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…#14797) Fixes apache#509 ### Motivation When a C++ producer is created successfully, it will start a send timer. However the callback has captured the shared pointer of `ProducerImpl` itself. It extends the lifetime of `ProducerImpl` so that even after the `Producer` object destructs, the underlying `ProducerImpl` object won't be destructed. It could only be destructed after `Client::close()` is called. ### Modifications - Pass a weak pointer of `ProducerImpl` to the send timer and add a `asyncWaitSendTimeout` method for the combination of `expires_from_now` and `async_wait` calls on the timer. - Add `ClientTest.testReferenceCount` to verify the reference count will become 0 after the producer or consumer destructs.
Fixes apache#14665 ### Motivation In C++ client, a connect timeout task is created each time before an asynchronous connect operation is performed, if the connection cannot be established in the configured timeout, the callback of the task will be called to close the connection and then the `createProducer` or `subscribe` methods will return `ResultConnectError`. `ClientConnection::connectTimeoutTask_`, which is a shared pointer, represents the timeout task. However, after `ClientConnection::close` is called, the shared pointer will be reset, and the underlying `PeriodicTask` object will be released. After that, when `stop` method is called on the released `PeriodicTask` object in the callback (`handleTcpConnected`), a segmentation fault will happen. The root cause is that `connectTimeoutTask_` can be accessed in two threads while one of them could release the memory. See apache#14665 for more explanations. This race condition leads to flaky Python tests as well, because we also have the similar test in Python tests. See https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221 So this PR might also fix apache#14714. ### Modifications Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After that, the `connectTimeoutTask_` will always points to the same `PeriodicTask` object, whose methods are thread safe. ### Verifying this change Execute the following command ```bash ./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10 ``` to runs the `testConnectTimeout` for 10 times. In my local env, it never failed, while before applying this patch, it's very easy to fail.
* Only run crowdin-download once a day to avoid timeouts * Add license to website's package.json Master issue: apache#14824 ### Motivation The crowdin translations do not get updated often enough to warrant 3 downloads a day, especially given that the download and upload are currently timing out and preventing the english version of the website to get published. ### Modifications * Make the `crowdin-download` step only run once a day. * Add license to the yarn `package.json`. ### Verifying this change This is a trivial change.
* WIP: 2.8.3 release notes * Refactor titles Co-authored-by: Anonymitaet <[email protected]> * Fix tense; upper case first letters * Correct version; update date Co-authored-by: Anonymitaet <[email protected]>
Fix an incorrectly referenced directory in the website build. I introduced the bug here: apache#14788. See https://github.com/apache/pulsar/runs/5666378801?check_suite_focus=true for sample failure.
…ete. (apache#14818) ### Motivation If configuration ``patternAutoDiscoveryPeriod`` is small, there may be some unnecessary subscribe requests. ### Modifications *Describe the modifications you've done.*
* Java client: only schedule batch when there are pending messages * Update variable name; handle reconnect case * Remove code duplication * Fix method name * Improve method name; add tests; fix bugs raised by tests * Remove comment that is now incorrect * Chain batch flush task when task triggers delivery * Prevent early batch flush by storing lastBatchSendNanos * Refactor getPendingQueueSize * Remove unintentionally committed comments * Remove premature optimization to cancle and reschedule batchFlushTask * Ensure buffered batchMessageContainer messages can fail due to send timeout * Fix comment to match new design * Fix batch message counting log line * Guard against null batchMessageContainer * Fix send timeout logic to fire when batching is disabled Master Issue: apache#11100 ### Motivation As observed in apache#11100, the Java client producer consumes cpu even when doing nothing. This is especially true when using many producers. Instead, we can make it so that the batch timer is only scheduled when it has messages to deliver or just fired and delivered messages. If there are concerns about this optimization, I will need to take some time to complete benchmarks. ### Modifications * Skip message batch delivery if the producer is not in `Ready` state. As a consequence, ensure that messages pending in the `batchMessageContainer` are still eligible to fail for `sendTimeout`. * If there is no current batch flush task, schedule a single flush task when a message is added to a batch message container (assuming the message does not also trigger the delivery of the batch and the producer is in READY state). * Schedule another batch flush task if and only if the batch flush task itself was responsible for sending messages. * Keep track of `lastBatchSendNanoTime`, and only flush a batch message container if the `BatchingMaxPublishDelayMicros` time has passed since the last send time. * Note that the timer task is only ever updated within `synchronized (this)` block, so we are guaranteed to have a consistent view free of race conditions. (There is one race condition, and that is that an existing batch timer might get canceled while it is starting. This is of little consequence since it'll result in either no delivery or a small batch.) ### Verifying this change There are existing tests that cover batch message delivery. Specifically, the `BatchMessageTest` in the `pulsar-broker` module covers these changes. ### Does this pull request potentially affect one of the following parts: This is a backwards compatible change. ### Documentation - [x] `no-need-doc`
…pache#14810) ### Motivation When Transactionlog recover fail throw CursorAlreadyClosedException, we should stop the recover op. the cursor was been closed, the transaction log was been closed, so we should stop the recover op, in order to release thread resources like apache#14781 ### Modifications When recover fail by CursorAlreadyClosedException, comeplete recover
…ache#14801) ### Motivation When Transaction buffer recover create reader or create writer fail or read snapshot fail throw PulsarClientException, we should rerecover this topic so close this topic to reinit.
### Motivation apache#14823 fixes the flaky `testConnectTimeout` but it's also a regression of apache#14587. Because when the fd limit is reached, the `connectionTimeoutTask_` won't be initialized with a non-null value. Calling `stop` method on it directly will cause segmentation fault. See https://github.com/apache/pulsar/blob/0fe921f32cefe7648ca428cd9861f9163c69767d/pulsar-client-cpp/lib/ClientConnection.cc#L178-L185 ### Modifications Add the null check for `connectionTimeoutTask_` in `ClientConnection::close`.
…ientException (apache#14846) Co-authored-by: congbo <[email protected]>
apache#14835) Master Issue: apache#14824 ### Motivation In apache#14828, I tried to fix the python client docs. The correction partially succeeded. In looking at the `asf-site`, I noticed that the docs were not in the correct location and they were duplicated. I subsequently learned that docusaurus actually copies all content in the `static` directory to the `build/` output directory. Therefore, there is no need to copy the docs in the `build` script. The docs just have to be output in the correct location. ### Modifications * Move the docs created in apache#14788 from `python-client` to `api/python` so that they will properly render on the website. I already made this change on the `asf-site` branch here to ensure that the current website renders correctly: 53271f1. * Update the python client doc creation script. ### Verifying this change I ran the appropriate scrips to ensure that the site is built and rendered as expected. You can view the current state of the 2.9.1 docs here: https://pulsar.apache.org/api/python/2.9.1/pulsar.html.
…pache#14830) ### Motivation now transaction buffer recover no snapshot, we don't close the reader, it will produce the problem of OOM
…apache#14807) ### Motivation When Transaction buffer recover fail throw CursorAlreadyClosedException, we should stop the recover op. the cursor was been closed, the transaction buffer was been closed, so we should stop the recover op, in order to release thread resources like apache#14781
… in PaddingDemo.proto (apache#14862) Co-authored-by: tmytangmingyang <[email protected]>
…shTest passes on JDK 8 (apache#14860)
* [Broker] Fix NPE when subscription is already removed * Cover same case for NonPersistentTopic Master Issue: apache#14362 ### Motivation There is current a race condition when we remove a subscription. The race and how to reproduce it is described in the apache#14362. One of the consequences of the race is that there is a chance we try to remove the subscription from the topic twice. This leads to an NPE, as described in the issue. ### Modifications * Verify that the `sub` is not null before getting its stats. ### Verifying this change This is a trivial change.
### Motivation apache#14896 is flaky, after diving into the codes, I find it's a bug about closing topic policy reader. We should use `ex.getCause` instead of `ex`. Stacktrace: ``` 2022-03-27T10:42:16,795+0800 [broker-client-shared-internal-executor-58-1] WARN org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService - Read more topic polices exception, read again. java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$AlreadyClosedException: Consumer already closed at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_291] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_291] at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_291] at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628) ~[?:1.8.0_291] at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996) ~[?:1.8.0_291] at org.apache.pulsar.client.impl.MultiTopicsReaderImpl.readNextAsync(MultiTopicsReaderImpl.java:140) ~[classes/:?] ```
### Motivation The current non-durable cursor does not have the correct state. For example, when the reader is created, I always see the cursor status as ``Uninitialized`` via the ``getInternalStats`` method. ```json { "reader-xxxxx" : { "markDeletePosition" : "19785:18718", "readPosition" : "19807:42735", "waitingReadOp" : false, "pendingReadOps" : 0, "messagesConsumedCounter" : -2257, "cursorLedger" : -1, "cursorLedgerLastEntry" : -1, "individuallyDeletedMessages" : "[]", "lastLedgerSwitchTimestamp" : "2022-03-24T20:03:51.85Z", "state" : "Uninitialized", "numberOfEntriesSinceFirstNotAckedMessage" : 744993, "totalNonContiguousDeletedMessagesRange" : 0, "subscriptionHavePendingRead" : true, "subscriptionHavePendingReplayRead" : false, "properties" : { } } } ``` ### Modifications - Correct the cursor state.
* [improve][client] Add log for reader acknowledge. * Fix checkstyle * Change log level to warn
…itive (apache#14802) Let's get this in and unblock flaky tests
…t rid of CVE-2020-36518 (apache#14871) * [fix][security] Upgrade JacksonXML to get rid of CVE-2020-36518 * force jackson-databind version
…cursor (apache#14900) Fixes apache#14880 ### Motivation Non durable cursor was not closed properly. ### Modifications For non durable cursor, `cursor.asyncClose` did nothing. The proper way is `topic.getManagedLedger().asyncDeleteCursor`
…pache#14898) * [C++] Fix send callback might not be invoked in key based batching ### Motivation When C++ client enables key based batching, there is a chance that the send callback is not invoked. See https://github.com/apache/pulsar/blob/32df93f693bfdf42953bd728a12ecdea1796dcc8/pulsar-client-cpp/lib/ProducerImpl.cc#L272-L275 If a batch container has multiple batches, only one batch could be processed during `closeAsync`. Even worse, the semaphores of other batches won't be released. ### Modifications - Add a `clearPendingBatches` method to clear all pending batches and process them. Then call this method in `closeAsync` and `getPendingCallbacksWhenFailed`. - Add a test `testCloseBeforeSend` to verify when a producer has multiple pending batches, all callbacks can be invoked in `closeAsync`. * Add processAndClear() to batch message container
lhotari
force-pushed
the
lh-github-actions-workflow-refactoring-2022-artifacts
branch
from
March 28, 2022 18:35
35672ad
to
fa201e4
Compare
- might exist when re-running
lhotari
force-pushed
the
lh-github-actions-workflow-refactoring-2022-artifacts
branch
from
March 28, 2022 22:38
ffeb60e
to
3c555cb
Compare
lhotari
pushed a commit
that referenced
this pull request
Apr 12, 2022
(cherry picked from commit 09ec366)
lhotari
pushed a commit
that referenced
this pull request
Aug 25, 2022
(cherry picked from commit 09ec366)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
CompletableFuture
([improve][broker] reduce code duplication to avoid endless waitCompletableFuture
apache/pulsar#14853)