lh github actions workflow refactoring 2022 artifacts #60

Closed

Conversation

@lhotari (Owner) commented Mar 28, 2022

BewareMyPower and others added 30 commits March 22, 2022 23:35
…#14797)

Fixes apache#509

### Motivation

When a C++ producer is created successfully, it starts a send timer.
However, the callback captures a shared pointer to `ProducerImpl`
itself, which extends the lifetime of `ProducerImpl`: even after the
`Producer` object is destructed, the underlying `ProducerImpl` object
won't be destructed. It can only be destructed after `Client::close()`
is called.

### Modifications

- Pass a weak pointer of `ProducerImpl` to the send timer, and add an
  `asyncWaitSendTimeout` method that combines the `expires_from_now` and
  `async_wait` calls on the timer.
- Add `ClientTest.testReferenceCount` to verify that the reference count
  drops to 0 after the producer or consumer is destructed.
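
A minimal sketch of that weak-pointer pattern, using Boost.Asio directly. The `Producer` class, `handleSendTimeout`, and the timings here are illustrative stand-ins, not the actual `ProducerImpl` code:

```cpp
#include <boost/asio.hpp>
#include <iostream>
#include <memory>

// Stand-in for ProducerImpl: the send-timer callback captures only a
// weak_ptr, so the timer no longer extends the object's lifetime.
class Producer : public std::enable_shared_from_this<Producer> {
 public:
  explicit Producer(boost::asio::io_context& io) : timer_(io) {}

  // Mirrors the idea of asyncWaitSendTimeout: one place that combines
  // expires_from_now and async_wait, capturing a weak reference.
  void asyncWaitSendTimeout(boost::posix_time::time_duration d) {
    timer_.expires_from_now(d);
    std::weak_ptr<Producer> weakSelf{shared_from_this()};
    timer_.async_wait([weakSelf](const boost::system::error_code& ec) {
      auto self = weakSelf.lock();
      if (!self || ec) {
        return;  // producer already destroyed, or the wait was cancelled
      }
      self->handleSendTimeout();
    });
  }

 private:
  void handleSendTimeout() { std::cout << "send timeout fired\n"; }
  boost::asio::deadline_timer timer_;
};

int main() {
  boost::asio::io_context io;
  {
    auto producer = std::make_shared<Producer>(io);
    producer->asyncWaitSendTimeout(boost::posix_time::seconds(1));
  }  // last shared_ptr dropped: the pending callback holds only a weak_ptr
  io.run();  // the aborted wait runs, lock() fails, nothing is kept alive
}
```

Before the fix, the equivalent lambda captured the shared pointer by value, keeping the `ProducerImpl` alive until `Client::close()` cancelled the timer.
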
Fixes apache#14665

### Motivation

In the C++ client, a connect timeout task is created before each
asynchronous connect operation. If the connection cannot be established
within the configured timeout, the task's callback is invoked to close
the connection, and the `createProducer` or `subscribe` methods then
return `ResultConnectError`.

`ClientConnection::connectTimeoutTask_`, a shared pointer, represents
the timeout task. However, after `ClientConnection::close` is called,
the shared pointer is reset and the underlying `PeriodicTask` object is
released. After that, when the `stop` method is called on the released
`PeriodicTask` object in the callback (`handleTcpConnected`), a
segmentation fault occurs.

The root cause is that `connectTimeoutTask_` can be accessed from two
threads while one of them could release the memory. See apache#14665 for
more explanation. This race condition also makes the Python tests flaky,
because a similar test exists in the Python test suite. See
https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221

So this PR might also fix apache#14714.

### Modifications

Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After
that, `connectTimeoutTask_` always points to the same `PeriodicTask`
object, whose methods are thread safe.
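
A minimal sketch of the invariant the fix relies on. The class bodies below are simplified stand-ins for the real `PeriodicTask` and `ClientConnection`:

```cpp
#include <atomic>
#include <memory>

// Stand-in for PeriodicTask: its methods are thread safe, so any thread
// may call stop() at any time, as long as the object itself stays alive.
class PeriodicTask {
 public:
  void stop() { running_.store(false); }
 private:
  std::atomic<bool> running_{true};
};

class ClientConnection {
 public:
  ClientConnection() : connectTimeoutTask_(std::make_shared<PeriodicTask>()) {}

  void close() {
    connectTimeoutTask_->stop();
    // Before the fix, the line below nulled the member here, so a callback
    // running concurrently on an I/O thread could call stop() through a
    // released (or null) pointer and segfault:
    // connectTimeoutTask_.reset();
  }

  void handleTcpConnected() {
    // Runs on an I/O thread; with the reset removed, this always sees the
    // same valid PeriodicTask object.
    connectTimeoutTask_->stop();
  }

 private:
  std::shared_ptr<PeriodicTask> connectTimeoutTask_;
};

int main() {
  ClientConnection conn;
  conn.handleTcpConnected();  // callback path
  conn.close();               // close path: no reset, no race
}
```
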

### Verifying this change

Execute the following command

```bash
./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10
```

to run `testConnectTimeout` 10 times. In my local environment it never
failed, whereas before this patch it failed easily.
* Only run crowdin-download once a day to avoid timeouts

* Add license to website's package.json

Master issue: apache#14824 

### Motivation

The crowdin translations do not get updated often enough to warrant three downloads a day, especially given that the download and upload are currently timing out and preventing the English version of the website from being published.

### Modifications

* Make the `crowdin-download` step only run once a day.
* Add license to the yarn `package.json`.

### Verifying this change

This is a trivial change.
* WIP: 2.8.3 release notes

* Refactor titles

Co-authored-by: Anonymitaet <[email protected]>

* Fix tense; upper case first letters

* Correct version; update date

Co-authored-by: Anonymitaet <[email protected]>
…ete. (apache#14818)

### Motivation

If the ``patternAutoDiscoveryPeriod`` configuration value is small, there may be unnecessary subscribe requests.

### Modifications

*Describe the modifications you've done.*
* Java client: only schedule batch when there are pending messages

* Update variable name; handle reconnect case

* Remove code duplication

* Fix method name

* Improve method name; add tests; fix bugs raised by tests

* Remove comment that is now incorrect

* Chain batch flush task when task triggers delivery

* Prevent early batch flush by storing lastBatchSendNanos

* Refactor getPendingQueueSize

* Remove unintentionally committed comments

* Remove premature optimization to cancel and reschedule batchFlushTask

* Ensure buffered batchMessageContainer messages can fail due to send timeout

* Fix comment to match new design

* Fix batch message counting log line

* Guard against null batchMessageContainer

* Fix send timeout logic to fire when batching is disabled

Master Issue: apache#11100

### Motivation

As observed in apache#11100, the Java client producer consumes CPU even when doing nothing. This is especially noticeable when using many producers. Instead, we can schedule the batch timer only when there are messages to deliver, or when the timer has just fired and delivered messages.

If there are concerns about this optimization, I will need to take some time to complete benchmarks.

### Modifications

* Skip message batch delivery if the producer is not in `Ready` state. As a consequence, ensure that messages pending in the `batchMessageContainer` are still eligible to fail for `sendTimeout`.
* If there is no current batch flush task, schedule a single flush task when a message is added to a batch message container (assuming the message does not also trigger the delivery of the batch and the producer is in READY state).
* Schedule another batch flush task if and only if the batch flush task itself was responsible for sending messages.
* Keep track of `lastBatchSendNanoTime`, and only flush a batch message container if the `BatchingMaxPublishDelayMicros` time has passed since the last send time.
* Note that the timer task is only ever updated within a `synchronized (this)` block, so we are guaranteed a consistent view free of race conditions. (There is one race condition: an existing batch timer might get canceled while it is starting. This is of little consequence, since it results in either no delivery or a small batch.) A sketch of the scheduling idea follows this list.
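
The actual change is in the Java client, but the scheduling idea is language-independent. Below is a compact C++ sketch of it; the `Batcher` class, the thread-per-timer mechanics, and the timings are invented purely for illustration (a real implementation would reschedule for the remaining delay rather than the full one):

```cpp
#include <chrono>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Invented Batcher class: arm a flush task only when a message enters an
// otherwise idle batch, and keep the timer chain alive only while there is
// work. (The demo uses one detached thread per timer shot for brevity; the
// Java client uses its executor's timer instead.)
class Batcher {
 public:
  explicit Batcher(std::chrono::milliseconds maxDelay) : maxDelay_(maxDelay) {}

  void add(std::string msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    batch_.push_back(std::move(msg));
    if (!flushTaskArmed_) {  // idle until now: arm a single flush task
      flushTaskArmed_ = true;
      armFlushTask();
    }
  }

 private:
  void armFlushTask() {
    std::thread([this] {
      std::this_thread::sleep_for(maxDelay_);
      std::lock_guard<std::mutex> lock(mutex_);
      if (batch_.empty()) {
        flushTaskArmed_ = false;  // idle again: no timer, no CPU cost
      } else if (Clock::now() - lastBatchSendTime_ >= maxDelay_) {
        std::cout << "flushing " << batch_.size() << " messages\n";
        batch_.clear();
        lastBatchSendTime_ = Clock::now();
        armFlushTask();  // re-arm only because this task just sent
      } else {
        armFlushTask();  // not due yet: a send happened recently
      }
    }).detach();
  }

  const std::chrono::milliseconds maxDelay_;
  std::mutex mutex_;
  std::deque<std::string> batch_;
  bool flushTaskArmed_ = false;            // guarded by mutex_
  Clock::time_point lastBatchSendTime_{};  // guarded by mutex_
};

int main() {
  Batcher batcher(std::chrono::milliseconds(100));
  batcher.add("a");
  batcher.add("b");  // timer already armed: nothing extra is scheduled
  std::this_thread::sleep_for(std::chrono::milliseconds(500));
}
```

The key property is that an idle producer has no armed timer at all, which is what removes the constant CPU cost observed in apache#11100.
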

### Verifying this change

There are existing tests that cover batch message delivery. Specifically, the `BatchMessageTest` in the `pulsar-broker` module covers these changes.

### Does this pull request potentially affect one of the following parts:

This is a backwards compatible change.

### Documentation

- [x] `no-need-doc`
…pache#14810)

### Motivation
When transaction log recovery fails with a `CursorAlreadyClosedException`, we should stop the recovery op: the cursor has been closed and the transaction log has been closed, so stopping the recovery op releases the thread resources,
like apache#14781

### Modifications
When recovery fails with a `CursorAlreadyClosedException`, complete the recovery.
…ache#14801)

### Motivation
When transaction buffer recovery fails to create the reader or the writer, or fails to read the snapshot, throwing a `PulsarClientException`, we should re-recover this topic, so close the topic to re-initialize it.
### Motivation

apache#14823 fixes the flaky `testConnectTimeout`, but it also
introduces a regression of apache#14587: when the fd limit is reached,
`connectionTimeoutTask_` is not initialized with a non-null value, and
calling the `stop` method on it directly causes a segmentation fault.

See
https://github.com/apache/pulsar/blob/0fe921f32cefe7648ca428cd9861f9163c69767d/pulsar-client-cpp/lib/ClientConnection.cc#L178-L185

### Modifications

Add the null check for `connectionTimeoutTask_` in `ClientConnection::close`.
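
A minimal compilable sketch of the added guard; the `PeriodicTask` body is a stand-in:

```cpp
#include <memory>

struct PeriodicTask {
  void stop() {}
};

struct ClientConnection {
  std::shared_ptr<PeriodicTask> connectTimeoutTask_;  // stays null if the
                                                      // fd limit is hit
  void close() {
    if (connectTimeoutTask_) {   // the added null check
      connectTimeoutTask_->stop();
    }
  }
};

int main() {
  ClientConnection conn;  // connectTimeoutTask_ was never initialized
  conn.close();           // safe now; previously a segmentation fault
}
```
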
apache#14835)

Master Issue: apache#14824 

### Motivation

In apache#14828, I tried to fix the Python client docs. The correction partially succeeded. Looking at the `asf-site` branch, I noticed that the docs were not in the correct location and were duplicated. I subsequently learned that Docusaurus copies all content in the `static` directory to the `build/` output directory, so there is no need to copy the docs in the `build` script; the docs just have to be output in the correct location.

### Modifications

* Move the docs created in apache#14788 from `python-client` to `api/python` so that they will properly render on the website. I already made this change on the `asf-site` branch here to ensure that the current website renders correctly: 53271f1. 
* Update the python client doc creation script.

### Verifying this change

I ran the appropriate scripts to ensure that the site is built and rendered as expected. You can view the current state of the 2.9.1 docs here: https://pulsar.apache.org/api/python/2.9.1/pulsar.html.
…pache#14830)

### Motivation
Currently, when transaction buffer recovery finds no snapshot, we don't close the reader, which can produce an OOM problem.
…apache#14807)

### Motivation
When transaction buffer recovery fails with a `CursorAlreadyClosedException`, we should stop the recovery op: the cursor has been closed and the transaction buffer has been closed, so stopping the recovery op releases the thread resources,
like apache#14781
* [Broker] Fix NPE when subscription is already removed

* Cover same case for NonPersistentTopic

Master Issue: apache#14362 

### Motivation

There is currently a race condition when we remove a subscription. The race and how to reproduce it are described in apache#14362. One consequence of the race is that we may try to remove the subscription from the topic twice, which leads to an NPE, as described in the issue.

### Modifications

* Verify that the `sub` is not null before getting its stats.

### Verifying this change

This is a trivial change.
### Motivation

apache#14896 is flaky. After diving into the code, I found it's a bug in closing the topic policy reader: we should check `ex.getCause()` instead of `ex`.

Stacktrace:
```
2022-03-27T10:42:16,795+0800 [broker-client-shared-internal-executor-58-1] WARN  org.apache.pulsar.broker.service.SystemTopicBasedTopicPoliciesService - Read more topic polices exception, read again.
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$AlreadyClosedException: Consumer already closed
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_291]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_291]
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_291]
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628) ~[?:1.8.0_291]
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996) ~[?:1.8.0_291]
	at org.apache.pulsar.client.impl.MultiTopicsReaderImpl.readNextAsync(MultiTopicsReaderImpl.java:140) ~[classes/:?]

```
mattisonchao and others added 9 commits March 28, 2022 16:43
### Motivation

The current non-durable cursor does not have the correct state.
For example, when a reader is created, I always see the cursor state as ``Uninitialized`` via the ``getInternalStats`` method.

```json
{
  "reader-xxxxx" : {
      "markDeletePosition" : "19785:18718",
      "readPosition" : "19807:42735",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : -2257,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2022-03-24T20:03:51.85Z",
      "state" : "Uninitialized",
      "numberOfEntriesSinceFirstNotAckedMessage" : 744993,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
}
```

### Modifications

- Correct the cursor state.
* [improve][client] Add log for reader acknowledge.

* Fix checkstyle

* Change log level to warn
…itive (apache#14802)

Let's get this in and unblock flaky tests
…t rid of CVE-2020-36518 (apache#14871)

* [fix][security] Upgrade JacksonXML to get rid of CVE-2020-36518

* force jackson-databind version
…cursor (apache#14900)

Fixes apache#14880

### Motivation

Non durable cursor was not closed properly.

### Modifications
For a non-durable cursor, `cursor.asyncClose` did nothing; the proper way is to call `topic.getManagedLedger().asyncDeleteCursor`.
…pache#14898)

* [C++] Fix send callback might not be invoked in key based batching

### Motivation

When the C++ client enables key-based batching, there is a chance that
the send callback is never invoked. See
https://github.com/apache/pulsar/blob/32df93f693bfdf42953bd728a12ecdea1796dcc8/pulsar-client-cpp/lib/ProducerImpl.cc#L272-L275

If a batch container has multiple batches, only one batch is processed
during `closeAsync`. Even worse, the semaphores of the other batches
won't be released.

### Modifications

- Add a `clearPendingBatches` method to clear all pending batches and
  process them. Then call this method in `closeAsync` and
  `getPendingCallbacksWhenFailed`.
- Add a test `testCloseBeforeSend` to verify when a producer has
  multiple pending batches, all callbacks can be invoked in
  `closeAsync`.

* Add processAndClear() to batch message container
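
A sketch of the `clearPendingBatches` idea; the container, `Result` values, and callback shape are invented for illustration. With key-based batching there is one pending batch per ordering key, so close must drain every batch and complete its callbacks (which is what releases the per-message semaphores):

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum Result { ResultOk, ResultAlreadyClosed };
using SendCallback = std::function<void(Result)>;

class KeyBasedBatchContainer {
 public:
  void add(const std::string& key, SendCallback cb) {
    batches_[key].push_back(std::move(cb));
  }

  // Mirrors the idea of clearPendingBatches: take *all* batches out of the
  // container and process them, instead of processing just one.
  void clearPendingBatches(Result result) {
    auto pending = std::move(batches_);
    batches_.clear();  // leave the container in a defined, empty state
    for (auto& entry : pending) {
      for (auto& cb : entry.second) {
        cb(result);  // every message's callback fires exactly once
      }
    }
  }

 private:
  std::map<std::string, std::vector<SendCallback>> batches_;
};

int main() {
  KeyBasedBatchContainer container;
  container.add("key-1", [](Result) { std::cout << "callback 1\n"; });
  container.add("key-2", [](Result) { std::cout << "callback 2\n"; });
  // Called from closeAsync / getPendingCallbacksWhenFailed in the fix:
  container.clearPendingBatches(ResultAlreadyClosed);
}
```
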
lhotari force-pushed the lh-github-actions-workflow-refactoring-2022-artifacts branch from 35672ad to fa201e4 on March 28, 2022 18:35
lhotari force-pushed the lh-github-actions-workflow-refactoring-2022-artifacts branch from ffeb60e to 3c555cb on March 28, 2022 22:38
@lhotari lhotari closed this Mar 30, 2022
lhotari pushed a commit that referenced this pull request Apr 12, 2022
lhotari pushed a commit that referenced this pull request Aug 25, 2022