Branch 2.5 All test #14

…ation. (apache#5572)
### Motivation
Currently, disabling topic auto-creation causes consumer creation to fail on a partitioned topic. Since the partitioned topic has already been created, its partitions should still be created when topic auto-creation is disabled.
### Modifications
By default, creating a partitioned topic also tries to create all of its partitions. If creating the partitions fails, users can run `create-missed-partitions` to repair the topic. Users who already have a partitioned topic whose partitions were never created can also use `create-missed-partitions` to repair it.
(cherry picked from commit 602f1c2)
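
The repair that `create-missed-partitions` performs comes down to comparing the partition count stored in the topic metadata with the partitions that actually exist. The sketch below illustrates only that comparison. `MissedPartitions` and `findMissed` are hypothetical names, not Pulsar code, and the sketch assumes only the `<topic>-partition-<index>` naming convention that appears in the broker logs of this branch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: find partitions recorded in metadata that were never created.
final class MissedPartitions {
    // Partition i of topic T is named T + "-partition-" + i.
    static String partitionName(String topic, int index) {
        return topic + "-partition-" + index;
    }

    // Returns the partitions that should exist (per numPartitions) but are absent from `existing`.
    static List<String> findMissed(String topic, int numPartitions, Set<String> existing) {
        List<String> missed = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            String name = partitionName(topic, i);
            if (!existing.contains(name)) {
                missed.add(name);
            }
        }
        return missed;
    }
}
```

A repair step would then create exactly the returned partitions and leave the existing ones untouched.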

* Fixed static linking of the C++ lib on macOS
* Use `-undefined dynamic_lookup` when linking on Mac so that Python's own runtime is not included
* Fixed searching for protobuf
(cherry picked from commit 125a588)

…ip to avoid bad zk-version (apache#5599)
### Motivation
We have seen multiple occurrences (see the log below) where unloading a topic doesn't complete and gets stuck. The broker then gives up ownership after a timeout, and closing the ml-factory closes the still-open managed-ledger. This corrupts the metadata zk-version, and the topic, now owned by a new broker, keeps failing with `ManagedLedgerException$BadVersionException`.
Right now, while unloading a bundle, the broker removes ownership of the bundle after the timeout even if the topic's managed-ledger was not closed successfully, and `ManagedLedgerFactoryImpl` closes the unclosed managed-ledger on broker shutdown. This causes a bad zk-version for the new broker, and because of that, its cursors are not able to update cursor-metadata in zk.
```
01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000
:
01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic
:
01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms
:
01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout
01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK
:
01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger
```
### Modification
This fix makes sure that the broker closes the managed-ledger before giving up bundle ownership, to avoid the following exception on the new broker where the bundle moves:
```
01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed
org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03]
	at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo]
	at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:834) [?:?]
```
(cherry picked from commit 0a259ab)

### Motivation
Expose the BookKeeper explicit LAC configuration in broker.conf. This is related to apache#3828 and apache#4976: some Pulsar SQL users need to enable explicitLacInterval so that they can read the last message in Pulsar SQL.
(cherry picked from commit 4fd17d4)

…ache#5836)
*Motivation*
pulsar-client-kafka-compat depends on the pulsar-client implementation, so it pulls in protobuf dependencies. This results in `class file for com.google.protobuf.GeneratedMessageV3 not found` errors when generating javadoc for those modules.
*Modifications*
Skip the javadoc tasks for these modules, because:
- pulsar-client-kafka-compat is a Kafka wrapper, and Kafka already provides javadoc for this API.
- We don't publish the javadoc for this module.
(cherry picked from commit 97f9431)

(cherry picked from commit d1d5cf7)

…pache#5915)
* Allow enabling/disabling delayed delivery for messages on a namespace
* add isDelayedDeliveryEnabled function
* add delayed_delivery_time process logic
* add test case
* update admin cli docs
* fix comments
* fix comments
* fix comments
* update import lib
* avoid import *
* fix comments
* fix comments
* remove unused code
* fix comments
* add test case for delayed delivery messages
* fix comments
* fix comments
Signed-off-by: xiaolong.ran
(cherry picked from commit f0d339e)

Fixes apache#5755
### Motivation
Fix negative unacked-message counts in consumer stats when maxUnackedMessagesPerConsumer=0 is set.
### Verifying this change
Added unit test
(cherry picked from commit 9d94860)

### Motivation
Currently, Pulsar uses Avro 1.8.2, a version released two years ago. The latest version of Avro is 1.9.1, which uses FasterXML's Jackson 2.x instead of Codehaus's Jackson 1.x. Jackson is prone to security issues, so we should not keep using older versions.
https://blog.godatadriven.com/apache-avro-1-9-release
### Modifications
Avro 1.9 has some major changes:
- The library used to handle logical datetime values has changed from Joda-Time to JSR-310 (apache/avro#631)
- Namespaces no longer include "$" when generating schemas containing inner classes using ReflectData (apache/avro#283)
- Validation of default values has been enabled (apache/avro#288). This results in a validation error when parsing the following schema:
```json
{
  "name": "fieldName",
  "type": [
    "null",
    "string"
  ],
  "default": "defaultValue"
}
```
The default value of a nullable field must be null (cf. https://issues.apache.org/jira/browse/AVRO-1803), and the default value of the field above is actually null. However, this PR disables the validation in order to maintain the traditional behavior.
(cherry picked from commit d6f240e)

…5942)
### Motivation
Avoid using the same OpAddEntry across different ledger handles.
### Modifications
Add a state to OpAddEntry. If the op is handled by a new ledger handle, it is set to the CLOSED state; when the legacy callback fires later, it checks the op state, and only an op in the INITIATED state is processed.
When a ledger rollover happens, pendingAddEntries are processed. While processing them, a new OpAddEntry is created from each old OpAddEntry so that different ledger handles never use the same OpAddEntry.
(cherry picked from commit 7ec17b2)
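
A minimal sketch of the state guard described in this entry, assuming a simplified op that carries only a ledger ID and a payload. `OpAddEntrySketch`, `addComplete`, and `rolloverTo` are hypothetical names; the real `OpAddEntry` tracks much more. A compare-and-set from INITIATED lets exactly one callback win, so a late callback from the legacy ledger handle is ignored once the op has been closed and re-created for the new handle.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for OpAddEntry: only the state handling is modeled.
final class OpAddEntrySketch {
    enum State { INITIATED, COMPLETED, CLOSED }

    final long ledgerId;     // ledger handle this op was issued against
    final byte[] payload;
    private final AtomicReference<State> state = new AtomicReference<>(State.INITIATED);

    OpAddEntrySketch(long ledgerId, byte[] payload) {
        this.ledgerId = ledgerId;
        this.payload = payload;
    }

    // Add callback from a ledger handle: only an op still in INITIATED may complete,
    // so a late callback on a CLOSED (or already COMPLETED) op is ignored.
    boolean addComplete() {
        return state.compareAndSet(State.INITIATED, State.COMPLETED);
    }

    // Ledger rollover: close this pending op and re-issue its payload as a fresh op
    // bound to the new ledger handle, so the two handles never share an op.
    OpAddEntrySketch rolloverTo(long newLedgerId) {
        if (!state.compareAndSet(State.INITIATED, State.CLOSED)) {
            throw new IllegalStateException("op is no longer pending: " + state.get());
        }
        return new OpAddEntrySketch(newLedgerId, payload);
    }

    State state() {
        return state.get();
    }
}
```

Creating a fresh op on rollover, rather than re-pointing the old one at the new handle, is what keeps the two ledger handles from ever sharing mutable state.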

…itioned topic (apache#5943)
### Motivation
Currently, it is not possible to create a partitioned topic with the same name as an existing non-partitioned topic, but the reverse is possible.
```
$ ./bin/pulsar-admin topics create persistent://public/default/t1
$ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t1
16:12:50.418 [AsyncHttpClient-5-1] WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://localhost:8080/admin/v2/persistent/public/default/t1/partitions] Failed to perform http put request: javax.ws.rs.ClientErrorException: HTTP 409 Conflict
This topic already exists
Reason: This topic already exists
$ ./bin/pulsar-admin topics create-partitioned-topic -p 2 persistent://public/default/t2
$ ./bin/pulsar-admin topics create persistent://public/default/t2
$ ./bin/pulsar-admin topics list public/default
"persistent://public/default/t2"
"persistent://public/default/t1"
$ ./bin/pulsar-admin topics list-partitioned-topics public/default
"persistent://public/default/t2"
```
These non-partitioned topics are not available and should not be created.
### Modifications
When creating a non-partitioned topic, a "409 Conflict" error is returned if a partitioned topic with the same name already exists.
(cherry picked from commit 7fd3f70)

…oducer (apache#5988) * [pulsar-broker] Clean up closed producer to avoid publish-time for producer * fix test cases (cherry picked from commit 0bc54c5)

### Motivation
Since apache#5599 was merged, it has introduced some code that conflicts with the master branch, possibly because apache#5599 was not rebased onto master.
### Verifying this change
This is a test change.
(cherry picked from commit 275854e)

…apache#6051) --- Master Issue: apache#6046
*Motivation*
Allow users to rely on timestamps to tell whether acknowledgment and consumption are happening.
*Modifications*
- Add lastConsumedTimestamp and lastAckedTimestamp to consumer stats
*Verify this change*
- Pass the test `testConsumerStatsLastTimestamp`
(cherry picked from commit 5728977)

(cherry picked from commit 56280ea)

(cherry picked from commit c90854a)

* PIP-55: Refresh Authentication Credentials
* Fixed import order
* Do not check for the original client credential if it's not coming through a proxy
* Fixed import order
* Fixed mocked test assumption
* Addressed comments
* Avoid printing an NPE on the auth refresh check if auth is disabled
(cherry picked from commit 4af5223)

Motivation
Message redelivery does not work well with the zero-queue consumer when receive() or listeners are used to consume messages. This pull request tries to fix it.
Modifications
Add the missing trackMessage() method call in the zero-queue-size consumer.
Verifying this change
New unit tests added.
(cherry picked from commit 787bee1)

### Motivation
Currently, Pulsar supports deleting inactive topics that have no active producers and no subscriptions. This pull request adds support for deleting inactive topics whose subscriptions are all caught up and that have no active producers or consumers.
### Modifications
Expose the inactive topic delete mode in broker.conf. In the future, we can support namespace-level configuration for the inactive topic delete mode.
(cherry picked from commit dc7abd8)

…component (apache#6078)
### Motivation
Some users may be confused because pulsar/bookie logs are not flushed immediately.
### Modifications
Add a message in `bin/pulsar-daemon` when starting a component.
(cherry picked from commit 4f461c3)

…to receive messages (apache#6090) Fix message redelivery for zero queue consumer while using async api to receive messages (cherry picked from commit d5fca06)

…ng (apache#6101)
*Motivation*
Related to apache#6084
apache#5400 introduced `customRuntimeOptions` in function details, but its description was wrong. The mistake was probably introduced by bad merges.
*Modification*
Fix the argument and description for `deadletterTopic` and `customRuntimeOptions`.
(cherry picked from commit c6e258d)

)
*Motivation*
Fixes apache#5997
Fixes apache#6079
A regression was introduced in apache#5486. If the websocket service is running as part of Pulsar standalone, the cluster data is set with null service URLs. As a result, the service URL is not set correctly in the Pulsar client, and an illegal argument exception ("Param serviceUrl must not be blank.") is thrown.
*Modifications*
1. Pass `null` when constructing the websocket service, so that the local cluster data can be refreshed when creating the Pulsar client.
2. Set the cluster data after both the broker service and the web service have started and their ports are allocated.
(cherry picked from commit 49a9897)

### Motivation
The available permits of a ZeroQueueConsumer must be 1 or less; however, for a ZeroQueueConsumer using a listener, they may become greater than 1.
### Modifications
If the listener is processing a message, ZeroQueueConsumer doesn't send a permit when it reconnects to the broker.
### Reproduction
1. A ZeroQueueConsumer using a listener consumes a topic.
2. Unload that topic (or restart a broker) while the listener is processing a message.
3. ZeroQueueConsumer sends a permit when it reconnects to the broker. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L133
4. ZeroQueueConsumer also sends a permit when it finishes processing the message. https://github.com/apache/pulsar/blob/v2.5.0/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ZeroQueueConsumerImpl.java#L163
5. The available permits become 2.
(cherry picked from commit c09314c)
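
The permit arithmetic in the reproduction steps can be modeled with a small counter. This is a hypothetical sketch, not `ZeroQueueConsumerImpl`: it assumes the broker starts from zero permits after a reconnect, and that the consumer grants one permit initially and one more after each processed message.

```java
// Hypothetical model of broker-side permits for a zero-queue consumer with a listener.
final class ZeroQueuePermits {
    private int brokerPermits = 1;        // the consumer initially grants one permit
    private boolean listenerProcessing;   // true while the listener handles a message

    // The broker delivers one message, consuming one permit.
    void onMessageDelivered() {
        brokerPermits--;
        listenerProcessing = true;
    }

    // The listener is done, so the consumer asks for the next message.
    void onListenerFinished() {
        listenerProcessing = false;
        brokerPermits++;
    }

    // After reconnecting, the broker starts again from zero permits.
    void onReconnect() {
        brokerPermits = 0;
        // Before the fix, a permit was sent here unconditionally; if the listener was busy,
        // onListenerFinished() then sent a second one and the count reached 2.
        if (!listenerProcessing) {
            brokerPermits++;
        }
    }

    int brokerPermits() {
        return brokerPermits;
    }
}
```

Replaying steps 1 to 5 against this model (deliver, reconnect while busy, finish) leaves exactly one permit instead of two.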

`ManagedCursorImpl.asyncResetCursor` is used in three kinds of circumstances:
- REST API: create a subscription with a messageId. Per the documentation: Reset subscription to message position closest to given position.
- REST API: reset a subscription to a given position. Per the documentation: Reset subscription to message position closest to given position.
- Consumer seek command.
In all of the cases above, when the user provides a MessageId, we should make a best effort to find the closest position instead of throwing an InvalidCursorPosition exception. This is because if a user provides an invalid position, it's not possible for them to obtain a valid one, since the ledger IDs of a given topic may not be contiguous and only the brokers are aware of their order. Therefore, we should avoid throwing an invalid-cursor-position exception and instead find the nearest position and reset the cursor to it.
(cherry picked from commit d2f37a7)
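
One way to resolve a requested position to the closest valid one, assuming the broker knows each ledger's ID and entry count, is a lookup in an ordered map. This is a hypothetical sketch (`ClosestPosition` is not a Pulsar class), and the real `asyncResetCursor` may choose differently at the edges. Here an unknown ledger ID moves forward to the next existing ledger, an entry ID past the end of a ledger is clamped to its last entry, and a position beyond the last ledger is clamped to the last entry overall.

```java
import java.util.NavigableMap;

// Hypothetical sketch: map a requested (ledgerId, entryId) to the closest valid position.
// `ledgers` maps each known ledger ID to its entry count, in ledger order; assumes it is non-empty.
final class ClosestPosition {
    record Position(long ledgerId, long entryId) {}

    static Position closest(NavigableMap<Long, Long> ledgers, Position requested) {
        Long entries = ledgers.get(requested.ledgerId());
        if (entries != null && entries > 0) {
            // Known ledger: clamp the entry ID into [0, entries - 1].
            long entry = Math.max(0, Math.min(requested.entryId(), entries - 1));
            return new Position(requested.ledgerId(), entry);
        }
        // Unknown (or empty) ledger: move forward to the first entry of the next ledger.
        Long next = ledgers.higherKey(requested.ledgerId());
        if (next != null) {
            return new Position(next, 0);
        }
        // Past the end of the topic: clamp to the last entry of the last ledger.
        Long last = ledgers.lastKey();
        return new Position(last, Math.max(0, ledgers.get(last) - 1));
    }
}
```

Because the ledger IDs are keys of a sorted map, gaps between them (ledger 3 followed by ledger 7, say) are handled without the caller needing to know the order.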

…pache#6122)
### Motivation
In apache#2981, we added support for granting subscriber permission to manage subscription-based APIs. However, the grant-subscription-permission API requires super-user access, which creates too much dependency on the system admin when many tenants want to grant subscription permissions. So, allow each tenant to manage subscription permissions in order to reduce the administrative effort for the super user.
(cherry picked from commit 254e54b)

…ng stuck (apache#6124) (cherry picked from commit d42cfa1)

When the broker creates its internal client, it sets tlsTrustCertsFilePath to `getTlsCertificateFilePath()`, but it should be `getBrokerClientTrustCertsFilePath()`.
(cherry picked from commit 1fcccd6)

### Motivation
When a broker is under heavy load, the following log may be output and some topics may be unloaded.
> Attempting to shed load on broker101.pulsar.xxx.yahoo.co.jp:4080, which has max resource usage above threshold 0.8708186149597168% > 0.85% -- Offloading at least 0.36224863451117845 MByte/s of traffic

This log means that the usage rate of CPU, memory, direct memory, input bandwidth, or output bandwidth has exceeded the threshold, but we don't know which resource usage is high.
### Modifications
Output these resource usages along with the above log.
> Attempting to shed load on broker101.pulsar.xxx.yahoo.co.jp:4080, which has resource usage 87.08% above threshold 85.0% -- Offloading at least 0.36224863451117845 MByte/s of traffic (cpu: 87.08%, memory: 12.71%, directMemory: 17.19%, bandwidthIn: 11.28%, bandwidthOut: 0.00%)

(cherry picked from commit 9b296d8)
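
The improved log line can be reproduced with a small formatter. This sketch assumes, consistent with the sample log where the reported usage equals the CPU usage, that the reported resource usage is the maximum of the five individual usages. `ShedLogLine` is a hypothetical helper, and it prints two decimals everywhere, so its precision differs slightly from the sample (the threshold prints as `85.00%` and the offload rate is rounded).

```java
import java.util.Locale;

// Hypothetical helper that builds a load-shedding log line in the improved format.
// All usages and the threshold are fractions in [0, 1].
final class ShedLogLine {
    static String format(String broker, double threshold, double offloadMBps,
                         double cpu, double memory, double directMemory,
                         double bandwidthIn, double bandwidthOut) {
        // Assumption: the reported usage is the highest of the five resource usages.
        double maxUsage = Math.max(cpu, Math.max(memory,
                Math.max(directMemory, Math.max(bandwidthIn, bandwidthOut))));
        return String.format(Locale.ROOT,
                "Attempting to shed load on %s, which has resource usage %.2f%% above threshold %.2f%%"
                        + " -- Offloading at least %.2f MByte/s of traffic"
                        + " (cpu: %.2f%%, memory: %.2f%%, directMemory: %.2f%%,"
                        + " bandwidthIn: %.2f%%, bandwidthOut: %.2f%%)",
                broker, maxUsage * 100, threshold * 100, offloadMBps,
                cpu * 100, memory * 100, directMemory * 100,
                bandwidthIn * 100, bandwidthOut * 100);
    }
}
```

Printing every individual usage next to the maximum is what lets an operator see at a glance that CPU, not bandwidth, triggered the shedding.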

apache#6158)
### Motivation
Fixes apache#5994: If the proxy service comes up before the brokers are up and reachable, running `bin/pulsar-admin` commands from inside the proxy pod returns HTTP 403. The proxy is also unable to connect to the brokers when data is pushed through the binary port, failing with the following error:
```
Caused by: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
	... 14 more
Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available
22:11:07.633 [pulsar-web-32-6] INFO org.eclipse.jetty.server.RequestLog - 172.17.0.6 - - [24/Jan/2020:22:11:07 +0000] "PUT /admin/v2/persistent/public/functions/assignments HTTP/1.1" 500 2528 "-" "Pulsar-Java-v2.5.0" 280
```
#### Workaround:
Restart the proxy pods once the broker pods are running.
#### Proposed solution:
Hold off starting the proxies until at least one broker in the cluster is reachable.
### Modifications
The changes are in the `proxy-deployment.yaml` Helm template file, which now defines a new init container that runs before the proxy is started. The init container waits until a broker is reachable by running nslookup on the broker service, sleeping 30 seconds between retries and retrying up to as many times as there are brokers.
An alternative solution, `until nslookup broker-service; sleep 2; done;`, doesn't always work: the 403 would still occur sometimes (it could have been a fluke, but I saw it happen once).
### Verifying this change
1) Follow the instructions for deploying with Helm and run: `helm install pulsar --values pulsar/values-mini.yaml ./pulsar/`.
2) Wait until all the services are up and running.
3) Connect to the proxy pod and run `bin/pulsar-admin broker-stats monitoring-metrics`; no 403 or permission errors should arise.
4) Set up a tenant and namespace.
5) Push data into a topic; there should be no errors in the proxy logs, and the client should be able to push data into the cluster through the proxies.
(cherry picked from commit b838c59)

### Motivation
If you deploy Pulsar using the Helm chart and disable monitoring with
```
extras:
  dashboard: no
```
but have the dashboard ingress enabled
```
dashboard:
  ingress:
    enabled: true
```
the Helm chart creates an ingress that points to a nonexistent service, because the dashboard itself was not deployed.
### Modifications
I've added the same check that is already in place in dashboard-service and dashboard-deployment.
### Verifying this change
I don't know of any automated tests, so I tested it manually. In the end, it's the same "if" that is already in place in dashboard-service and dashboard-deployment.
### Does this pull request potentially affect one of the following parts:
Affects deployment via the Helm chart: an unwanted ingress object is suppressed.
### Documentation
No documentation needed.
(cherry picked from commit efee516)

Co-authored-by: Sijie Guo
(cherry picked from commit 9b46930)

### Motivation
Currently, a bundle split divides the bundle into two hash ranges of the same size. When a bundle contains few topics, this does not work well: topics are assigned to brokers according to the hash of the topic name, and hashing does not spread a small number of topics evenly.
So, this PR introduces an option (`-balance-topic-count`) for bundle split. When it is set to true, the given bundle is split into two parts that each contain the same number of topics.
It also introduces a new Load Manager implementation named `org.apache.pulsar.broker.loadbalance.impl.BalanceTopicCountModularLoadManager`. The new Load Manager splits bundles by balancing the topic count; otherwise, it behaves the same as ModularLoadManagerImpl.
(cherry picked from commit 1c099da)
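
To see why a hash-midpoint split fails for few topics, consider four topics whose hashes are 10, 12, 14, and 90 in the range [0, 100): splitting at 50 puts three topics on one side and one on the other. A topic-count-balanced split can instead cut at the median topic hash. The following is a hypothetical sketch of that idea, not the code of `BalanceTopicCountModularLoadManager`.

```java
import java.util.Arrays;

// Hypothetical sketch of a topic-count-balanced bundle split.
final class BalancedSplit {
    // Returns the boundary for splitting the hash range [lower, upper): topics whose hash is
    // below the boundary stay in the first half, the rest move to the second half.
    static long splitPoint(long lower, long upper, long[] topicHashes) {
        long[] inRange = Arrays.stream(topicHashes)
                .filter(h -> h >= lower && h < upper)
                .sorted()
                .toArray();
        if (inRange.length < 2) {
            // Too few topics to balance: fall back to the midpoint of the hash range.
            return lower + (upper - lower) / 2;
        }
        // Cut at the median hash so each half receives about half of the topics.
        return inRange[inRange.length / 2];
    }
}
```

For the four hashes above, the boundary becomes 14, leaving 10 and 12 in the first half and 14 and 90 in the second.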

…er OOM (apache#6178)
Motivation
Introduce the maxMessagePublishBufferSizeInMB configuration to avoid broker OOM.
Modifications
If the size of the messages being processed exceeds this value, the broker stops reading data from the connection. When the available size is greater than half of maxMessagePublishBufferSizeInMB, the broker resumes auto-reading data from the connection.
(cherry picked from commit 91dfa1a)
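
The pause/resume rule amounts to simple hysteresis. Below is a minimal single-connection sketch, assuming in-flight bytes are tracked per message. `PublishBufferThrottle` is hypothetical, and the broker's real accounting may be shared across connections.

```java
// Hypothetical sketch of the publish-buffer throttle for a single connection.
final class PublishBufferThrottle {
    private final long maxBytes;
    private long inFlightBytes;
    private boolean autoRead = true;

    PublishBufferThrottle(long maxMessagePublishBufferSizeInMB) {
        this.maxBytes = maxMessagePublishBufferSizeInMB * 1024 * 1024;
    }

    // A message of `size` bytes was read from the connection and is being processed.
    void onMessageRead(long size) {
        inFlightBytes += size;
        if (inFlightBytes > maxBytes) {
            autoRead = false;   // stop reading from the connection
        }
    }

    // The message finished processing and its bytes are released.
    void onMessageCompleted(long size) {
        inFlightBytes -= size;
        long available = maxBytes - inFlightBytes;
        if (!autoRead && available > maxBytes / 2) {
            autoRead = true;    // resume auto-read once more than half the buffer is free
        }
    }

    boolean isAutoRead() {
        return autoRead;
    }
}
```

Resuming only once more than half of the buffer is free, rather than as soon as usage dips below the limit, keeps a connection from flapping between paused and reading.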

### Motivation
Currently, offloading can only be configured at the cluster level; the offload configuration can't be set at the namespace level, which is inflexible.
### Modifications
Add namespace-level offload policies.
(cherry picked from commit fd03be5)

…pache#6187) Fixes apache#5904
### Motivation
Pulsar supports unloading a non-partitioned topic or a single partition of a partitioned topic. For a partitioned topic with many partitions, users have to get all the partitions and unload them one by one. We need to support unloading all partitions of a partitioned topic at once.
(cherry picked from commit d35e6c1)

…pache#6189)
### Motivation
Create the managed ledger path on the local ZooKeeper when creating partitions for a partitioned topic.
### Modifications
Change globalZk() to localZk() when creating partitions.
### Verifying this change
PartitionCreationTest covers this change. Because the unit tests in ProducerConsumerBase use the same ZooKeeper as both the local and the global ZooKeeper, the test also passed before this fix.
(cherry picked from commit 43d89f2)

Motivation
Corrected the method name for the source implementation in io-develop.md.
(cherry picked from commit 46bc412)

### Motivation
The supplied Kubernetes YAML files for AWS are outdated and don't work.
### Modifications
Update the YAML files so that applying them on AWS EKS actually sets up a working Pulsar environment.
### Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
(cherry picked from commit d631156)

### Motivation
Fix getting the schema version in HttpLookupService. The byte[] must be flipped before it is passed to com.yahoo.sketches.Util.bytesToLong; otherwise, the method returns the wrong long value. So use ByteBuffer to convert the byte[] version to a long. This issue happens when users use an HTTP-protocol client with multiple schema versions.
### Verifying this change
New tests added for HttpLookupService and BinaryLookupService.
(cherry picked from commit 44dd412)
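
The byte-order issue can be demonstrated with `java.nio.ByteBuffer`, which reads big-endian by default. This sketch assumes the schema version is serialized as 8 big-endian bytes. `SchemaVersionBytes` is a hypothetical helper, and its little-endian decoder stands in for any decoder that reads the bytes in the opposite order; it is not `com.yahoo.sketches.Util.bytesToLong` itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing why the byte order of a serialized schema version matters.
final class SchemaVersionBytes {
    // Serialize a version as 8 big-endian bytes (ByteBuffer's default order).
    static byte[] encode(long version) {
        return ByteBuffer.allocate(Long.BYTES).putLong(version).array();
    }

    // Decode with the same (big-endian) order: round-trips correctly.
    static long decode(byte[] version) {
        return ByteBuffer.wrap(version).getLong();
    }

    // Decode in the opposite order: the same bytes yield a different long.
    static long decodeLittleEndian(byte[] version) {
        return ByteBuffer.wrap(version).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }
}
```

Version 1 encodes as `00 00 00 00 00 00 00 01`; read little-endian, that lone `01` byte becomes the most significant one, so the result is 2^56 instead of 1.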

### Motivation
Currently, the version pinning for `netty-transport-native-epoll` does not include the native library artifact. Depending on the Maven version, this results in picking up an earlier version of `transport-native-epoll-4.1.33.Final-linux-x86_64.jar` (4.1.33 instead of the expected 4.1.43). As a result, the Java NIO based transport is used instead of the more efficient epoll-based one. This affects 2.5.0 as well.
(cherry picked from commit 857d63b)

…ache#6201)
### Motivation
Fixes apache#6131 (caused by apache#5675): When upgrading an existing 2.4.1 bookie cluster to 2.5.0 on Kubernetes, the bookie fails to start with the following exception during initialization:
`io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2147483648, max: 2147483648)`
This happens because the bookie environment variables `BOOKIE_MEM` and `BOOKIE_GC` defined in conf/bkenv.sh have no effect, and the default values are always used.
#### Proposed solution:
Set `BOOKIE_MEM` and `BOOKIE_GC` in the Helm deployment charts, fall back to `PULSAR_MEM` if the `BOOKIE` settings are not set, and use the default settings if none of those environment variables are set.
#### Changes made
Helm chart deployment `values.yaml` and `values-mini.yaml`, along with the `bkenv.sh` configuration script.
### Documentation
Currently, the documentation explaining the deployment process and how to change settings is lacking and needs to be updated.
(cherry picked from commit 28875d5)

…6203) In 2.4.x, when running with the KubernetesRuntime, it defaulted to always using the KubernetesSecretAuthProvider class. With the change in 2.5 that made this behavior pluggable, there is currently a bug: it doesn't keep this behavior and requires a new configuration option to be passed.
This commit changes the config so that it defaults to the correct class when running with a Kubernetes runtime. This restores behavior matching that of earlier versions.
This also moves the WorkerConfig test to the same package where WorkerConfig resides after the refactor, and rearranges the resource files, which are copied via a Maven task.
Co-authored-by: Addison Higham
(cherry picked from commit 3a3174b)

…d due to TTL (apache#6211) Fixes apache#5579
### Motivation
In Pulsar 2.4.1 and later versions, if message TTL is enabled, `PersistentMessageExpiryMonitor` always deletes one non-expired message every 5 minutes. The cause of this bug is apache#4744.
`PersistentMessageExpiryMonitor` expects `ManagedCursor#asyncFindNewestMatching()` to pass null as the found position to its callback if no expired messages exist.
https://github.com/apache/pulsar/blob/c5ba52983fee994de61984aae7d1757e9b738caf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageExpiryMonitor.java#L119-L130
However, due to the change in apache#4744, if no entry matches the search condition, the callback now receives `startPosition` instead of null. For this reason, the earliest backlog message is always deleted by `PersistentMessageExpiryMonitor`, which means unexpected message loss can occur.
### Modifications
Revert the apache#4744 changes. The motivation of apache#4744 was to avoid an NPE in pulsar-sql, but that seems to have been fixed in apache#4757.
https://github.com/apache/pulsar/blob/2069f761753940ed6a1faca8999af70036f20fd6/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarSplitManager.java#L363-L382
(cherry picked from commit 54b39e6)
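
The contract the expiry monitor relies on can be sketched as follows: the search must report "no match" distinctly, rather than falling back to the start position. This hypothetical model (`ExpirySketch`) uses a linear scan over publish timestamps instead of the cursor's search over the ledger, and assumes the entries are in publish order.

```java
import java.util.function.LongPredicate;

// Hypothetical model of the TTL check over a backlog of publish timestamps (oldest first).
final class ExpirySketch {
    // Index of the newest entry that matches `condition`, or -1 if none does.
    // Returning 0 (the start position) instead of -1 would make the caller expire
    // one entry even when nothing has expired, which is the regression described in this entry.
    static int findNewestMatching(long[] publishTimes, LongPredicate condition) {
        int found = -1;
        for (int i = 0; i < publishTimes.length; i++) {
            if (!condition.test(publishTimes[i])) {
                break;   // entries are in publish order, so no newer entry can be older than the TTL
            }
            found = i;
        }
        return found;
    }

    // Number of backlog entries the expiry monitor should mark as consumed.
    static int entriesToExpire(long[] publishTimes, long now, long ttlMillis) {
        int newest = findNewestMatching(publishTimes, t -> now - t > ttlMillis);
        return newest < 0 ? 0 : newest + 1;
    }
}
```

With a 10-second TTL, a backlog published at 1 s, 2 s and 3 s has nothing to expire at 2.5 s and exactly two entries to expire at 12.5 s.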

Currently, binary connections aren't checked to see whether they provide a token. This results in an NPE in the JWT validation as well as a lot of log spam. By explicitly checking for a null/empty token here, we can avoid some exceptions and clean up the log spam.
Co-authored-by: Addison Higham
(cherry picked from commit 00ce81f)
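
A minimal sketch of the guard, assuming the token arrives as a possibly-null `String`. `TokenGuard` and `requireToken` are hypothetical names, and `javax.naming.AuthenticationException` is used here only as a convenient JDK checked exception for the sketch.

```java
import javax.naming.AuthenticationException;

// Hypothetical guard run before any JWT parsing.
final class TokenGuard {
    static String requireToken(String token) throws AuthenticationException {
        // Reject missing credentials up front instead of letting the parser throw an NPE.
        if (token == null || token.isEmpty()) {
            throw new AuthenticationException("Token is missing or empty");
        }
        return token;
    }
}
```

Failing with one well-defined authentication error keeps the log to a single line per rejected connection instead of a stack trace.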

…6235) (cherry picked from commit 4018d0b)

…pic/last deletion (apache#6237) Fixes apache#6173
### Motivation
Fixes problems with log compaction found in issue apache#6173:
1. Compaction fails for an empty topic.
2. Compaction never ends if the value of the last message is an empty batch message when the compaction is triggered.
3. Compaction fails for a topic with batch messages because RawReader flow control doesn't handle batch messages properly.
### Modifications
1. Check whether any message is available before the compaction phases, and finish the compaction immediately if there are no messages to read, to avoid a timeout exception.
2. Add the missing check for an empty batch message to the condition that ends the phase 2 loop.
3. Increase the available permits in RawConsumer by the correct number for batch messages.
### Verifying this change
Produce messages in both batch and non-batch mode in the corresponding tests.
(cherry picked from commit d3f6c55)

### Motivation
In Pulsar 2.5.0, deploying window functions fails because their class doesn't pass validation. The behavior appears to be the same on current master.
### Modifications
Add `WindowFunction.class` to the list of allowed function classes.
(cherry picked from commit 47b944b)

)
### Motivation
In the C++ test CI jobs, some tests fail spuriously with segfaults. Analyzing the test execution with valgrind shows that the thread running the boost asio event loop accesses the `io_service` after it has already been destroyed. To ensure that the `io_service` stays valid until the thread exits, we pass the thread a `shared_ptr` to it, which keeps it alive.
Example of valgrind errors: ``` ==10034== Invalid read of size 4 ==10034== at 0x4BCB784: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:40) ==10034== by 0x4BCB784: pthread_mutex_unlock (pthread_mutex_unlock.c:357) ==10034== by 0x197DB9: boost::asio::detail::posix_mutex::unlock() (posix_mutex.hpp:58) ==10034== by 0x199492: boost::asio::detail::conditionally_enabled_mutex::scoped_lock::~scoped_lock() (conditionally_enabled_mutex.hpp:66) ==10034== by 0x4F03895: boost::asio::detail::scheduler::run(boost::system::error_code&) (scheduler.ipp:151) ==10034== by 0x4F03F8B: boost::asio::io_context::run() (io_context.ipp:62) ==10034== by 0x4FDE872: pulsar::ExecutorService::startWorker(std::shared_ptr<boost::asio::io_context>) (ExecutorService.cc:39) ==10034== by 0x4FE99A3: void std::__invoke_impl<void, void (pulsar::ExecutorService::*&)(std::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*&, decltype(nullptr)&>(std::__invoke_memfun_deref, void (pulsar::ExecutorService::*&)(std::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*&, decltype(nullptr)&) (invoke.h:73) ==10034== by 0x4FE986D: std::__invoke_result<void (pulsar::ExecutorService::*&)(std::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*&, decltype(nullptr)&>::type std::__invoke<void (pulsar::ExecutorService::*&)(std::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*&, decltype(nullptr)&>(void (pulsar::ExecutorService::*&)(std::shared_ptr<boost::asio::io_context>), pulsar::ExecutorService*&, decltype(nullptr)&) (invoke.h:95) ==10034== by 0x4FE9767: void
std::_Bind<void (pulsar::ExecutorService::*(pulsar::ExecutorService*, decltype(nullptr)))(std::shared_ptr<boost::asio::io_context>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (functional:400)
==10034==    by 0x4FE94A0: void std::_Bind<void (pulsar::ExecutorService::*(pulsar::ExecutorService*, decltype(nullptr)))(std::shared_ptr<boost::asio::io_context>)>::operator()<, void>() (functional:484)
==10034==    by 0x4FE9095: boost::asio::detail::posix_thread::func<std::_Bind<void (pulsar::ExecutorService::*(pulsar::ExecutorService*, decltype(nullptr)))(std::shared_ptr<boost::asio::io_context>)> >::run() (posix_thread.hpp:86)
==10034==    by 0x4F03E00: boost_asio_detail_posix_thread_function (posix_thread.ipp:74)
==10034==  Address 0x8896d08 is 72 bytes inside a block of size 240 free'd
==10034==    at 0x483BFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==10034==    by 0x1A0001: boost::asio::detail::scheduler::~scheduler() (scheduler.hpp:38)
==10034==    by 0x198E5B: boost::asio::detail::service_registry::destroy(boost::asio::execution_context::service*) (service_registry.ipp:110)
==10034==    by 0x198D94: boost::asio::detail::service_registry::destroy_services() (service_registry.ipp:54)
==10034==    by 0x199294: boost::asio::execution_context::destroy() (execution_context.ipp:46)
==10034==    by 0x199222: boost::asio::execution_context::~execution_context() (execution_context.ipp:35)
==10034==    by 0x19B90F: boost::asio::io_context::~io_context() (io_context.ipp:55)
==10034==    by 0x1B3B7F: std::_Sp_counted_ptr<boost::asio::io_context*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:377)
==10034==    by 0x1A283B: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:155)
==10034==    by 0x19EC34: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==10034==    by 0x19D123: std::__shared_ptr<boost::asio::io_context, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==10034==    by 0x19D143: std::shared_ptr<boost::asio::io_context>::~shared_ptr() (shared_ptr.h:103)
==10034==  Block was alloc'd at
==10034==    at 0x483AE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==10034==    by 0x19B7DA: boost::asio::io_context::io_context() (io_context.ipp:38)
==10034==    by 0x4FDE622: pulsar::ExecutorService::ExecutorService() (ExecutorService.cc:31)
==10034==    by 0x4FE871C: void __gnu_cxx::new_allocator<pulsar::ExecutorService>::construct<pulsar::ExecutorService>(pulsar::ExecutorService*) (new_allocator.h:147)
==10034==    by 0x4FE8570: void std::allocator_traits<std::allocator<pulsar::ExecutorService> >::construct<pulsar::ExecutorService>(std::allocator<pulsar::ExecutorService>&, pulsar::ExecutorService*) (alloc_traits.h:484)
==10034==    by 0x4FE807F: std::_Sp_counted_ptr_inplace<pulsar::ExecutorService, std::allocator<pulsar::ExecutorService>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<>(std::allocator<pulsar::ExecutorService>) (shared_ptr_base.h:548)
==10034==    by 0x4FE77AD: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<pulsar::ExecutorService, std::allocator<pulsar::ExecutorService>>(pulsar::ExecutorService*&, std::_Sp_alloc_shared_tag<std::allocator<pulsar::ExecutorService> >) (shared_ptr_base.h:679)
==10034==    by 0x4FE6E3F: std::__shared_ptr<pulsar::ExecutorService, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<pulsar::ExecutorService>>(std::_Sp_alloc_shared_tag<std::allocator<pulsar::ExecutorService> >) (shared_ptr_base.h:1344)
==10034==    by 0x4FE62C8: std::shared_ptr<pulsar::ExecutorService>::shared_ptr<std::allocator<pulsar::ExecutorService>>(std::_Sp_alloc_shared_tag<std::allocator<pulsar::ExecutorService> >) (shared_ptr.h:359)
==10034==    by 0x4FE4DEF: std::shared_ptr<pulsar::ExecutorService> std::allocate_shared<pulsar::ExecutorService, std::allocator<pulsar::ExecutorService>>(std::allocator<pulsar::ExecutorService> const&) (shared_ptr.h:702)
==10034==    by 0x4FE3608: std::shared_ptr<pulsar::ExecutorService> std::make_shared<pulsar::ExecutorService>() (shared_ptr.h:718)
==10034==    by 0x4FDEFBD: pulsar::ExecutorServiceProvider::get() (ExecutorService.cc:90)
```
(cherry picked from commit 8262ad9)

…pache#6272) When handling a "timer cancelled" event, we cannot lock the mutex, since the object itself might already be destroyed. Doing so can cause memory corruption or a segfault. (cherry picked from commit 54a5195)

…created/updated (apache#6275) (cherry picked from commit 4264b8d)

…g Pulsar client (apache#6277) * Attempt at fixing deadlock during client.close() * Fixed formatting * Detach the worker thread in the destructor of ExecutorService if it is still unable to be joined * Possible formatting fixes (cherry picked from commit 2e1c74a)

(cherry picked from commit 50d3599)

…che#6310) Fixes apache#6045 apache#6281 ### Motivation Enable getting the precise backlog and the backlog without delayed messages. ### Verifying this change Added new unit tests for the change. (cherry picked from commit df15210)

* Fixed casting in ZooKeeperCache.getDataIfPresent() * Missed null check (cherry picked from commit 7cade48)

Fixes apache#5560 ### Motivation Currently, Pulsar SQL can't read data with a KeyValue schema. This PR adds support for Pulsar SQL reading messages with a key-value schema. ### Modifications Add KeyValue schema support to Pulsar SQL. Add the prefix `__key.` to key field names. (cherry picked from commit 3cf6be1)

…ial duplicated messages and non-duplicated messages into a batch. (apache#6326) Fixes apache#6273 Motivation The main cause of apache#6273 is combining potentially duplicated messages and non-duplicated messages into one batch. So the potentially duplicated message must be flushed first, and only then are the non-duplicated messages added to a batch. (cherry picked from commit b898f49)

Upgrade ZK to the latest stable version. In particular, we need to include: - Split brain on log disk full https://issues.apache.org/jira/browse/ZOOKEEPER-3701 - Data loss after upgrading standalone ZK server 3.4.14 to 3.5.6 with snapshot.trust.empty=true https://issues.apache.org/jira/browse/ZOOKEEPER-3644 (cherry picked from commit 5a8f420)

* Corrected method of specifying Windows path to LLVM tools * Fixing Windows build * Corrected the DLL install path * Fixing pulsarShared paths (cherry picked from commit 9b9e79e)

…pache#6337) Currently, SubscriptionMode is a parameter used to create ConsumerImpl, but it is not exposed, so users cannot set this value for a consumer. This change makes SubscriptionMode a member of ConsumerConfigurationData, so users can set this parameter when creating a consumer. (cherry picked from commit 208af7c)

apache#6339) Motivation Avoid fetching partitioned topic metadata when the topic name is already a partition name. Currently, if users want to skip all messages for a partitioned topic or unload a partitioned topic, the broker fetches the topic metadata many times. For a topic whose name is a partition name, it is not necessary to fetch the partitioned topic metadata again. (cherry picked from commit 26d569b)

…aml (apache#6340) Fixes apache#6338 ### Motivation This commit started while I was using Helm on my local minikube and noticed that there's a mismatch between the `values-mini.yaml` and `values.yaml` files. At first I thought it was a copy/paste error, so I created apache#6338. Then I looked into the details of how these env vars [were used](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L36) and found out it's OK to use `PULSAR_MEM` as an alternative. But it introduces problems: 1. Since `BOOKIE_GC` was not defined, the default [BOOKIE_EXTRA_OPTS](https://github.com/apache/pulsar/blob/28875d5abc4cd13a3e9cc4f59524d2566d9f9f05/conf/bkenv.sh#L39) falls back to the default value of `BOOKIE_GC`, which overrides the same JVM parameters defined earlier in `PULSAR_MEM`. 2. It may cause problems when the bootstrap scripts change in later development, so it is better to make it explicit. So I created this PR to solve the above problems (hidden trouble). ### Modifications As mentioned above, I've made the following modifications: 1. Make `BOOKIE_MEM` and `BOOKIE_GC` explicit in the `values-mini.yaml` file, consistent with the format in the `values.yaml` file. 2. Remove all GC-log-printing args, considering the resource constraints of the minikube environment. The removed content is `-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -verbosegc -XX:G1LogLevel=finest` 3. Leave `PULSAR_PREFIX_dbStorage_rocksDB_blockCacheSize` empty as usual, as [conf/standalone.conf#L576](https://github.com/apache/pulsar/blob/df152109415f2b10dd83e8afe50d9db7ab7cbad5/conf/standalone.conf#L576) says it uses 10% of the direct memory size by default. (cherry picked from commit 7d4df99)

The key shared policy does not support setting the maximum key hash range, so fix the Javadoc. (cherry picked from commit 77971e4)

…essage in batch. (apache#6345) Fixes apache#6344 Fixes apache#6350 The bug was introduced in apache#5622, which changed the skip logic incorrectly. (cherry picked from commit 63ccd43)

### Motivation Fixes apache#6343 ### Modifications Add a method to cast object value to `String`. (cherry picked from commit e1f7505)

…e#6356) ### Motivation Master Issue: apache#5454 When one consumer subscribes to multiple topics, the schema info provider set via setSchemaInfoProvider() is overwritten by the consumer created for the last topic. ### Modification Clone the schema for each consumer created for a topic. ### Verifying this change Added the schemaTest for it. (cherry picked from commit 8003d08)

…6361) (cherry picked from commit 943c903)

Fixes apache#6333 Previously, `hasMoreMessages` was tested against:
```
return lastMessageIdInBroker.compareTo(lastDequeuedMessage) == 0 && incomingMessages.size() > 0;
```
However, `incomingMessages` could be empty when the consumer/reader has just started and hasn't received any messages yet. In this PR, the last entry is retrieved and decoded to get the message metadata for populating the batchIndex field. (cherry picked from commit baf155f)

…ache#6364) ### Motivation Creating a topic does not wait for the replicator cursors to be created. ### Verifying this change The existing unit tests cover this change. (cherry picked from commit 336e971)

@sijie

…pache#6373) This applies the recommended fix from apache#6355 (comment) Fixes apache#6355 ### Motivation This PR corrects the configmap data which was causing the autorecovery pod to crashloop with `could not find or load main class` ### Modifications Updated the configmap var data per [this comment](apache#6355 (comment)) from @sijie (cherry picked from commit af4773b)

…e#6375) Fixes apache#6260 Snappy, like other compression codecs (LZ4, ZSTD), depends on native libraries to do the actual encoding/decoding. When we shade them in a fat jar, only the Java classes of Snappy are shaded, which leaves the JNI bindings incompatible with the underlying C++ code. We should stop shading Snappy and let Maven import its library as a dependency. I've tested the shaded jar generated locally by this PR; it works for all compression codecs. (cherry picked from commit 3197dcd)

**Motivation** Fix apache#6388: when a message is sent with duplicate keys in its properties, the consumer cannot pull the message.
```java
//org.apache.pulsar.client.impl.MessageImpl
if (msgMetadata.getPropertiesCount() > 0) {
    this.properties = Collections.unmodifiableMap(msgMetadataBuilder.getPropertiesList().stream()
            .collect(Collectors.toMap(KeyValue::getKey, KeyValue::getValue)));
} else {
    properties = Collections.emptyMap();
}
this.schema = schema;
```
`Collectors.toMap` does not allow duplicate keys; it throws an `IllegalStateException` when it encounters one. **Changes** Replace the old value with the new value:
```java
if (msgMetadata.getPropertiesCount() > 0) {
    this.properties = Collections.unmodifiableMap(msgMetadataBuilder.getPropertiesList().stream()
            .collect(Collectors.toMap(KeyValue::getKey, KeyValue::getValue,
                    (oldValue, newValue) -> newValue)));
} else {
    properties = Collections.emptyMap();
}
this.schema = schema;
```
(cherry picked from commit 79abc88)
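A standalone sketch of the merge-function behavior described above. It uses plain `Map.Entry` pairs in place of Pulsar's `KeyValue` protobuf type, and the class and method names are hypothetical:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PropertiesMerge {
    // On a duplicate key the later value wins, mirroring the
    // (oldValue, newValue) -> newValue merge function in the fix.
    static Map<String, String> toProperties(List<Map.Entry<String, String>> pairs) {
        return Collections.unmodifiableMap(pairs.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (oldValue, newValue) -> newValue)));
    }

    // Without a merge function, Collectors.toMap throws IllegalStateException on a duplicate key.
    static Map<String, String> toPropertiesStrict(List<Map.Entry<String, String>> pairs) {
        return pairs.stream().collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = Arrays.<Map.Entry<String, String>>asList(
                new SimpleEntry<>("k", "v1"), new SimpleEntry<>("k", "v2"));
        System.out.println(toProperties(pairs)); // {k=v2}
        try {
            toPropertiesStrict(pairs);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected without a merge function");
        }
    }
}
```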

…++ (apache#6391) ### Motivation Fix apache#6168. > In the C++ lib, as the following log shows, unacked messages are redelivered after about 2 * unAckedMessagesTimeout. ### Modifications As in apache#3118, fixed `UnackedMessageTracker` by using TimePartition. - Add `TickDurationInMs` - Add `redeliverUnacknowledgedMessages`, which requires `MessageIds`, to `ConsumerImpl`, `MultiTopicsConsumerImpl` and `PartitionedConsumerImpl`. (cherry picked from commit 333888a)
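A minimal Java sketch of the time-partition idea referenced above. It is not the C++ `UnackedMessageTracker`; the class name, String message IDs, and manual `tick()` calls are assumptions for illustration. In this sketch, a message added between two ticks is handed back after at least the ack timeout and, when the timeout is a multiple of the tick duration, at most one tick later:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical time-partitioned tracker; message IDs are plain Strings.
class TimePartitionedTracker {
    // Oldest partition at the head, newest at the tail.
    private final ArrayDeque<Set<String>> partitions = new ArrayDeque<>();
    // Remembers which partition currently holds each ID, so acks are cheap.
    private final Map<String, Set<String>> partitionOf = new HashMap<>();

    TimePartitionedTracker(long ackTimeoutMs, long tickDurationMs) {
        // One partition per tick inside the timeout window, plus one for new arrivals.
        int blankPartitions = (int) Math.ceil((double) ackTimeoutMs / tickDurationMs);
        for (int i = 0; i <= blankPartitions; i++) {
            partitions.addLast(new HashSet<>());
        }
    }

    // New unacked messages land in the newest partition; an ID already tracked is ignored.
    boolean add(String msgId) {
        if (partitionOf.containsKey(msgId)) {
            return false;
        }
        Set<String> newest = partitions.peekLast();
        newest.add(msgId);
        partitionOf.put(msgId, newest);
        return true;
    }

    // An ack removes the ID from whichever partition holds it.
    void remove(String msgId) {
        Set<String> holder = partitionOf.remove(msgId);
        if (holder != null) {
            holder.remove(msgId);
        }
    }

    // Called once per tick: the oldest partition has aged past the timeout, so its IDs are
    // returned for redelivery and the emptied set is recycled as the newest partition.
    List<String> tick() {
        Set<String> oldest = partitions.pollFirst();
        List<String> expired = new ArrayList<>(oldest);
        for (String id : oldest) {
            partitionOf.remove(id);
        }
        oldest.clear();
        partitions.addLast(oldest);
        return expired;
    }

    public static void main(String[] args) {
        TimePartitionedTracker tracker = new TimePartitionedTracker(1000, 100);
        tracker.add("msg-1");
        for (int tick = 1; tick <= 11; tick++) {
            List<String> expired = tracker.tick();
            if (!expired.isEmpty()) {
                System.out.println("tick " + tick + ": redeliver " + expired);
            }
        }
    }
}
```

With a 1000 ms timeout and 100 ms ticks, the tracker holds 11 partitions, so an ID moves from the tail to the head over 10 ticks and is returned on the 11th.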

…l back duration. (apache#6392) Currently, when constructing a reader, users can set both a start message ID and a start time. This combination is ambiguous, so it should be forbidden. (cherry picked from commit f862961)

The current logic for `resetCursor` by timestamp is odd: the first message it returns is the last message published at or before the designated timestamp. Emitting this "earlier" message should be avoided. (cherry picked from commit 81f8afd)

Four kinds of errors are fixed in this PR: - Array index out of bounds - Inconsistent equals and hashCode - Missing format argument - Reference equality test of boxed types These were reported by https://lgtm.com/projects/g/apache/pulsar/alerts/?mode=tree&severity=error&id=&lang=java (cherry picked from commit 7fb9aff)

…#6399) Fixes apache#6228 (cherry picked from commit e6a631d)

Fixes apache#6400 ### Motivation This problem is blocking the current tests. Version 1.1.8 of `enum34` seems to have a problem, which reproduces as follows using the latest Pulsar code:
```
cd pulsar
mvn clean install -DskipTests
docker pull apachepulsar/pulsar-build:ubuntu-16.04
docker run -it -v $PWD:/pulsar --name pulsar apachepulsar/pulsar-build:ubuntu-16.04 /bin/bash
docker exec -it pulsar /bin/bash
cmake .
make -j4 && make install
cd python
python setup.py bdist_wheel
pip install dist/pulsar_client-*-linux_x86_64.whl
```
`pip show enum34`
```
Name: enum34
Version: 1.1.8
Summary: Python 3.4 Enum backported to 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4
Home-page: https://bitbucket.org/stoneleaf/enum34
Author: Ethan Furman
Author-email: [email protected]
License: BSD License
Location: /usr/local/lib/python2.7/dist-packages
Requires:
Required-by: pulsar-client, grpcio
```
```
root@55e06c5c770f:/pulsar/pulsar-client-cpp/python# python
Python 2.7.12 (default, Oct 8 2019, 14:14:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from enum import Enum, EnumMeta
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named enum
>>> exit()
```
Version 1.1.9 works without problems in the test. ### Modifications * Upgrade enum34 from 1.1.8 to 1.1.9 ### Verifying this change Local tests pass. (cherry picked from commit 2f42077)

Fix a case where, after delayed messages are sent, a consumer that restarts pulls duplicate messages (apache#6403). (cherry picked from commit e71b9fc)

### Motivation Add verification for SchemaDefinitionBuilderImpl.java ### Verifying this change Added a new unit test. (cherry picked from commit 848ad30)

…elated to shading in pulsar-client. (apache#6406) Motivation Avro schemas are quite important for proper data flow, and it is a pity that issue apache#3762 stayed untouched for so long. There were some workarounds for making Pulsar use an original Avro schema, but in the end, it is pretty hard to run an enterprise solution on workarounds. With this PR I would like to solve the problem caused by shading Avro in pulsar-client. As discussed in the issue, there are two possible solutions: 1. Unshade the Avro library in the pulsar-client library. (IMHO this seems like the proper solution, but it also brings a risk of unknown side effects.) 2. Use reflection to get the original schemas from generated classes. (I went for this solution.) Could you please comment on whether this is a proper solution for the problem? I will add tests once my approach is confirmed. Modifications First, we try to extract the original Avro schema from the `SCHEMA$` field using reflection. If that doesn't work, the process falls back to generating the schema from the POJO. (cherry picked from commit dab14ac)
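A dependency-free sketch of the reflection-then-fallback flow described above. Real Avro-generated classes expose a static `SCHEMA$` field of type `org.apache.avro.Schema`; here a `String` stands in for it, and `GeneratedUser`, `PlainUser`, `SchemaExtraction` and the placeholder `generateFromPojo` are hypothetical:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for an Avro-generated class (a String replaces org.apache.avro.Schema).
class GeneratedUser {
    public static final String SCHEMA$ =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}";
}

// A plain POJO without a SCHEMA$ field.
class PlainUser {
    public String name;
}

class SchemaExtraction {
    // First try the static SCHEMA$ field via reflection; otherwise fall back to POJO-based generation.
    static String schemaFor(Class<?> pojo) {
        try {
            Field field = pojo.getDeclaredField("SCHEMA$");
            if (Modifier.isStatic(field.getModifiers())) {
                return String.valueOf(field.get(null));
            }
        } catch (NoSuchFieldException | IllegalAccessException e) {
            // No usable SCHEMA$ field: fall through to generation.
        }
        return generateFromPojo(pojo);
    }

    // Placeholder for reflection-based schema generation from the POJO (hypothetical).
    static String generateFromPojo(Class<?> pojo) {
        return "generated-from-pojo:" + pojo.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(schemaFor(GeneratedUser.class));
        System.out.println(schemaFor(PlainUser.class));
    }
}
```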

…cheduled twice instead of pendingBatchRecei… (apache#6407) * fix topic discovery task scheduled twice instead of pendingBatchReceiveTask * remove wildcard imports Co-authored-by: avim <[email protected]> (cherry picked from commit 40995a0)

BatchReceivePolicy implements Serializable. (cherry picked from commit 792ab17)

Netty 4.1.43 has a bug that prevents it from using the Linux native epoll transport. This results in Pulsar brokers falling back to NioEventLoopGroup even when running on Linux. The bug is fixed in Netty 4.1.45.Final. (cherry picked from commit 760bd1a)

Motivation If maxMessagePublishBufferSizeInMB is set to more than Integer.MAX_VALUE / 1024 / 1024 (that is, 2047), the publish buffer limit does not take effect. The reason is that maxMessagePublishBufferBytes overflows when it is computed with int arithmetic: `pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024 * 1024;` (for example, 2048 MB wraps to a negative value and 4096 MB wraps to 0). So the calculation was changed to `pulsar.getConfiguration().getMaxMessagePublishBufferSizeInMB() * 1024L * 1024L;` (cherry picked from commit 75a321d)
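The overflow can be reproduced in isolation. This sketch assumes the configured size is an `int`, as the fix implies; the class and method names are hypothetical:

```java
class PublishBufferSize {
    // Buggy: every operand is an int, so the product wraps around before it is widened to long.
    static long bytesWithIntMath(int sizeInMB) {
        return sizeInMB * 1024 * 1024;
    }

    // Fixed: the 1024L literal promotes sizeInMB to long before the first multiplication.
    static long bytesWithLongMath(int sizeInMB) {
        return sizeInMB * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(bytesWithIntMath(4096));  // 0
        System.out.println(bytesWithLongMath(4096)); // 4294967296
        System.out.println(bytesWithIntMath(2048));  // -2147483648
    }
}
```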

…open instance (apache#6436) (cherry picked from commit 2ed2eb8)

Motivation Fix the bug where authenticationData isn't initialized. The method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect doesn't initialize authenticationData. Because of this bug, you get a null value from the method org.apache.pulsar.broker.authorization.AuthorizationProvider#canConsumeAsync when implementing the org.apache.pulsar.broker.authorization.AuthorizationProvider interface. Modifications Initialize authenticationData in the method org.apache.pulsar.proxy.server.ProxyConnection#handleConnect. Verifying this change Implement the org.apache.pulsar.broker.authorization.AuthorizationProvider interface and get the value of authenticationData. (cherry picked from commit b8f0ca0)

* Fixed the max backoff configuration for lookups * Fixed test expectation * More test fixes (cherry picked from commit 6ff87ee)

) Fixes apache#6453 ### Motivation `ConsumerBase` and `ProducerImpl` use `System.currentTimeMillis()` to measure the elapsed time in the 'operations' inner classes (`ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg`). An instance variable `createdAt` is initialized with `System.currentTimeMillis()`, but it is not used for reading wall clock time; the variable is only used for computing elapsed time (e.g. the timeout for a batch). Since the variable is only used to compute elapsed time, it makes more sense to use `System.nanoTime()`. ### Modifications The instance variable `createdAt` in `ConsumerBase$OpBatchReceive` and `ProducerImpl$OpSendMsg` is initialized with `System.nanoTime()`. Usages of the variable are updated to reflect that it holds nano time; computations of elapsed time take the difference between the current system nano time and the `createdAt` variable. The `createdAt` field is package-private and is currently only used in the declaring class and the outer class, limiting the chances of unwanted side effects. (cherry picked from commit 459ec6e)
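A sketch of the elapsed-time pattern described above; `TimedOp` is a hypothetical stand-in for `OpBatchReceive`/`OpSendMsg`, not Pulsar code. As the `System.nanoTime()` Javadoc recommends, values are compared by subtracting first, so the result stays correct even if the counter wraps:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical operation class illustrating the createdAt pattern.
class TimedOp {
    // Monotonic timestamp; only meaningful relative to other System.nanoTime() readings.
    final long createdAt;

    TimedOp() {
        this(System.nanoTime());
    }

    TimedOp(long createdAtNanos) {
        this.createdAt = createdAtNanos;
    }

    // Subtract first, then compare: wrap-around of nanoTime does not break the check.
    boolean isTimedOut(long timeoutMs, long nowNanos) {
        return nowNanos - createdAt >= TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    }

    long elapsedMillis(long nowNanos) {
        return TimeUnit.NANOSECONDS.toMillis(nowNanos - createdAt);
    }

    public static void main(String[] args) throws InterruptedException {
        TimedOp op = new TimedOp();
        Thread.sleep(20);
        long now = System.nanoTime();
        System.out.println("elapsed ms: " + op.elapsedMillis(now)
                + ", timed out after 10 ms: " + op.isTimedOut(10, now));
    }
}
```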

…one (apache#6457) When starting Pulsar in standalone mode with TLS enabled, it fails to create two namespaces during startup, because it uses the unencrypted URL/port when constructing the PulsarAdmin client. (cherry picked from commit 3e1b8f6)

…rpm (apache#6458) Fix apache#6439 We shouldn't statically link libssl into libpulsar.a, as this is a security red flag; we should just use whatever libssl the system provides. That way, if there is a security problem in libssl, every machine can update its own libssl library without rebuilding libpulsar.a. As suggested, this change does not change the old behavior; it mainly provides 2 additional Pulsar C++ client libraries in deb/rpm and adds docs on how to use the 4 libraries. The 2 additional libs: - pulsarSharedNossl (libpulsarnossl.so), similar to pulsarShared (libpulsar.so), but with no SSL statically linked. - pulsarStaticWithDeps (libpulsarwithdeps.a), similar to pulsarStatic (libpulsar.a), but with the dependency libraries `libboost_regex`, `libboost_system`, `libcurl`, `libprotobuf`, `libzstd` and `libz` statically archived into it. All 4 libs passed rpm/deb build, install, and compilation with a pulsar-client example. * also add libpulsarwithdeps.a together with libpulsar.a to the cpp client release * add documentation for libpulsarwithdeps.a, add g++ build examples * add pulsarSharedNossl target to build libpulsarnossl.so * update doc * verify 4 libs in rpm/deb build, install, and use: all good (cherry picked from commit 33eea88)

…#6460) ### Motivation Fix the name of the proxy thread executor. (cherry picked from commit 5c2c058)

### Motivation Proxy logging fetches an incorrect producerId for the `Send` command; because of that, logging always gets producerId 0 and fetches an invalid topic name. ### Modification Fixed topic logging by fetching the correct producerId for the `Send` command. (cherry picked from commit 65cc303)

…me. (apache#6478) Fixes apache#6468 Fix creating a partitioned topic whose name is a substring of an existing topic name, and make partitioned topic creation async. (cherry picked from commit 19ccfd5)

…empty (apache#6480) (cherry picked from commit 6604f54)

(cherry picked from commit 47ca8e6)

Fixes apache#6482 ### Motivation Prevent topic compaction from leaking direct memory. ### Modifications Several leaks were discovered using Netty leak detection and code review. * `CompactedTopicImpl.readOneMessageId` would get an `Enumeration` of `LedgerEntry` but did not release the underlying buffers. Fix: iterate through the `Enumeration` and release each underlying buffer. Instead of logging the case where the `Enumeration` did not contain any elements, complete the future exceptionally with that message (it will be logged by Caffeine). * There were two main sources of leaks in `TwoPhaseCompactor`: the `RawBatchConverter.rebatchMessage` method failed to close/release a `ByteBuf` (uncompressedPayload), and the `ByteBuf` returned by `RawBatchConverter.rebatchMessage` was not closed. The first was easy to fix (release the buffer). To fix the second one and make the code easier to read, I decided not to let `RawBatchConverter.rebatchMessage` close the message read from the topic; instead, that message is closed in a try/finally clause surrounding most of the method body that handles a message from the topic (in the phase-two loop). Then, if a new message was produced by `RawBatchConverter.rebatchMessage`, it is released after it has been added to the compacted ledger. ### Verifying this change Modified `RawReaderTest.testBatchingRebatch` to reflect the new contract. Running the test described above verifies that no leak is detected. (cherry picked from commit f2ec1b4)

### Motivation Currently, the proxy only proxies the v1/v2 functions routes to the function worker. ### Modifications This change proxies all function worker routes when those routes match. At the moment this is still a static list of prefixes, but in the future it may be possible to fetch this list of prefixes dynamically from the REST routes. ### Verifying this change - added some tests to ensure the routing works as expected (cherry picked from commit 329e231)

(cherry picked from commit ad5415a)

See apache#6416. This change ensures that all futures within BrokerService have a guaranteed timeout. As stated in apache#6416, we see cases where loading or creating a topic appears to fail to resolve the future for unknown reasons; it appears that these futures *may* not be returning. This seems like a sane change to ensure that these futures finish; however, it is still not understood under what conditions these futures may not return, so this fix is mostly a workaround for some underlying issues. Co-authored-by: Addison Higham <[email protected]> (cherry picked from commit 4a4cce9)
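One way to give a `CompletableFuture` a guaranteed timeout on Java 8 is to schedule an exceptional completion and cancel that timer once the future finishes. This is a sketch of the general technique, not the code in this change; `FutureTimeouts` and `withTimeout` are hypothetical names. On Java 9 and later, `CompletableFuture.orTimeout(long, TimeUnit)` provides the same behavior directly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FutureTimeouts {
    // Completes `future` exceptionally with a TimeoutException if it is still pending after the delay.
    // Returns the same future so callers can keep chaining on it.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> future, long delay, TimeUnit unit,
                                                ScheduledExecutorService scheduler) {
        ScheduledFuture<?> timer = scheduler.schedule(
                () -> future.completeExceptionally(
                        new TimeoutException("timed out after " + delay + " " + unit)),
                delay, unit);
        // Once the future completes either way, cancel the timer so it does not linger.
        future.whenComplete((result, error) -> timer.cancel(false));
        return future;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            // A future that nobody ever completes, like a stuck topic load.
            CompletableFuture<String> stuck =
                    withTimeout(new CompletableFuture<>(), 100, TimeUnit.MILLISECONDS, scheduler);
            stuck.handle((value, error) -> {
                System.out.println("completed with: " + error);
                return null;
            }).join();
        } finally {
            scheduler.shutdownNow();
        }
    }
}
```

Completing an already-completed future is a no-op, so a future that finishes normally just before the timer fires keeps its real result.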

…er When Ack Messages . (apache#6498) ### Motivation Because of apache#6391, acked messages were counted as unacked messages. Although the messages from the broker were acknowledged, the following log was output:
```
2020-03-06 19:44:51.790 INFO ConsumerImpl:174 | [persistent://public/default/t1, sub1, 0] Created consumer on broker [127.0.0.1:58860 -> 127.0.0.1:6650]
my-message-0: Fri Mar 6 19:45:05 2020
my-message-1: Fri Mar 6 19:45:05 2020
my-message-2: Fri Mar 6 19:45:05 2020
2020-03-06 19:45:15.818 INFO UnAckedMessageTrackerEnabled:53 | [persistent://public/default/t1, sub1, 0] : 3 Messages were not acked within 10000 time
```
This behavior occurred on the master branch. (cherry picked from commit 67f8cf3)

…er. (apache#6499) ### Motivation Once the broker service is started, clients can connect to the broker and send requests that depend on the namespace service, so we should create the namespace service before starting the broker service. Otherwise, an NPE occurs. ![image](https://user-images.githubusercontent.com/12592133/76090515-a9961400-5ff6-11ea-9077-cb8e79fa27c0.png) ![image](https://user-images.githubusercontent.com/12592133/76099838-b15db480-6006-11ea-8f39-31d820563c88.png) ### Modifications Move the creation of the namespace service and the schema registry service before starting the broker service. (cherry picked from commit 5285c68)

…access for topic (apache#6504) Co-authored-by: Sanjeev Kulkarni <[email protected]> (cherry picked from commit 36ea153)

Fix apache#6462 ### Motivation Make the admin API getLastMessageId also return the batchIndex. (cherry picked from commit 757824f)

apache#6550) ### Motivation Disable channel auto-read when the publish rate or publish buffer limit is exceeded. Currently, ServerCnx sets channel auto-read to false only when it receives a new message and the publish rate or publish buffer limit is exceeded, so it always reads one more message. If there are many ServerCnx instances (many topics or clients), this results in publish rate limiting with a large deviation. Here is an example that shows the problem. Enable the publish rate limit in broker.conf:
```
brokerPublisherThrottlingTickTimeMillis=1
brokerPublisherThrottlingMaxByteRate=10000000
```
Use Pulsar perf to test publishing to a topic with 100 partitions:
```
bin/pulsar-perf produce -s 500000 -r 100000 -t 1 100p
```
The test result:
```
10:45:28.844 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 367.8 msg/s --- 1402.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 710.008 ms - med: 256.969 - 95pct: 2461.439 - 99pct: 3460.255 - 99.9pct: 4755.007 - 99.99pct: 4755.007 - Max: 4755.007
10:45:38.919 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 456.6 msg/s --- 1741.9 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 2551.341 ms - med: 2347.599 - 95pct: 6852.639 - 99pct: 9630.015 - 99.9pct: 10824.319 - 99.99pct: 10824.319 - Max: 10824.319
10:45:48.959 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 432.0 msg/s --- 1648.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 4373.505 ms - med: 3972.047 - 95pct: 11754.687 - 99pct: 15713.663 - 99.9pct: 17638.527 - 99.99pct: 17705.727 - Max: 17705.727
10:45:58.996 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 430.6 msg/s --- 1642.6 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 5993.563 ms - med: 4291.071 - 95pct: 18022.527 - 99pct: 21649.663 - 99.9pct: 24885.375 - 99.99pct: 25335.551 - Max: 25335.551
10:46:09.195 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 403.2 msg/s --- 1538.3 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 7883.304 ms - med: 6184.159 - 95pct: 23625.343 - 99pct: 29524.991 - 99.9pct: 30813.823 - 99.99pct: 31467.775 - Max: 31467.775
10:46:19.314 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 401.1 msg/s --- 1530.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 9587.407 ms - med: 6907.007 - 95pct: 28524.927 - 99pct: 34815.999 - 99.9pct: 36759.551 - 99.99pct: 37581.567 - Max: 37581.567
10:46:29.389 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 372.8 msg/s --- 1422.0 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 11984.595 ms - med: 10095.231 - 95pct: 34515.967 - 99pct: 40754.175 - 99.9pct: 43553.535 - 99.99pct: 43603.199 - Max: 43603.199
10:46:39.459 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 374.6 msg/s --- 1429.1 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 12208.459 ms - med: 7807.455 - 95pct: 38799.871 - 99pct: 46936.575 - 99.9pct: 50500.095 - 99.99pct: 50500.095 - Max: 50500.095
10:46:49.537 [main] INFO org.apache.pulsar.testclient.PerformanceProducer - Throughput produced: 295.6 msg/s --- 1127.5 Mbit/s --- failure 0.0 msg/s --- Latency: mean: 14503.565 ms - med: 10753.087 - 95pct: 45041.407 - 99pct: 54307.327 - 99.9pct: 57786.623 - 99.99pct: 57786.623 - Max: 57786.623
```
The reason for such a large deviation is that the producer sends batched messages and each ServerCnx reads one more message. This PR cannot completely solve the problem, but it can alleviate it: when the message publish rate is exceeded, the broker sets channel auto-read to false for all topics, which prevents some ServerCnx instances from reading one more message.
### Does this pull request potentially affect one of the following parts: *If `yes` was chosen, please highlight the changes* - Dependencies (does it add or upgrade a dependency): (no) - The public API: (no) - The schema: (no) - The default values of configurations: (no) - The wire protocol: (no) - The rest endpoints: (no) - The admin cli options: (no) - Anything that affects deployment: (no) ### Documentation - Does this pull request introduce a new feature? (no) (cherry picked from commit ec31d54)

…over subscription mode. (apache#6558) Fixes apache#6552 ### Motivation apache#6552 was introduced by apache#5929, so this PR stops increasing the unacked message count for consumers with the Exclusive/Failover subscription mode. (cherry picked from commit 2449696)

* Fix: topic with one partition cannot be updated (cherry picked from commit 9602c9b)

### Motivation Fixes apache#6561 ### Modifications Initialize `BatchMessageAckerDisabled` with a `new BitSet()` Object. (cherry picked from commit 2007de6)

Commits on Jan 6, 2020

Release 2.5.0

sijie committed Jan 6, 2020

f2afad3


Commits on Feb 17, 2020

Commits on Mar 21, 2020
