[C++] Auto update topic partitions #6732
Conversation
/pulsarbot run-failure-checks
/pulsarbot run cpp-tests
/pulsarbot run unit-tests
/pulsarbot run cpp-tests
```cpp
assert(unsubscribedSoFar_ <= numPartitions_);
assert(consumerIndex <= numPartitions_);
Lock consumersLock(consumersMutex_);
const auto numPartitions = numPartitions_;
```
Could this change be avoided? It seems to bring in a lot of unneeded changes, and it's a little confusing to have `numPartitions_` and `numPartitions` mixed together in this file.
I'll change it soon. Here I should have called a method like `getNumPartitionsWithLock()` (like I did in `PartitionedConsumerImpl`).

Because `producers_`/`consumers_` may be modified in the timer's callback, the code that accesses `producers_`/`consumers_` and some other members (see the comments in the headers) should be protected by `lock()`/`unlock()`, except in `start()`.
/pulsarbot run-failure-checks
@BewareMyPower thanks for the great work. It seems abnormal for the cpp tests to fail.
Yeah. The tests also passed in GitHub Actions; you can see the CI of commit 89ba471 (cpp-tests > run-tests, line 136).
The strange thing is that the 163 tests cannot all be completed. I've seen at most 141 tests completed, and the number may also be 121. The current run may be 98.
I guessed that after 90 more minutes (when the 2-hour limit is reached), the number of completed tests would still be 90.
Now I've found where the problem is. The tests get stuck at this loop:

```cpp
while (true) {
    // maximum wait time
    ASSERT_LE(timeWaited, unAckedMessagesTimeoutMs * 3);
    if (messagesReceived >= 10 * 2) {  // **Problem**: never reached here
        break;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    timeWaited += 500;
}
```

It seems that
/pulsarbot run-failure-checks
/pulsarbot run-failure-checks
Force-pushed from 9de457a to 89ba471
/pulsarbot run-failure-checks
/pulsarbot run process
/pulsarbot run-failure-checks
/pulsarbot run process
Hi @codelipenghui @merlimat @jiazhai @sijie, could you help me with the unit test failures? I've only changed files under
They all failed with a failure message of
The other tests:
I tried rolling back to an older commit that had passed these tests, and rerunning the failure checks, but the error still happens. How should I deal with it? Should I close this PR and open a new one?
/pulsarbot run-failure-checks
/pulsarbot run unit-test-flaky
@BewareMyPower those are flaky tests due to environmental issues. Typically it is because there is not enough disk space in GitHub, so it is not able to start the container-based integration tests. You can use
Thanks for your reply. I've rerun those failures many times but have only seen the same
/pulsarbot run-failure-checks
@BewareMyPower this usually indicates a problem in the GitHub Action. We might need to see how to clean up the disk space. /cc @tuteng @codelipenghui
/pulsarbot run-failure-checks
This conflicts with #6938; will mark this as 2.5.2.
### Motivation

We need to increase producers or consumers when partitions are updated. The Java client has implemented this feature, see [#3513](#3513). This PR tries to implement the same feature in the C++ client.

### Modifications

- Add a `boost::asio::deadline_timer` to `PartitionedConsumerImpl` and `PartitionedProducerImpl` to register a lookup task that detects partition changes periodically;
- Add an `unsigned int` configuration parameter to indicate the period, in seconds, of detecting partition changes (default: 60 seconds);
- Unlock the `mutex_` in `PartitionedConsumerImpl::receive` after `state_` was checked.

> Explanation: When new consumers are created, `handleSinglePartitionConsumerCreated` will eventually be called, which tries to lock the `mutex_`. It may happen that `receive` acquires the lock again and again, so that `handleSinglePartitionConsumerCreated` is blocked in `lock.lock()` for a long time.

* auto update topic partitions: extend for consumer and producer in the C++ client
* add a C++ unit test for partitions update
* format code with clang-format-5.0
* stop the partitions-update timer after the producer/consumer calls closeAsync()
* fix bugs when running gtest-parallel
* fix bug: Producer::flush() may cause deadlock
* use getters to read `numPartitions` with or without lock

(cherry picked from commit 30934e1)
### Motivation

We need to increase producers or consumers when partitions are updated.

The Java client has implemented this feature, see #3513. This PR tries to implement the same feature in the C++ client.

### Modifications

- Add a `boost::asio::deadline_timer` to `PartitionedConsumerImpl` and `PartitionedProducerImpl` to register a lookup task that detects partition changes periodically;
- Add an `unsigned int` configuration parameter to indicate the period, in seconds, of detecting partition changes (default: 60 seconds);
- Unlock the `mutex_` in `PartitionedConsumerImpl::receive` after `state_` was checked.

### Verifying this change

This change added tests and can be verified as follows:

Run the `PartitionsUpdateTest` test suite after the cmake build.

### Documentation