-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][pip] PIP-359: Support custom message listener executor for specific subscription #22902
Conversation
Why can't this logic be written by the person writing the listener? I think it adds extra complexity to an already complicated client |
Yeah, the solution you mentioned can also solve this problem. But on the contrary, I think providing a unified builder configuration would be easier to use. |
IMO the surface area the client already exposes is way too big. It feels any person that wanted to scratch their own itch just added their feature without holding back. This needs to be maintained. |
You make a very valid point. We do indeed need a plugin library that allows contributors to customize Pulsar according to their own needs without introducing more complexity into Pulsar. Do you have the opportunity to promote this matter? If so, I think we can continue to push forward with this. And take this proposal as the first test case. Otherwise, it might be better to first promote this optimization, so that those who need it can start using it. I am very interested in the plugin library you mentioned, and perhaps later we can work together to promote its implementation in Pulsar. However, it would be unwise to block other optimizations from entering before that. |
Great idea! Are you interested in building and promoting a similar contrib repo for Pulsar? This PIP may be a good opportunity to do this, and we can build it together. @asafm @liangyepianzhou |
I'm currently working on my free time on pulsar as in my new company they don't even use pulsar. |
We also participate in this work in our spare time, and progress is indeed slower. However, if we can achieve something significant together, it would be a very cool thing. |
@asafm Since we all agree that the plugin library is still a long way from being truly realized, it's best that we do not hinder the entry of other optimizations before that time. It would be best to let it continue along its original process, allowing those who need it to use this feature first.We can migrate some functionalities after the plugin library is completed. |
With the help of @liangyepianzhou I was able to fully figure out all the context required to understand what this PIP is all about.
In the asynchronous way, the MessageListener instance is called by the consumer, hence it is doing that with a thread taken from its own internal ExecutorService (i.e. thread pool). I suggest a different solution, if I may: in the I suggest we expand it to: This will allow the user to provide a separate executor service (thread pool) to be used when executing So summarizing my request for changes:
|
Okay, good suggestions! I will update it later as suggested.Thank you all @asafm @liangyepianzhou |
I have made the changes to the PIP as suggested, PTAL @asafm @liangyepianzhou Thanks! |
I have explained why we cannot use simply |
I understand you introduce I had read the implementation PR, there are some points I think are strange:
public interface MessageListenerExecutor {
void execute(Message<?> message, Runnable runnable);
} The interface is for the purpose of select an executor by public interface MessageListenerExecutorSelector {
ExecutorService select(Message<?> message);
}
Default implementations is an example, users could follow the pattern to develop their own extensions, and it can be used out of the box. Others is OK |
Or public interface MessageListenerExecutorSelector {
ExecutorService select(Message<?> message);
ExecutorService selectOrdering(Message<?> message);
} If the subscriptionType is KeyShared, call selectOrdering, others call select; |
For the first point, I don't see any better point than now, on the contrary, the interface you mentioned has shortcomings: For the second point, refer to the previous discussion:#22861 (comment) |
Also see:#22902 (comment) |
Hi, Everyone! @lhotari @codelipenghui @asafm @liangyepianzhou @dao-jun @poorbarcode @Technoboy- I have sent a voting email for PIP-359: https://lists.apache.org/thread/oo3jdvq3b6bv6p4n7x7sdvypw4gp6hpk Thanks! |
@AuroraTwinkle Please select only one documentation label in your PR description. |
I have reviewed the comments made by @lhotari but I'm sorry I still don't understand.
This is basically to "spill the guts/internals" of the way the customer is implemented. The reason I'm asking is that I think it's too complicated. You guys move to the vote phase way too soon. I mean you posted your comments 13 hours ago. I don't want to hold anyone back of course, but you're changing quite a substantial public API here. @lhotari I think we need at least one more set of eyes on this solution before we can proceed. |
@asafm I believe that this concern could be handled by improving the explanation of the concept and abstraction that this exposes. I agree that it's not a good practice to add abstractions that are implementation specific. This abstraction isn't implementation specific, but the problem is that 95% of the users don't need it. In a multi-topic consumer when a MessageListener is used, there might be a requirement to have different priorities how the messages for a specific topic are processed. It is up to the implementer how to implement this logic.
That's a valid point. I think that we would need to come up with a way to add extensions to the Pulsar client without cluttering the basic use cases. 95% of Pulsar users will never need this
It's great to see that this PIP has interest. Even if someone is accepted with a vote doesn't mean that we cannot make changes later before releasing the feature. In this case, there's a clear need to solve the multi-topic consumer queuing problem when using a MessageListener. We have to find a way to address that problem. |
One possible idea for avoiding cluttering the builder interfaces with more and more configuration options would be to support some type of extension pattern. In this case, there could be this type of method on the ConsumerBuilder interface: <E extends ConsumerBuilderExtension> E extensionBuilder(Class<E> customerBuilderExtensionClass); then these interfaces: interface MessageListenerExecutorConsumerBuilderExtension<T> extends ConsumerBuilderExtension<T> {
MessageListenerExecutorConsumerBuilderExtension<T> messageListenerExecutor(MessageListenerExecutor messageListenerExecutor);
}
interface ConsumerBuilderExtention {
ConsumerBuilder<T> parent();
} then it would be possible to call Obviously this is more complexity, but it would address the problem of cluttering the main interfaces with configuration options that are only for very specific requirements. |
How do you think about this? @asafm @liangyepianzhou |
Hi @lhotari, this is a truly innovative concept that showcases a profound level of technical proficiency and expertise. However, as you've mentioned, it does seem overly complex. I suggest we revert to the initial design approach; I'm convinced that the first iteration of the API design was the most succinct and user-friendly. What are your thoughts? cc @asafm @AuroraTwinkle pulsarClient.newConsumer().topic("topic").subscriptionName("subName")
.messageListener((consumer, msg) -> {}, new ExecutorProvider(10, "your_poolName"))
.subscribe(); |
The pip is approved with 5 binding votes and 1 non-binding vote and stay more than 72 hours. So merge it. |
… specific subscription (#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. -->
… specific subscription (apache#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](apache#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. -->
… specific subscription (#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. --> (cherry picked from commit 10f4e02)
… specific subscription (apache#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](apache#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. -->
… specific subscription (apache#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](apache#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. --> (cherry picked from commit 10f4e02) (cherry picked from commit c5846bb)
… specific subscription (apache#22861) Co-authored-by: duanlinlin <[email protected]> [PIP-359](apache#22902) Support custom message listener thread pool for specific subscription, avoid individual subscription listener consuming too much time leading to higher consumption delay in other subscriptions. <!-- ### Contribution Checklist - PR title format should be *[type][component] summary*. For details, see *[Guideline - Pulsar PR Naming Convention](https://pulsar.apache.org/contribute/develop-semantic-title/)*. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. --> <!-- Either this PR fixes an issue, --> <!-- or this PR is one task of an issue --> <!-- If the PR belongs to a PIP, please add the PIP link here --> <!-- Details of when a PIP is required and how the PIP process work, please see: https://github.com/apache/pulsar/blob/master/pip/README.md --> ### Motivation In our scenario, there is a centralized message proxy service, this service will use the same PulsarClient instance to create a lot of subscription groups to consume many topics and cache messages locally.Then the business will pull messages from the cache of the proxy service. It seems that there is no problem, but during use, we found that when the message processing time of several consumer groups (listener mode) is very high, it almost affects all consumer groups responsible for the proxy service, causing a large number of message delays. By analyzing the source code, we found that by default, all consumer instances created from the same PulsarClient will share a thread pool to process message listeners, and sometimes there are multiple consumer message listeners bound to the same thread. Obviously, when a consumer processes messages and causes long-term blocking, it will cause the messages of other consumers bound to the thread to fail to be processed in time, resulting in message delays. Therefore, for this scenario, it may be necessary to support specific a message listener thread pool with consumer latitudes to avoid mutual influence between different consumers. <!-- Explain here the context, and why you're making that change. What is the problem you're trying to solve. --> ### Modifications Support custom message listener thread pool for specific subscription. <!-- Describe the modifications you've done. --> (cherry picked from commit 10f4e02) (cherry picked from commit c5846bb)
Motivation
PIP-359
Implementation PR:#22861
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: