fix consumer fetch message number maps to read entry number bug and expose avgMessagesPerEntry metric #6719
Conversation
/pulsarbot run-failure-checks |
@hangc0276 nice contribution! |
/pulsarbot run-failure-checks |
@hangc0276 Can you rebase the branch? There are some integration-test-related fixes in master. |
I have merged the master branch code, but the tests still fail. Maybe some test cases are broken; I will check in detail. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@hangc0276 If you have time, please take a look at the failed tests and resolve the conflicts. And I think you can add a flag in the broker and disable it by default. |
@sijie @jiazhai @codelipenghui I added a flag to disable the precise dispatcher flow control to resolve the test case conflicts, and added a test case. Please take a look again. Thanks. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@hangc0276 Looks good to me. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
…xpose avgMessagesPerEntry metric (apache#6719)
Why should the initial value of avgMessagesPerEntry be fixed at 1000, instead of using the actual number of messages per entry observed on the first push as the initial value? |
Motivation
When a consumer sends a fetch request to the broker, the request carries the number of messages the broker should push back to the consumer. However, when producers use the batching feature, the broker stores data in BookKeeper and in the broker cache as entries, not as individual messages, so the requested message count has to be mapped to an entry count when handling the fetch request.
The current strategy uses the following formula:
```
messagesToRead = Math.min(availablePermits, readBatchSize);
```
`availablePermits` is the number of messages the consumer requested, and `readBatchSize` is the maximum number of entries per read, set in broker.conf.
Assuming `availablePermits` is 1000, `readBatchSize` is 500, and each entry contains 1000 messages, `messagesToRead` will be 500 according to this formula. The broker will read 500 entries, that is `500 * 1000 = 500,000` messages, from BookKeeper or the broker cache and push all 500,000 messages to the consumer at once, even though the consumer only needs 1000. This forces the consumer to spend far too much memory buffering fetched messages, especially when `readBatchSize` is increased to improve BookKeeper read throughput.
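To make the over-read concrete, here is a minimal, self-contained sketch of the old sizing (the values are the hypothetical ones from the example above, not broker defaults):
```java
public class OldReadSizing {
    public static void main(String[] args) {
        int availablePermits = 1000; // messages the consumer asked for
        int readBatchSize = 500;     // max entries per read (broker.conf)
        int messagesPerEntry = 1000; // batch size used by the producer

        // The old formula treats message permits as if they were an entry count.
        int entriesToRead = Math.min(availablePermits, readBatchSize); // 500

        // Actual messages pushed to the consumer in one go.
        long messagesPushed = (long) entriesToRead * messagesPerEntry; // 500,000

        System.out.printf("requested=%d, pushed=%d (%.0fx over-read)%n",
                availablePermits, messagesPushed,
                (double) messagesPushed / availablePermits);
    }
}
```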
Changes
I added a variable `avgMessagesPerEntry` to record the average number of messages stored in one entry. It is updated each time the broker pushes messages to the consumer, using the following formula:
```
avgMessagesPerEntry = avgMessagesPerEntry * avgPercent + (1 - avgPercent) * newValue
```
`avgMessagesPerEntry` is the historical average number of messages per entry, and `newValue` is the number of messages per entry in the batch the broker just read from the cache or BookKeeper. `avgPercent` is a constant 0.9 that controls how quickly the historical average decays as new values arrive. The initial value of `avgMessagesPerEntry` is 1000.
When handling a consumer fetch request, the requested message count is mapped to an entry count with the following formula:
```
messagesToRead = Math.min((int) Math.ceil(availablePermits * 1.0 / avgMessagesPerEntry), readBatchSize);
```
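For illustration, a minimal sketch of the sizing logic described above; the class and method names are illustrative, not the actual Pulsar broker code:
```java
public class EntryReadSizer {
    // Weight given to the historical average when folding in a new sample.
    private static final double AVG_PERCENT = 0.9;
    // Moving average of messages per entry; starts at 1000 as described above.
    private double avgMessagesPerEntry = 1000.0;

    /** Fold in a completed read of totalMessages messages spread over entries entries. */
    public void recordRead(int entries, int totalMessages) {
        double newValue = (double) totalMessages / entries;
        avgMessagesPerEntry = avgMessagesPerEntry * AVG_PERCENT
                + (1 - AVG_PERCENT) * newValue;
    }

    /** Map the consumer's message permits to the number of entries to read. */
    public int entriesToRead(int availablePermits, int readBatchSize) {
        return Math.min((int) Math.ceil(availablePermits * 1.0 / avgMessagesPerEntry),
                readBatchSize);
    }
}
```
With `avgMessagesPerEntry` near 1000, a request for 1000 messages now maps to a single entry read instead of 500, while the 0.9 decay factor lets the estimate track changes in the producer's batch size over time.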
I also expose the `avgMessagesPerEntry` value in the consumer stats metrics JSON.