fix consumer fetch message number maps to read entry number bug and expose avgMessagesPerEntry metric #6719
Conversation
/pulsarbot run-failure-checks |
@hangc0276 nice contribution! |
/pulsarbot run-failure-checks |
@hangc0276 Can you rebase the branch? There are some integration-test-related fixes in master. |
I have merged the master branch code, but the tests still fail. Maybe some test cases are broken; I will check in detail. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@hangc0276 If you have time, please take a look at the failed tests and resolve the conflicts. And I think you can add a flag in the broker and disable it by default. |
@sijie @jiazhai @codelipenghui I added a flag to disable the precise dispatcher flow control to resolve the test case conflicts, and added a test case. Please take a look again. Thanks. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
@hangc0276 Looks good to me. |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
/pulsarbot run-failure-checks |
…xpose avgMessagesPerEntry metric (apache#6719)
Why should the initial value of avgMessagesPerEntry be fixed at 1000, instead of using the actual number of messages per entry observed on the first push as the initial value? |
Motivation
When a consumer sends a fetch request to the broker, the request carries the number of messages the broker should push back to the consumer. However, when producers use the batching feature, the broker stores data in BookKeeper and in the broker cache as entries, not as individual messages, so the requested message count has to be mapped to an entry count when handling the fetch request.
The current strategy uses the following formula:
```
messagesToRead = Math.min(availablePermits, readBatchSize);
```
`availablePermits` is the number of messages the consumer requested, and `readBatchSize` is the maximum number of entries per read, set in broker.conf.
Assuming `availablePermits` is 1000, `readBatchSize` is 500, and each entry contains 1000 messages, `messagesToRead` will be 500 according to this formula. The broker will read 500 entries, that is `500 * 1000 = 500,000` messages, from BookKeeper or the broker cache and push all 500,000 messages to the consumer at once, even though the consumer only needs 1000. This forces the consumer to spend far too much memory buffering fetched messages, especially when `readBatchSize` is increased to improve BookKeeper read throughput.
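To make the over-read concrete, here is a minimal, self-contained sketch of the old sizing (the values are the hypothetical ones from the example above, not broker defaults):
```java
public class OldReadSizing {
    public static void main(String[] args) {
        int availablePermits = 1000; // messages the consumer asked for
        int readBatchSize = 500;     // max entries per read (broker.conf)
        int messagesPerEntry = 1000; // batch size used by the producer

        // The old formula treats message permits as if they were an entry count.
        int entriesToRead = Math.min(availablePermits, readBatchSize); // 500

        // Actual messages pushed to the consumer in one go.
        long messagesPushed = (long) entriesToRead * messagesPerEntry; // 500,000

        System.out.printf("requested=%d, pushed=%d (%.0fx over-read)%n",
                availablePermits, messagesPushed,
                (double) messagesPushed / availablePermits);
    }
}
```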
Changes
I added a variable `avgMessagesPerEntry` to record the average number of messages stored in one entry. It is updated each time the broker pushes messages to the consumer, using the following formula:
```
avgMessagesPerEntry = avgMessagesPerEntry * avgPercent + (1 - avgPercent) * newValue
```
`avgMessagesPerEntry` is the historical average number of messages per entry, and `newValue` is the number of messages per entry in the batch the broker just read from the cache or BookKeeper. `avgPercent` is a constant 0.9 that controls how quickly the historical average decays as new values arrive. The initial value of `avgMessagesPerEntry` is 1000.
When handling a consumer fetch request, the requested message count is mapped to an entry count with the following formula:
```
messagesToRead = Math.min((int) Math.ceil(availablePermits * 1.0 / avgMessagesPerEntry), readBatchSize);
```
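For illustration, a minimal sketch of the sizing logic described above; the class and method names are illustrative, not the actual Pulsar broker code:
```java
public class EntryReadSizer {
    // Weight given to the historical average when folding in a new sample.
    private static final double AVG_PERCENT = 0.9;
    // Moving average of messages per entry; starts at 1000 as described above.
    private double avgMessagesPerEntry = 1000.0;

    /** Fold in a completed read of totalMessages messages spread over entries entries. */
    public void recordRead(int entries, int totalMessages) {
        double newValue = (double) totalMessages / entries;
        avgMessagesPerEntry = avgMessagesPerEntry * AVG_PERCENT
                + (1 - AVG_PERCENT) * newValue;
    }

    /** Map the consumer's message permits to the number of entries to read. */
    public int entriesToRead(int availablePermits, int readBatchSize) {
        return Math.min((int) Math.ceil(availablePermits * 1.0 / avgMessagesPerEntry),
                readBatchSize);
    }
}
```
With `avgMessagesPerEntry` near 1000, a request for 1000 messages now maps to a single entry read instead of 500, while the 0.9 decay factor lets the estimate track changes in the producer's batch size over time.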
I also expose the `avgMessagesPerEntry` value in the consumer stats metrics JSON.