Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PIP-146] ManagedCursorInfo compression #14542

Merged
merged 1 commit into from
Apr 19, 2022

Conversation

nodece
Copy link
Member

@nodece nodece commented Mar 3, 2022

Signed-off-by: Zixuan Liu [email protected]

Fixes #14529

Motivation

The cursor data is managed by ZooKeeper/etcd metadata store. When cursor data becomes more and more, the data size will increase and will take a lot of time to pull the data. Therefore, it is necessary to add compression for the cursor, which can reduce the size of data and reduce the time of pulling data.

Modifications

  • Add a named ManagedCursorInfoMetadata message to MLDataFormats.proto for as compression metadata
  • Add the managedCursorInfoCompressionType to org.apache.pulsar.broker.ServiceConfiguration and org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig
  • This feature is the same as the implementation of ManagedLedgerInfo compression, so the code is optimized to avoid duplication

ManagedLedgerInfo compress

Documentation

  • doc-required

Add managedCursorInfoCompressionType to broker configuration

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Mar 3, 2022
@nodece
Copy link
Member Author

nodece commented Mar 4, 2022

/pulsarbot rerun-failure-checks

@nodece nodece force-pushed the managed_cursor_compress branch from 30d2021 to a8c12a2 Compare March 7, 2022 02:36
@nodece nodece requested a review from eolivelli March 7, 2022 02:37
@nodece nodece force-pushed the managed_cursor_compress branch from fa1013f to 9591d71 Compare March 7, 2022 03:28
@Anonymitaet
Copy link
Member

@momo-jun a soft reminder: this PR is labeled w/ doc-required and targeted for 2.11

@momo-jun
Copy link
Contributor

@momo-jun a soft reminder: this PR is labeled w/ doc-required and targeted for 2.11

Confirmed with @nodece, I will add the managedCursorInfoCompressionType parameter to the parameter list of broker/standalone configurations after 2.10 release.

Copy link
Contributor

@gaoran10 gaoran10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codelipenghui codelipenghui added this to the 2.11.0 milestone Mar 15, 2022
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

As @Jason918 mentioned in the proposal, to introduce a compression threshold to avoid resources waste for small size cursor data, and it should be also applied to the topic metadata compression, so I think it's better to use a separate PR to add this improvement for both managed ledger metadata and cursor metadata.

@codelipenghui codelipenghui requested a review from Jason918 March 15, 2022 05:37
@codelipenghui
Copy link
Contributor

@eolivelli Please help review this PR again.

@nodece
Copy link
Member Author

nodece commented Mar 16, 2022

Great work!

As @Jason918 mentioned in the proposal, to introduce a compression threshold to avoid resources waste for small size cursor data, and it should be also applied to the topic metadata compression, so I think it's better to use a separate PR to add this improvement for both managed ledger metadata and cursor metadata.

Thanks @codelipenghui, when this PR is merged, I will open a new PR to do this.

@nodece
Copy link
Member Author

nodece commented Mar 16, 2022

@eolivelli Could you review this PR?

@codelipenghui
Copy link
Contributor

@eolivelli Please help review this PR again.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall it looks good, apart from the unsafe use of ByteBuf.array()

@nodece nodece force-pushed the managed_cursor_compress branch from 9591d71 to 4cac93e Compare April 8, 2022 10:39
@nodece
Copy link
Member Author

nodece commented Apr 8, 2022

/pulsarbot rerun-failure-checks

@nodece
Copy link
Member Author

nodece commented Apr 10, 2022

@eolivell your request has been fixed, could you review this again?

@nodece nodece force-pushed the managed_cursor_compress branch from 4cac93e to a609822 Compare April 11, 2022 06:36
@nodece
Copy link
Member Author

nodece commented Apr 11, 2022

/pulsarbot rerun-failure-checks

@nodece nodece force-pushed the managed_cursor_compress branch from a609822 to 3b3bc5e Compare April 11, 2022 10:12
@nodece
Copy link
Member Author

nodece commented Apr 11, 2022

@codelipenghui Thanks, could you review again?

@nodece
Copy link
Member Author

nodece commented Apr 18, 2022

@eolivelli Could review this PR again?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@nodece
Copy link
Member Author

nodece commented Apr 18, 2022

@eolivelli Please help review this PR again.

Approved by Enrioc.

Copy link
Contributor

@Jason918 Jason918 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codelipenghui codelipenghui merged commit 4398733 into apache:master Apr 19, 2022
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
Fixes apache#14529 

### Motivation

The cursor data is managed by ZooKeeper/etcd metadata store. When cursor data becomes more and more, the data size will increase and will take a lot of time to pull the data. Therefore, it is necessary to add compression for the cursor, which can reduce the size of data and reduce the time of pulling data.


### Modifications

- Add a named `ManagedCursorInfoMetadata` message to `MLDataFormats.proto` for as compression metadata
- Add the `managedCursorInfoCompressionType` to `org.apache.pulsar.broker.ServiceConfiguration` and `org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig`
- This feature is the same as the implementation of ManagedLedgerInfo compression, so the code is optimized to avoid duplication
@Anonymitaet Anonymitaet added doc-complete Your PR changes impact docs and the related docs have been already added. and removed doc-required Your PR changes impact docs and you will update later. labels Apr 25, 2022
codelipenghui pushed a commit to codelipenghui/incubator-pulsar that referenced this pull request Jun 2, 2022
Fixes apache#14529

The cursor data is managed by ZooKeeper/etcd metadata store. When cursor data becomes more and more, the data size will increase and will take a lot of time to pull the data. Therefore, it is necessary to add compression for the cursor, which can reduce the size of data and reduce the time of pulling data.

- Add a named `ManagedCursorInfoMetadata` message to `MLDataFormats.proto` for as compression metadata
- Add the `managedCursorInfoCompressionType` to `org.apache.pulsar.broker.ServiceConfiguration` and `org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig`
- This feature is the same as the implementation of ManagedLedgerInfo compression, so the code is optimized to avoid duplication

(cherry picked from commit 4398733)
codelipenghui pushed a commit that referenced this pull request Jun 2, 2022
Fixes #14529

### Motivation

The cursor data is managed by ZooKeeper/etcd metadata store. When cursor data becomes more and more, the data size will increase and will take a lot of time to pull the data. Therefore, it is necessary to add compression for the cursor, which can reduce the size of data and reduce the time of pulling data.

### Modifications

- Add a named `ManagedCursorInfoMetadata` message to `MLDataFormats.proto` for as compression metadata
- Add the `managedCursorInfoCompressionType` to `org.apache.pulsar.broker.ServiceConfiguration` and `org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig`
- This feature is the same as the implementation of ManagedLedgerInfo compression, so the code is optimized to avoid duplication

(cherry picked from commit 4398733)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jun 6, 2022
Fixes apache#14529

### Motivation

The cursor data is managed by ZooKeeper/etcd metadata store. When cursor data becomes more and more, the data size will increase and will take a lot of time to pull the data. Therefore, it is necessary to add compression for the cursor, which can reduce the size of data and reduce the time of pulling data.

### Modifications

- Add a named `ManagedCursorInfoMetadata` message to `MLDataFormats.proto` for as compression metadata
- Add the `managedCursorInfoCompressionType` to `org.apache.pulsar.broker.ServiceConfiguration` and `org.apache.bookkeeper.mledger.ManagedLedgerFactoryConfig`
- This feature is the same as the implementation of ManagedLedgerInfo compression, so the code is optimized to avoid duplication

(cherry picked from commit 4398733)
(cherry picked from commit 70c7794)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-complete Your PR changes impact docs and the related docs have been already added. release/2.9.3 release/2.10.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PIP-146: ManagedCursorInfo compression
8 participants