[ISSUE #6624]Support mark() & reset() for TieredFileSegmentInputStream #6625

Conversation

TheR1sing3un
Member

Make sure set the target branch to develop

What is the purpose of the change

fix #6624
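For readers unfamiliar with the `mark()`/`reset()` contract on `java.io.InputStream`, here is a minimal sketch of what supporting it involves. `MarkableSegmentStream` is a hypothetical stand-in backed by a plain byte array, not the actual `TieredFileSegmentInputStream` from this PR. The idea is only that `mark()` remembers the current position and `reset()` rewinds to it, so a caller can re-read the same bytes.

```java
import java.io.InputStream;

// Hypothetical illustration only; not the RocketMQ implementation.
class MarkableSegmentStream extends InputStream {
    private final byte[] data;
    private int position = 0;        // next byte to read
    private int markedPosition = 0;  // position saved by the last mark()

    MarkableSegmentStream(byte[] data) {
        this.data = data;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int readLimit) {
        // All data is in memory here, so readLimit can be ignored.
        markedPosition = position;
    }

    @Override
    public void reset() {
        // Rewind to the last marked position (start of stream if never marked).
        position = markedPosition;
    }

    @Override
    public int read() {
        if (position >= data.length) {
            return -1; // end of stream
        }
        return data[position++] & 0xff;
    }
}
```

A caller would typically check `markSupported()`, call `mark(...)` before consuming the stream, and call `reset()` if it needs to read the same content again.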

Brief changelog

XX

Verifying this change

XXXX

Follow this checklist to help us incorporate your contribution quickly and easily. Notice, it would be helpful if you could finish the following 5 checklist(the last one is not necessary)before request the community to review your PR.

  • Make sure there is a Github issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a Github issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [ISSUE #123] Fix UnknownException when host config not exist. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test(over 80% coverage) to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add integration-test in test module.
  • Run mvn -B clean apache-rat:check findbugs:findbugs checkstyle:checkstyle to make sure basic checks pass. Run mvn clean install -DskipITs to make sure unit-test pass. Run mvn clean test-compile failsafe:integration-test to make sure integration-test pass.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

1. add UT to verify TieredFileSegmentInputStream
1. refactor TieredFileSegmentInputStream
1. support mark&reset TieredFileSegmentInputStream
…StreamTest

1. remove commended code in TieredFileSegmentInputStreamTest
@TheR1sing3un TheR1sing3un marked this pull request as ready for review April 20, 2023 17:07
@TheR1sing3un TheR1sing3un changed the title [ISSUE #6624]Support reset method of tiered file segment input stream [ISSUE #6624]Support mark() & reset() for TieredFileSegmentInputStream Apr 20, 2023
…tter understandability

1. refactor TieredFileSegmentInputStream for better understandability
…Stream

1. refactor some code in TieredFileSegmentInputStream
@codecov-commenter

Codecov Report

Merging #6625 (3bd0826) into develop (ea8b9d9) will increase coverage by 0.00%.
The diff coverage is 80.15%.

@@            Coverage Diff             @@
##             develop    #6625   +/-   ##
==========================================
  Coverage      43.11%   43.12%           
- Complexity      8997     9001    +4     
==========================================
  Files           1107     1108    +1     
  Lines          78287    78367   +80     
  Branches       10202    10212   +10     
==========================================
+ Hits           33755    33792   +37     
- Misses         40314    40339   +25     
- Partials        4218     4236   +18     
Impacted Files Coverage Δ
...q/tieredstore/provider/posix/PosixFileSegment.java 60.17% <ø> (ø)
...edstore/provider/TieredFileSegmentInputStream.java 80.00% <80.00%> (ø)
...cketmq/tieredstore/provider/TieredFileSegment.java 69.48% <100.00%> (-4.40%) ⬇️

... and 16 files with indirect coverage changes


1. refactor TieredFileSegmentInputStream
2. add a TieredFileSegmentInputStream.Factory to build instance
…d directory structure

1. refactor TieredFileSegmentInputStream related directory structure
…itLogInputStream

1. delete `commitLogOffsetBuffer` in TieredCommitLogInputStream
1. benchmark TieredFileSegmentInputStream pef

Closes apache#6624
…eSegmentInputStream

1. optimized `read(byte[], int, int)` for TieredFIleSegmentInputStream

Closes apache#6624
@TheR1sing3un
Member Author

Sorry for the delay, I have been busy recently. I have now finished an optimized batch read method in TieredFileSegmentInputStream and its child class, TieredCommitLogInputStream. I also ran a JMH benchmark comparing the original batch read method inherited from InputStream with the new optimized method in TieredCommitLogInputStream.
In a common scenario:

  • msg size: 4KB
  • tieredStoreGroupCommitSize: 32MB
  • batch read size: 8KB (the S3 client reads 8192 bytes on each call to read(byte[], int, int))

[image: ops (read 32MB)]
[image: average time (read 32MB)]

I think this optimization can improve performance by an order of magnitude.
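To illustrate why overriding `read(byte[], int, int)` can matter: the default implementation in `java.io.InputStream` fills the array by calling the single-byte `read()` in a loop, so a stream backed by several buffers pays per-byte overhead. The sketch below is a hypothetical `MultiBufferStream` over a list of `ByteBuffer`s (an assumption made for illustration, not the code merged in this PR) that copies each buffer's remaining bytes in bulk instead.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical illustration only; not the RocketMQ implementation.
class MultiBufferStream extends InputStream {
    private final List<ByteBuffer> buffers;
    private int current = 0; // index of the buffer currently being drained

    MultiBufferStream(List<ByteBuffer> buffers) {
        this.buffers = buffers;
    }

    @Override
    public int read() {
        // Single-byte path: skip exhausted buffers, then return one byte.
        while (current < buffers.size()) {
            ByteBuffer buf = buffers.get(current);
            if (buf.hasRemaining()) {
                return buf.get() & 0xff;
            }
            current++;
        }
        return -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (off < 0 || len < 0 || len > dst.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        int copied = 0;
        // Bulk path: copy as much as possible from each buffer in one call.
        while (copied < len && current < buffers.size()) {
            ByteBuffer buf = buffers.get(current);
            int n = Math.min(buf.remaining(), len - copied);
            if (n == 0) {
                current++; // this buffer is exhausted, move to the next one
                continue;
            }
            buf.get(dst, off + copied, n);
            copied += n;
        }
        return copied == 0 ? -1 : copied;
    }
}
```

With 4KB messages and 8KB reads as in the scenario above, one bulk call would span roughly two buffers rather than making 8192 single-byte calls. The actual speedup depends on the real implementation and is what the benchmark images report.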

Results for more combinations of msg size and batch read size:

[image: benchmark results for different msg sizes and batch read sizes]

@zhouxinyu @ShadowySpirits

1. fix a dead cycle in TieredFileSegmentInputStream.java
2. remove unused JMH related dependency

Closes apache#6624
@zhouxinyu zhouxinyu merged commit 1a9a8b1 into apache:develop May 31, 2023
Development

Successfully merging this pull request may close these issues.

Support mark() & reset() for TieredFileSegmentInputStream
4 participants