Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core-clp: Refactor LibarchiveFileReader to properly handle empty data blocks (fixes #389). #455

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

LinZhihao-723
Copy link
Member

Description

The current LibarchiveFileReader implementation has the following problems:

  • It doesn't handle the case where the data block is empty, which is the root cause of clp: Infinite loop when compressing zip archive. #389
  • Not all data members are initialized in the default constructor
  • Buffer peeking method's signature can be improved with C++20 enabled

This PR makes the following changes accordingly:

  • Handle the empty data block properly. All empty blocks will be skipped and only nonempty blocks can be read
  • Initialize all data members in the member declaration (which matches our latest coding guideline)
  • Using std::span for peeking buffer content

Validation performed

@haiqi96
Copy link
Contributor

haiqi96 commented Jun 25, 2024

A high level comment:
Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.

That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.

One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

void LibarchiveFileReader::peek_buffered_data(char const*& buf, size_t& buf_size) const {
auto LibarchiveFileReader::peek_buffered_data() const -> std::span<char const> {
char const* buf{nullptr};
size_t buf_size{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why empty init instead of 0. does it mean we don't care about its value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. buf_size is set in the nested if-else below according to different cases. This is just a forward declaration
  2. Empty init for size_t is 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote in a more clear way

@LinZhihao-723
Copy link
Member Author

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.

That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.

One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

@haiqi96
Copy link
Contributor

haiqi96 commented Jun 26, 2024

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.
That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.
One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

A high level comment: Methodlogy wise, I feel the rest of our code tend to let caller check if the data_block is empty, rather having a function to read the next 'none-empty' data block.
That said, I don't see a nit way of doing it unless we let try_load_data_block return a new type of error code and let the caller check it.
One option I might consider is to keep the private method read_next_data_block the unchanged, but add the while loop in try_load_data_block(). This is really a nit though

The whole reason why we need this fix is because we don't have any high level code handling empty blocks. And I don't think the high level code should handle empty blocks in this case since the empty blocks should be hidden from the caller. This class is to provide abstraction to read data using libarchive, so the upper level caller shouldn't really care how data blocks are maintained (like there could be an empty data block) in different archives

Perhaps I didn't make it clear, the caller here I was suggesting was the caller of read_next_data_block, so it's still within libarchiveReader.

That said, what I was really suggesting was to hide the "non_empty" details from the caller of read_next_data_block..
Instead, the read_next_data_block can handle the empty_block with a while loop. Unless we see a future need to read an empty block, otherwise I don't think we need to expose the details all the way up to try_load_nonempty_data_block, which is called by FileCompessor.
Anyway, it's my personal preference and I will let you decide.

@LinZhihao-723 LinZhihao-723 changed the title core-clp: Refactor LibarchiveFileReader to properly handle empty data blocks (fixes #389). core-clp: Refactor LibarchiveFileReader to properly handle empty data blocks (fixes #389). Aug 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants