-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clp-s: Add support for chunking output into different files during timestamp-ordered decompression #451
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Most of the comments are about the style changes.
auto finish_chunk = [&](bool open_new_writer) { | ||
writer.close(); | ||
std::string new_file_name = std::string(src_path) + "_" + std::to_string(first_timestamp) | ||
+ "_" + std::to_string(last_timestamp) + ".jsonl"; | ||
auto new_file_path = std::filesystem::path(new_file_name); | ||
std::error_code ec; | ||
std::filesystem::rename(src_path, new_file_path, ec); | ||
if (ec) { | ||
throw OperationFailed(ErrorCodeFailure, __FILE__, __LINE__, ec.message()); | ||
} | ||
|
||
if (open_new_writer) { | ||
writer.open(src_path, FileWriter::OpenMode::CreateForWriting); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any concerns of making it a private method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's easier to read this way, but if you prefer private method I can change it.
po::value<size_t>(&m_ordered_chunk_split_threshold) | ||
->default_value(m_ordered_chunk_split_threshold), | ||
"Number of records to include in each output chunk when decompressing records " | ||
"in order" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in order" | |
"in timestamp ascending order" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced with "in ascending timestamp order" instead.
Co-authored-by: wraymo <[email protected]>
decompression_options.add_options()( | ||
"ordered", | ||
po::bool_switch(&m_ordered_decompression), | ||
"Enable decompression in ascending timestamp order for this archive" | ||
)( | ||
"ordered-chunk-split-threshold", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to update the name of this argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, what about "clp-s: Add support for chunking output into different files during timestamp-ordered decompression"?
…mestamp-ordered decompression (y-scope#451) Co-authored-by: wraymo <[email protected]>
Description
This PR adds support for chunking the output of timestamp-ordered decompression into several files, where each file has at most the number of records specified in the command line argument. The argument
--ordered-chunk-split-threshold <value>
can be used in conjunction with the--ordered
argument during decompression to trigger this feature.Validation performed