This repository was archived by the owner on May 9, 2024. It is now read-only.

Add non-lazy data import. #553

Merged
merged 1 commit into from
Jun 28, 2023

Conversation

ienkovich
Contributor

Add a new option that avoids data fetch costs at execution time by moving them to data import. This option is for benchmarking only, when we want to separate data import/transformation time from execution time. It can increase overall workload execution time when the imported data contains unused columns.

Signed-off-by: ienkovich <[email protected]>
@@ -715,7 +715,8 @@ void ArrowStorage::appendArrowTable(std::shared_ptr<arrow::Table> at, int table_
     switch (col_arr->type()->id()) {
       case arrow::Type::STRING:
         // if the dictionary has already been materialized, append indices
-        if (!config_->storage.enable_lazy_dict_materialization ||
+        if (config_->storage.enable_non_lazy_data_import ||
Contributor

Is this necessary simply to force non-lazy data import in all cases when non-lazy data import is enabled in the config (vs. just lazy dict materialization), or is there a codepath here that is important if we want non-lazy data import and lazy dict materialization is already disabled?

Contributor Author

I made this change assuming that lazy dictionaries will be enabled by default soon, and I would like to be able to disable all lazy import features with a single flag in benchmarks.
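
For reference, a minimal sketch (not the actual ArrowStorage code) of how the two flags in the diff above are expected to interact when deciding whether to materialize string dictionaries at import time; the struct layout, helper name, and default values here are assumptions for illustration only.

// Minimal sketch, assuming the flag names from the diff and the defaults
// implied by the discussion above; not the real ArrowStorage implementation.
#include <iostream>

struct StorageConfig {
  bool enable_lazy_dict_materialization = true;  // assumed to become the default soon
  bool enable_non_lazy_data_import = false;      // new benchmarking-only option
};

// True when string dictionaries should be materialized eagerly at import time.
bool materializeDictOnImport(const StorageConfig& storage) {
  // The new flag overrides lazy dictionary materialization, so all
  // import/transformation work happens before execution starts.
  return storage.enable_non_lazy_data_import ||
         !storage.enable_lazy_dict_materialization;
}

int main() {
  StorageConfig cfg;
  cfg.enable_non_lazy_data_import = true;
  std::cout << std::boolalpha << materializeDictOnImport(cfg) << "\n";  // prints: true
  return 0;
}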

@ienkovich ienkovich merged commit aab3f75 into main Jun 28, 2023
@ienkovich ienkovich deleted the ienkovich/non-lazy-import branch June 28, 2023 17:47