This repository was archived by the owner on May 9, 2024. It is now read-only.
Fix metadata computation on arrow import. #704
Merged
This covers two changes.

1. Revert my previous patch that was supposed to avoid unnecessary metadata computation but is in fact just dead code (`last_orig_frag_idx` is always 0).
2. Fix a flaky `SEGFAULT` that we occasionally see in CI in `ArrowStorageTest`. This `SEGFAULT` is caused by improper exception handling. In Arrow import we have nested `parallel_for` calls, and their tasks can throw exceptions. According to the TBB docs, if a task of an algorithm throws an exception, other tasks of that algorithm and of nested algorithms can be cancelled. So if one outer task throws, another outer task that is already running will still finish, but all the tasks it creates in its inner `parallel_for` can be silently cancelled. The inner `parallel_for` that computes metadata is then simply skipped, and the following code dereferences `nullptr` when it tries to access the metadata.

The easiest solution I found is to use a separate isolated context for the inner `parallel_for`. It's not efficient, because the computed metadata is going to be dropped anyway, but it's simple and works.