This repository was archived by the owner on May 9, 2024. It is now read-only.

Fix metadata computation on arrow import. #704

Merged 2 commits on Oct 17, 2023


ienkovich
Contributor

This covers two changes.

  1. Revert my previous patch, which was supposed to avoid unnecessary metadata computation but is in fact just dead code (last_orig_frag_idx is always 0).

  2. Fix a flaky SEGFAULT that occasionally shows up in CI in ArrowStorageTest. The SEGFAULT is caused by improper exception handling. The Arrow import uses nested parallel_for calls, and their tasks can throw exceptions. According to the TBB docs, if a task of an algorithm throws an exception, the other tasks of that algorithm and of any nested algorithms can be cancelled. So, if one outer task throws, another outer task that is already running will still run to completion, but all tasks it creates in its inner parallel_for can be silently cancelled. As a result, the inner parallel_for that computes metadata is simply skipped, and the code that follows dereferences a nullptr when it tries to access the metadata.

The easiest solution I found is to use a separate isolated context for the inner parallel_for. It's not efficient, because the computed metadata is going to be dropped anyway once the import fails, but it's simple and it works.

    Avoid unnecessary chunk stats recomputation on append.

Signed-off-by: Ilya Enkovich <[email protected]>
@ienkovich ienkovich merged commit f9681ad into main Oct 17, 2023
@ienkovich ienkovich deleted the ienkovich/fix-throw-in-tbb branch October 17, 2023 15:07