Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS #17122

Merged
merged 1 commit into from
Oct 18, 2024


kingcrimsontianyu
Contributor

@kingcrimsontianyu kingcrimsontianyu commented Oct 18, 2024

Description

When LIBCUDF_CUFILE_POLICY is set to GDS or ALWAYS, cuDF uses an internal implementation to call the cuFile API and use the GDS (GPUDirect Storage) feature. Recent tests with these two settings failed because the program crashed. Specifically, in the parquet_read_io_compression benchmark of PARQUET_READER_NVBENCH:

  • GDS write randomly crashed with a segmentation fault (SIGSEGV).
  • GDS read randomly crashed with a bus error (SIGBUS).
  • At the time of the crash, the stack frame was randomly corrupted.

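The root cause is a dangling reference in nested lambdas: the inner lambda submitted to the thread pool captured variables by reference, so a task could run after the variables it referred to had been destroyed. The hotfix turns out to be a one-character change, from capture by reference ([&]) to capture by copy ([=]).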
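The failure mode can be reproduced in miniature. The sketch below mirrors the shape of `make_sliced_tasks` as changed by this PR, but it is not libcudf code: `Slice` and `make_slices` are hypothetical stand-ins, and `std::async` with `std::launch::deferred` replaces the thread pool so that every task deterministically runs after `make_sliced_tasks` has returned. With a real thread pool a task may or may not start before the function returns, which would be consistent with the crashes being random.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical slice descriptor (libcudf has its own type for this).
struct Slice {
  std::size_t offset;  // byte offset of the slice within the buffer
  std::size_t size;    // number of bytes in the slice
};

// Hypothetical helper: split `size` bytes into slices of at most `max_slice_size` bytes.
std::vector<Slice> make_slices(std::size_t size, std::size_t max_slice_size)
{
  assert(max_slice_size > 0);
  std::vector<Slice> slices;
  for (std::size_t off = 0; off < size; off += max_slice_size) {
    slices.push_back({off, std::min(max_slice_size, size - off)});
  }
  return slices;
}

// Creates one task per slice. Deferred tasks run only when .get() is called,
// i.e. after this function has returned and its parameters and the local
// `slices` vector have been destroyed.
template <typename F, typename DataT>
auto make_sliced_tasks(F function, DataT* ptr, std::size_t offset, std::size_t size,
                       std::size_t max_slice_size)
{
  using ResultT     = decltype(function(ptr, std::size_t{}, std::size_t{}));
  auto const slices = make_slices(size, max_slice_size);
  std::vector<std::future<ResultT>> slice_tasks;
  std::transform(
    slices.cbegin(), slices.cend(), std::back_inserter(slice_tasks), [&](auto const& slice) {
      // With an inner [&], the task would refer to `function`, `ptr`, `offset` and an
      // element of `slices`, all of which are destroyed when this function returns.
      // [=] copies all four into the task's closure instead.
      return std::async(std::launch::deferred, [=] {
        return function(ptr + slice.offset, slice.size, offset + slice.offset);
      });
    });
  return slice_tasks;
}
```

For example, with a 10-byte buffer, `offset = 100` and `max_slice_size = 4`, the three tasks receive file offsets 100, 104 and 108. Copying is cheap here because each capture is a pointer, an integer, a small struct, or the callable itself.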

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Oct 18, 2024
@kingcrimsontianyu kingcrimsontianyu self-assigned this Oct 18, 2024
@kingcrimsontianyu kingcrimsontianyu added non-breaking Non-breaking change bug Something isn't working labels Oct 18, 2024
@kingcrimsontianyu kingcrimsontianyu changed the title Fix the GDS read/write segfault when cuFile policy is set to GDS or ALWAYS Hotfix: Fix the GDS read/write segfault when cuFile policy is set to GDS or ALWAYS Oct 18, 2024
@kingcrimsontianyu kingcrimsontianyu changed the title Hotfix: Fix the GDS read/write segfault when cuFile policy is set to GDS or ALWAYS Hotfix: Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS Oct 18, 2024
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review October 18, 2024 13:11
@kingcrimsontianyu kingcrimsontianyu requested a review from a team as a code owner October 18, 2024 13:11
Contributor

@davidwendt davidwendt left a comment

Awesome

Contributor

@bdice bdice left a comment

Are we backporting this to 24.10? That’s what the title “hotfix” implies. If so, let’s merge this to 24.12 without the word “hotfix” in the title (it goes into the changelog) and we can name the backport PR “hotfix” if needed.

@@ -239,7 +239,7 @@ std::vector<std::future<ResultT>> make_sliced_tasks(
   std::vector<std::future<ResultT>> slice_tasks;
   std::transform(slices.cbegin(), slices.cend(), std::back_inserter(slice_tasks), [&](auto& slice) {
     return pool.submit_task(
-      [&] { return function(ptr + slice.offset, slice.size, offset + slice.offset); });
+      [=] { return function(ptr + slice.offset, slice.size, offset + slice.offset); });
Contributor

I don't understand how this worked for as long as it did.

Contributor

It is similar to the bug I recently fixed in the ORC reader when reading a file using the default stream 0. It never showed up until recently.

@kingcrimsontianyu kingcrimsontianyu changed the title Hotfix: Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS Oct 18, 2024
@kingcrimsontianyu
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 6ca721c into rapidsai:branch-24.12 Oct 18, 2024
132 checks passed
5 participants