Skip to content

Relax mmap read requirement. Improve error message.#781

Merged
rapids-bot[bot] merged 5 commits intorapidsai:branch-25.10from
kingcrimsontianyu:relax-mmap-req
Jul 25, 2025
Merged

Relax mmap read requirement. Improve error message.#781
rapids-bot[bot] merged 5 commits intorapidsai:branch-25.10from
kingcrimsontianyu:relax-mmap-req

Conversation

@kingcrimsontianyu
Copy link
Contributor

cuDF PR rapidsai/cudf#19164 currently has 4 failed unit tests when LIBCUDF_MMAP_ENABLED=ON:

28 - CSV_TEST (Failed)
29 - ORC_TEST (Failed)
32 - JSON_TEST (Failed)
40 - DATA_CHUNK_SOURCE_TEST (Failed)

The fix entails code changes on both the KvikIO and cuDF sides.
On the KvikIO side, the MmapHandle::read() and MmapHandle::pread() methods need to:

  • Allow the read size to be 0
  • Allow offset to be equal to initial_map_offset (when the read size is 0)

This PR makes this change. In addition, this PR adds more detailed error messages when out-of-range exception occurs.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jul 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@kingcrimsontianyu kingcrimsontianyu added bug Something isn't working non-breaking Introduces a non-breaking change c++ Affects the C++ API of KvikIO labels Jul 24, 2025
@kingcrimsontianyu kingcrimsontianyu marked this pull request as ready for review July 24, 2025 13:52
@kingcrimsontianyu kingcrimsontianyu requested a review from a team as a code owner July 24, 2025 13:52
@kingcrimsontianyu kingcrimsontianyu changed the base branch from branch-25.08 to branch-25.10 July 24, 2025 15:09
Copy link
Member

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I only have a minor style suggestion

cpp/src/mmap.cpp Outdated
auto actual_size = size.has_value() ? size.value() : _file_size - offset;
KVIKIO_EXPECT(actual_size > 0, "Read size must be greater than 0", std::invalid_argument);

ss.str("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Once the C++20 build issue was solved (#749), I'll clean up the repo a bit using std::format.

@kingcrimsontianyu
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 888fcbf into rapidsai:branch-25.10 Jul 25, 2025
49 checks passed
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Aug 25, 2025
### Background
Libcudf 25.04 and earlier use `memory_mapped_source` by default. For host read it uses memory mmaped I/O, and for device read it uses standard I/O. 

Libcudf 25.06 uses standard I/O by default. For host read, `memory_mapped_source`  still uses memory mmaped I/O, while the device read is no longer allowed (runtime exception if used). Parquet/ORC readers fall back to the host read + H2D copy to emulate device read for the mapped source.

### This PR
Now that the file-backed memory mapping (C++) is supported by KvikIO (rapidsai/kvikio#740), this PR updates libcudf to reinvigorate `memory_mapped_source` using KvikIO's implementation. This re-enables device read and brings performance improvement (e.g. through parallel prefault).

### Notes
`memory_mapped_source` is an implementation detail in `datasource.cpp`. Currently testing was conducted on C2C (arm) and PCIe (x86) systems by manually setting `LIBCUDF_MMAP_ENABLED=ON` and running the tests. Refer to rapidsai/kvikio#530 (comment) for benchmark results.

### Dependency
This PR depends on KvikIO PR rapidsai/kvikio#781 to fix unit test errors.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #19164
galipremsagar pushed a commit to galipremsagar/cudf that referenced this pull request Aug 27, 2025
…9164)

### Background
Libcudf 25.04 and earlier use `memory_mapped_source` by default. For host read it uses memory mmaped I/O, and for device read it uses standard I/O. 

Libcudf 25.06 uses standard I/O by default. For host read, `memory_mapped_source`  still uses memory mmaped I/O, while the device read is no longer allowed (runtime exception if used). Parquet/ORC readers fall back to the host read + H2D copy to emulate device read for the mapped source.

### This PR
Now that the file-backed memory mapping (C++) is supported by KvikIO (rapidsai/kvikio#740), this PR updates libcudf to reinvigorate `memory_mapped_source` using KvikIO's implementation. This re-enables device read and brings performance improvement (e.g. through parallel prefault).

### Notes
`memory_mapped_source` is an implementation detail in `datasource.cpp`. Currently testing was conducted on C2C (arm) and PCIe (x86) systems by manually setting `LIBCUDF_MMAP_ENABLED=ON` and running the tests. Refer to rapidsai/kvikio#530 (comment) for benchmark results.

### Dependency
This PR depends on KvikIO PR rapidsai/kvikio#781 to fix unit test errors.

Authors:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#19164
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working c++ Affects the C++ API of KvikIO non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants