Add a unified remote I/O interface that infers the endpoint type from URL (1/2): C++ implementation #793
…destroy the remote file only once per test suite instead of per test
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
/ok to test 64e8713
/ok to test 68e3a16
An issue that this PR does not address and that requires more thought in the future: endpoints on the current 25.10 perform only a light syntax check of the URL using regular expressions. Should we add more extensive URL validation (RFC 3986 plus) inside the constructor, or should we keep validation and construction separate? In this PR, the …
/ok to test a498a83
/ok to test bba7253
madsbk
left a comment
Looks good; I only have minor suggestions.
/ok to test 357a615
/merge
… URL (2/2): Python binding (#808)

This PR adds Python binding to #793

Closes #807

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)

URL: #808
With rapidsai/kvikio#793, KvikIO can infer the endpoint type from the URL, supporting creation of a broader range of remote resources via a single interface. This PR updates the cuDF data source accordingly. Specifically, `pylibcudf` can now read from WebHDFS, S3, and S3 presigned URL resources.

Partially addresses #19633

Authors:
- Tianyu Liu (https://github.com/kingcrimsontianyu)
- Vukasin Milovanovic (https://github.com/vuule)
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Mads R. B. Kristensen (https://github.com/madsbk)
- Nghia Truong (https://github.com/ttnghia)
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Vukasin Milovanovic (https://github.com/vuule)
- Matthew Murray (https://github.com/Matt711)

URL: #19788
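To make the idea of inferring an endpoint type from a URL concrete, here is a minimal, self-contained C++ sketch. It is not KvikIO's implementation: the names `EndpointKind` and `infer_endpoint_kind` are hypothetical, and the regular-expression rules (an `s3://` scheme, a `/webhdfs/v1/` path prefix, an `X-Amz-Signature` query parameter) are simplifying assumptions chosen only to cover the endpoint kinds named above.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical names for illustration only; these are not KvikIO types.
enum class EndpointKind { S3, S3PresignedUrl, WebHdfs, Http };

// Guess the endpoint kind from the shape of the URL alone.
// Returns std::nullopt when none of the (assumed) rules apply.
std::optional<EndpointKind> infer_endpoint_kind(std::string const& url)
{
  auto const flags = std::regex::ECMAScript | std::regex::icase;
  static std::regex const s3_scheme(R"(^s3://[^/]+/.+)", flags);
  static std::regex const http_scheme(R"(^https?://[^/]+.*)", flags);
  static std::regex const webhdfs_path(R"(^https?://[^/]+/webhdfs/v1/.*)", flags);
  static std::regex const presigned_query(R"([?&]X-Amz-Signature=)", flags);

  // s3://<bucket>/<object>
  if (std::regex_match(url, s3_scheme)) { return EndpointKind::S3; }

  // Everything else in this sketch must be plain HTTP or HTTPS.
  if (!std::regex_match(url, http_scheme)) { return std::nullopt; }

  // Check the most specific patterns first, then fall back to generic HTTP.
  if (std::regex_search(url, presigned_query)) { return EndpointKind::S3PresignedUrl; }
  if (std::regex_match(url, webhdfs_path)) { return EndpointKind::WebHdfs; }
  return EndpointKind::Http;
}
```

The order of the checks matters in this sketch: a presigned URL or a WebHDFS URL is also a valid HTTP URL, so the generic HTTP case has to come last. A caller that already knows the endpoint kind could skip the inference step entirely, which is the role an explicit endpoint-type argument plays.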
This PR adds a new remote I/O utility function `RemoteHandle::open(url)` that infers the remote endpoint type from the URL to facilitate `RemoteHandle` creation. Instead of letting `open` figure it out, users can explicitly specify the endpoint type by passing an enum argument `RemoteEndpointType`. … `RemoteHandle(endpoint, nbytes)`.

A byproduct of this PR is an internal utility class `UrlParser` that uses the idiomatic libcurl URL API to validate the URL against "RFC 3986 plus".

This PR depends on