Remote IO support in cudf-polars #19921

Merged
rapids-bot[bot] merged 45 commits into rapidsai:branch-25.10 from Matt711:fea/polars/remote-io on Sep 18, 2025

Matt711 (Contributor) commented Sep 8, 2025

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

kingcrimsontianyu and others added 30 commits June 13, 2025 11:12
github-actions bot added the pylibcudf label (Issues specific to the pylibcudf package) on Sep 17, 2025

Matt711 (Contributor, Author) commented Sep 17, 2025

/ok to test ef1e0a2

Comment on lines +341 to +347
# This works fine when the file has no leading blank lines,
# but currently we do some file introspection
# to skip blanks before parsing the header.
# For remote files we cannot determine if leading blank lines
# exist, so we're punting on CSV support.
# TODO: Once the CSV reader supports skipping leading
# blank lines natively, we can remove this guard.
CC @vuule
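
For context on the "file introspection" the code comment mentions, here is a minimal sketch of how leading blank lines might be counted for a local file. This is not the cudf-polars implementation; the helper name `count_leading_blank_lines` is hypothetical, and treating whitespace-only lines as blank is an assumption. The point is that the scan needs cheap access to the start of the file, which is easy locally but would require fetching data for a remote source.

```python
import io


def count_leading_blank_lines(stream):
    """Count consecutive blank lines at the start of a text stream.

    Hypothetical helper for illustration only. A line is considered
    blank here if it contains nothing but whitespace (an assumption).
    """
    count = 0
    for line in stream:
        if line.strip():
            # First line with real content: stop scanning.
            break
        count += 1
    return count


# An in-memory stand-in for a local CSV with three leading blank lines.
sample = io.StringIO("\n\n  \na,b\n1,2\n")
rows_to_skip = count_leading_blank_lines(sample)
```

With a local path the same scan could run over `open(path)`; for a remote URL there is no equivalent without downloading at least the head of the file, which is presumably why the guard exists.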

Matt711 (Contributor, Author) commented Sep 18, 2025

/ok to test 4e4cb87

jameslamb (Member) left a comment

Right now this PR is adding pytest-httpserver as a hard runtime dependency of cudf-polars, which I think was not your intention. I suggested a fix for that.

- output_types: conda
  packages:
    - cudf-polars==25.10.*,>=0.0.0a0
    - pytest-httpserver
jameslamb (Member):

This doesn't belong in this depends_on_cudf_polars list.

  • these depends_on_* lists are only intended to serve one dependency
  • placing this here makes pytest-httpserver a runtime dependency of cudf-polars (which I believe you didn't intend)

Please move this to cudf-polars testing dependencies instead:

test_python_cudf_polars:

Matt711 (Contributor, Author) commented Sep 18, 2025

/ok to test 4cb3909

Matt711 (Contributor, Author) commented Sep 18, 2025

/ok to test 482e70d

  if not isinstance(src, (os.PathLike, str)):
      raise ValueError("All sources must be of the same type!")
- if not (os.path.isfile(src) or self._is_remote_file_pattern.match(src)):
+ if not (os.path.isfile(src) or SourceInfo._is_remote_uri(src)):
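
To make the check above concrete, here is a rough standalone sketch of validating a source as either an existing local file or a remote URI. It is an illustration under assumptions, not the pylibcudf code: the function names, the set of accepted schemes, and the second error message are all hypothetical, and the real `_is_remote_uri` may decide differently.

```python
import os
from urllib.parse import urlparse

# Assumed set of schemes for illustration; the real list may differ.
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "hdfs", "webhdfs"}


def is_remote_uri(src):
    """Return True if src parses as a URI with a known remote scheme."""
    scheme = urlparse(os.fspath(src)).scheme
    return isinstance(scheme, str) and scheme.lower() in REMOTE_SCHEMES


def validate_source(src):
    """Accept a local file path or a remote URI; reject anything else."""
    if not isinstance(src, (os.PathLike, str)):
        raise ValueError("All sources must be of the same type!")
    if not (os.path.isfile(src) or is_remote_uri(src)):
        # Hypothetical error; the real code's behavior is not shown here.
        raise FileNotFoundError(f"{src!r} is neither a local file nor a remote URI")
    return os.fspath(src)
```

Parsing the scheme rather than matching a regex is only one possible design; it keeps a Windows drive letter such as `C:` from being mistaken for a remote scheme because `c` is not in the allowed set.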
Suggested change
- if not (os.path.isfile(src) or SourceInfo._is_remote_uri(src)):
+ if not (os.path.isfile(src) or self._is_remote_uri(src)):

Nit

Since _is_remote_uri is a staticmethod, SourceInfo._is_remote_uri is preferred, right?

I'm not sure there's a particular preference. In general, calling it through self still respects a subclass's override of _is_remote_uri, which is a nice guarantee.

Probably not worth another commit & CI run if this PR is close to merging, but good to consider in the future.
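
The difference discussed in this thread can be shown with a small self-contained sketch. The class names below are hypothetical stand-ins, not the real `SourceInfo`, and the prefix checks are assumptions chosen only to make the two call styles observable.

```python
class SourceInfoSketch:
    """Hypothetical stand-in used only to illustrate staticmethod dispatch."""

    @staticmethod
    def _is_remote_uri(src):
        return src.startswith(("http://", "https://"))

    def check_via_class(self, src):
        # Hard-codes the base class: subclass overrides are ignored.
        return SourceInfoSketch._is_remote_uri(src)

    def check_via_self(self, src):
        # Looked up on the instance's type: subclass overrides are used.
        return self._is_remote_uri(src)


class S3AwareSourceInfo(SourceInfoSketch):
    @staticmethod
    def _is_remote_uri(src):
        return src.startswith("s3://") or SourceInfoSketch._is_remote_uri(src)


info = S3AwareSourceInfo()
via_class = info.check_via_class("s3://bucket/key.parquet")  # False
via_self = info.check_via_self("s3://bucket/key.parquet")    # True
```

Both styles behave identically until a subclass overrides the static method, which matches the reviewer's point that this is a nit rather than a bug.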

jameslamb (Member) left a comment

Approving as the packaging changes look ok to me, thanks for fixing that.

Matt711 (Contributor, Author) commented Sep 18, 2025

/ok to test 30a0594

Matt711 (Contributor, Author) commented Sep 18, 2025

/merge

rapids-bot bot merged commit b19da6e into rapidsai:branch-25.10 on Sep 18, 2025
135 checks passed
github-project-automation bot moved this from In Progress to Done in cuDF Python on Sep 18, 2025
Matt711 deleted the fea/polars/remote-io branch on September 19, 2025 00:00
Labels

  • cudf-polars: Issues specific to cudf-polars
  • feature request: New feature or request
  • non-breaking: Non-breaking change
  • pylibcudf: Issues specific to the pylibcudf package
  • Python: Affects Python cuDF API.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FEA] Add support for reading WebHDFS files in cudf-polars

6 participants