Skip to content

Conversation

@bkietz
Copy link
Member

@bkietz bkietz commented Mar 4, 2024

Rationale for this change

Moving LocalFileSystem into the registry is a good first usage and will help us hammer out which aspects of built in file systems should remain public.

What changes are included in this PR?

A factory for LocalFileSystem is added to the registry. FileSystem::MakeUri ( #18316 ) is added to enable roundtripping filesystems through URIs. file:// URIs now support a use_mmap query parameter, and local:// URIs are also supported as an alias.

Reducing the set of bound classes

Some unnecessary bindings to the LocalFileSystem concrete class are removed. This demonstrates that with a registered factory pattern, it is no longer necessary to keep a class hierarchy public for binding. Eventually (if desired), we can move concrete subclasses of FileSystem entirely out of public headers.

Are these changes tested?

Yes, all existing tests for file:// URIs continue to pass

Are there any user-facing changes?

For consistency, local:// URIs will now be considered equivalent to corresponding file:// URIs

@bkietz bkietz force-pushed the 40342-localfs-module branch from 1f5e319 to 7e74b80 Compare March 19, 2024 17:36
@bkietz bkietz requested a review from felipecrv March 19, 2024 17:37
@bkietz bkietz force-pushed the 40342-localfs-module branch 2 times, most recently from fd4de2a to 211b82e Compare March 19, 2024 17:50
@apache apache deleted a comment from github-actions bot Mar 19, 2024
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python changes look good to me

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 19, 2024
@bkietz bkietz force-pushed the 40342-localfs-module branch from 211b82e to 320e4d5 Compare March 19, 2024 18:25
@bkietz bkietz marked this pull request as ready for review March 19, 2024 18:25
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 19, 2024
@bkietz bkietz force-pushed the 40342-localfs-module branch 2 times, most recently from be4468a to 46758bc Compare March 19, 2024 21:15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? The local scheme?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed more inconsistent to refuse the scheme since the type_name() of other filesystems is also a usable scheme. I can remove it

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 22, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 23, 2024
@bkietz
Copy link
Member Author

bkietz commented Apr 23, 2024

The only failures are a known flake #40675

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 23, 2024
@bkietz bkietz changed the title GH-40342: [C++] move LocalFileSystem to the registry GH-40342: [C++] move LocalFileSystem to the registry Apr 24, 2024
@bkietz bkietz merged commit fd75cbd into apache:main Apr 24, 2024
@bkietz bkietz removed the awaiting merge Awaiting merge label Apr 24, 2024
@bkietz bkietz deleted the 40342-localfs-module branch April 24, 2024 14:27
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 24, 2024
)

### Rationale for this change

Moving LocalFileSystem into the registry is a good first usage and will help us hammer out which aspects of built in file systems should remain public.

### What changes are included in this PR?

A factory for LocalFileSystem is added to the registry. `FileSystem::MakeUri` ( apache#18316 ) is added to enable roundtripping filesystems through URIs. `file://` URIs now support a use_mmap query parameter, and `local://` URIs are also supported as an alias.

<h6 id="reduce">Reducing the set of bound classes</h6>

Some unnecessary bindings to the LocalFileSystem concrete class are removed. This demonstrates that with a registered factory pattern, it is no longer necessary to keep a class hierarchy public for binding. Eventually (if desired), we can move concrete subclasses of FileSystem entirely out of public headers.

### Are these changes tested?

Yes, all existing tests for file:// URIs continue to pass

### Are there any user-facing changes?

For consistency, local:// URIs will now be considered equivalent to corresponding file:// URIs

* GitHub Issue: apache#40342

Authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fd75cbd.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 45 possible false positives for unstable benchmarks that are known to sometimes produce them.

Comment on lines -1113 to +1115
@staticmethod
@binding(True) # Required for cython < 3
def _reconstruct(kwargs):
# __reduce__ doesn't allow passing named arguments directly to the
# reconstructor, hence this wrapper.
return LocalFileSystem(**kwargs)

def __reduce__(self):
cdef CLocalFileSystemOptions opts = self.localfs.options()
return LocalFileSystem._reconstruct, (dict(
use_mmap=opts.use_mmap),)
uri = frombytes(GetResultValue(self.fs.MakeUri(b"/_")))
return FileSystem._from_uri, (uri,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring this has broken the cython-2 build (as the removed comments indicated, that code was required for cython < 3)

See https://github.com/ursacomputing/crossbow/actions/runs/8887296743/job/24402368974

(TBH, I would personally also fine with just dropping that build and requiring cython > 3 for building)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #41459 for a quick fix for now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, sorry

@github-actions github-actions bot added the awaiting changes Awaiting changes label Apr 30, 2024
jorisvandenbossche added a commit that referenced this pull request Apr 30, 2024
Small follow-up fix for the failure introduced by #40356
* GitHub Issue: #40342

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
)

### Rationale for this change

Moving LocalFileSystem into the registry is a good first usage and will help us hammer out which aspects of built in file systems should remain public.

### What changes are included in this PR?

A factory for LocalFileSystem is added to the registry. `FileSystem::MakeUri` ( apache#18316 ) is added to enable roundtripping filesystems through URIs. `file://` URIs now support a use_mmap query parameter, and `local://` URIs are also supported as an alias.

<h6 id="reduce">Reducing the set of bound classes</h6>

Some unnecessary bindings to the LocalFileSystem concrete class are removed. This demonstrates that with a registered factory pattern, it is no longer necessary to keep a class hierarchy public for binding. Eventually (if desired), we can move concrete subclasses of FileSystem entirely out of public headers.

### Are these changes tested?

Yes, all existing tests for file:// URIs continue to pass

### Are there any user-facing changes?

For consistency, local:// URIs will now be considered equivalent to corresponding file:// URIs

* GitHub Issue: apache#40342

Authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…apache#41459)

Small follow-up fix for the failure introduced by apache#40356
* GitHub Issue: apache#40342

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants