@mariusvniekerk
Member

Since s3fs.S3FileSystem is an fsspec filesystem, moving this check ahead of
the S3FSWrapper check ensures that it is correctly detected as an fsspec
filesystem and not wrapped with S3FSWrapper.
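
To illustrate why the order of the checks matters, here is a minimal sketch. It uses stand-in classes rather than the real pyarrow, fsspec, and s3fs types, and `ensure_filesystem` is a hypothetical helper, not the actual pyarrow function. It assumes the legacy fallback matches on the class name `S3FileSystem`.

```python
# Stand-in classes (hypothetical); the real ones live in fsspec, s3fs and pyarrow.
class AbstractFileSystem:
    """Stand-in for fsspec.AbstractFileSystem."""


class S3FileSystem(AbstractFileSystem):
    """Stand-in for s3fs.S3FileSystem, which is an fsspec filesystem."""


class S3FSWrapper:
    """Stand-in for the legacy wrapper around non-fsspec s3fs objects."""

    def __init__(self, fs):
        self.fs = fs


def ensure_filesystem(fs):
    """Hypothetical helper: run the fsspec check before the wrapper check."""
    # 1. Any fsspec filesystem is returned unchanged.
    if isinstance(fs, AbstractFileSystem):
        return fs
    # 2. Only objects that are NOT fsspec filesystems reach the name-based
    #    fallback (assumed here to match on the class name).
    for klass in type(fs).__mro__:
        if klass.__name__ == "S3FileSystem":
            return S3FSWrapper(fs)
    raise TypeError(f"Unrecognized filesystem: {type(fs)!r}")


fs = S3FileSystem()
result = ensure_filesystem(fs)  # returned as-is, not wrapped
```

If the name-based check ran first, the fsspec-based `S3FileSystem` above would match on its class name and be wrapped, which is the behaviour this change avoids.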

@mariusvniekerk mariusvniekerk changed the title [ARROW-10433] Swopped the conditions for checking for fsspec filesystems ARROW-10433:[Python] Swopped the conditions for checking for fsspec filesystems Oct 30, 2020
@mariusvniekerk mariusvniekerk changed the title ARROW-10433:[Python] Swopped the conditions for checking for fsspec filesystems ARROW-10433 [Python] Swopped the conditions for checking for fsspec filesystems Oct 30, 2020
@mariusvniekerk
Member Author

cc @jorisvandenbossche

@jorisvandenbossche
Member

Rebased this, and I'm running the tests that I added in #8573 locally with the latest s3fs (we currently still get s3fs 0.4 on CI).

@pitrou
Member

pitrou commented Nov 9, 2020

So what use is the S3FSWrapper after this PR?

@mariusvniekerk
Member Author

It might still be useful for s3fs<0.4?

@jorisvandenbossche
Member

I was going to propose to actually completely remove the S3FSWrapper call here (instead of only moving it after the other check).

This class is from before my time, but as far as I understand, it was initially used to wrap s3fs filesystems (so they could be used in ParquetDataset). Later on, the fsspec filesystems (and thus s3fs as well) became actual pyarrow.filesystem.FileSystem subclasses, so they could be used directly in ParquetDataset. So in practice, this S3FSWrapper has not been in use with recent s3fs / fsspec releases.

But this change to inherit from pyarrow is already more than 2 years old (e.g. fsspec/filesystem_spec@f461317), so I think in practice nobody is (or should be) using such an old fsspec version.

@jorisvandenbossche
Member

jorisvandenbossche left a comment

I checked locally that the relevant parquet tests now pass with this patch and s3fs master (the released s3fs 0.5.1 still has some regressions that cause failures when reading partitioned parquet datasets).

As mentioned above, I think we can remove the S3FSWrapper call here altogether, but I will deal with that in a separate PR that also deprecates the class directly.

@iNecas

iNecas commented Dec 15, 2020

I've just hit `TypeError: 'coroutine' object is not iterable`, and it turned out to be related to compatibility with s3fs. I was surprised not to find any pyarrow issue describing this problem, so I've filed one at https://issues.apache.org/jira/browse/ARROW-10921. For now, pinning s3fs to <0.5 works around the problem.
