GCSFS reports directory as FileNotFoundError when it exists. Run 1 fails, run 2 succeeds. Caching? #632
Comments
You seem to have a key and a directory with the same name, which is unfortunate. While it is unclear which of these gcsfs should return with info(), I agree that it should be consistent. For the original issue over in arrow, I can point out that the following works fine:
i.e., specifying the filesystem rather than providing a protocol prefix. (Also, fastparquet has no problem with any of the possible forms!)
(Note that no one asked for my opinion in the upstream arrow thread.)
Also: I am not able to reproduce your behaviour with or without a placeholder directory. Can you try to make a full reproducer, please?
That's curious. When you say "key and directory with the same name", does that mean we wrote that dir as a key first? For context, we're using pyspark to write a dataset to this path. I can imagine it creates a placeholder there first, although running Spark against object storage is a pretty common scenario.
I can't say how it came to be, only that I suppose you have both a key called "bucket/path" and keys with names like "bucket/path/...", which also imply a directory of that name.
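To make the ambiguity concrete, here is a minimal sketch of a flat key namespace in plain Python. It is a toy model only, not how gcsfs is implemented: the `classify` helper and the `part-000x.parquet` key names are hypothetical, made up for illustration.

```python
from typing import Set


def classify(keys: Set[str], path: str) -> Set[str]:
    """Return every way `path` can be interpreted in a flat key namespace.

    Toy model (assumption): an object store has no real directories, so a
    "directory" exists only because some key starts with `path + "/"`.
    """
    path = path.rstrip("/")
    kinds = set()
    if path in keys:
        # An object is stored under exactly this name.
        kinds.add("file")
    if any(key.startswith(path + "/") for key in keys):
        # Other keys live "under" this name, which implies a directory.
        kinds.add("directory")
    return kinds


# Hypothetical bucket contents: a key and an implied directory share a name.
keys = {
    "bucket/path",
    "bucket/path/part-0000.parquet",
    "bucket/path/part-0001.parquet",
}

print(sorted(classify(keys, "bucket/path")))
print(sorted(classify(keys, "bucket/path/part-0000.parquet")))
print(sorted(classify(keys, "bucket/missing")))
```

In this model "bucket/path" is both a file and a directory, so any call that has to return a single answer must pick one, and whichever lookup runs first can decide the result.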
Hi,
We went down a rabbit hole trying to find this one.
apache/arrow#31339
It turns out pandas can't read partitioned Parquet files from a directory, because PyArrow uses GCSFS underneath.
However in this repo there seems to be no mention of this. Are you aware of any situation where the library is non-deterministic/has caching issues when listing a directory?
Returns:
Note that the first call returns a different result from calls 2 and 3. What's up with that?