isdir/info method works incorrectly #574
Comments
I don't have an MCVE but can say I've also run into this before. The hack to run |
It seems that I've found the root cause of this problem. I decided to remove this directory and upload it again, and that solved the whole problem. |
AFAIK this was tackled in #313, but for some reason it was never merged into the main branch. Motivating example using a public bucket:
import fsspec
path = "gs://hadoop-lib/gcs"
fs = fsspec.filesystem("gs")
With Python 3.7 and fsspec 2023.1.0:
fs.info(path)  # {'kind': 'storage#object', ... , 'type': 'file'}
fs.info(path)  # {..., 'type': 'directory'} <--- THIS IS TRUE
With Python 3.9 and fsspec 2023.10.0:
fs.info(path)  # {'kind': 'storage#object', ... , 'type': 'file'}
fs.info(path)  # {'kind': 'storage#object', ... , 'type': 'file'} (at least it's consistently wrong) |
There is indeed an empty file: 'hadoop-lib/gcs/', so . With #313 this is shown as a directory? |
hadoop-lib/gcs is a folder, containing other files. The library incorrectly recognizes it as a file. |
it is also a file, though. In pathstring-land, the final "/" is immaterial. |
#313 appears abandoned, maybe it should be resurrected. |
I'm affected by this as well:
|
I am guessing "henning-test/empty_folder/" is a zero-length-file? It would be best to show your complete set of files (as returned by find() ) to understand better. |
Which still does not explain this:
Or does it? |
:) For the first case, it is conventional that ls(file-path) should return [file-path]. This in
For the second, the return is showing you the zero-length-file, which has the same prefix as the path you asked to list. Yes, it's complicated, because GCS doesn't have folders, only "common prefixes". Sometimes it's convenient to assume these (optional) weird placeholders are the same as directories, but sometimes you need to know they are there. |
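To make the "common prefixes" point concrete, here is a toy model of a flat key namespace. It is only a sketch: the `classify` helper and the `bucket/...` keys are hypothetical, and this is not how gcsfs decides anything internally. It just shows that one path can be a placeholder object, a common prefix, both, or a plain file.

```python
def classify(keys, path):
    """Return the set of roles `path` plays in a flat key namespace.

    Hypothetical helper for illustration only; not part of gcsfs or fsspec.
    """
    path = path.rstrip("/")
    prefix = path + "/"
    roles = set()
    if path in keys:
        # An object stored under exactly this name.
        roles.add("file")
    if prefix in keys:
        # A zero-length "folder" placeholder whose name ends in "/".
        roles.add("placeholder")
    if any(k.startswith(prefix) and k != prefix for k in keys):
        # Other objects share this prefix, so it looks like a directory.
        roles.add("common prefix")
    return roles


# Made-up object names standing in for a bucket listing.
keys = {
    "bucket/data/",          # placeholder object
    "bucket/data/a.csv",     # object under the same prefix
    "bucket/empty_folder/",  # placeholder with nothing under it
    "bucket/readme.txt",     # ordinary object
}

print(classify(keys, "bucket/data"))
print(classify(keys, "bucket/empty_folder/"))
print(classify(keys, "bucket/readme.txt"))
```

In this model `bucket/data` is both a placeholder and a common prefix, while `bucket/empty_folder/` is only a placeholder, which is the case where a listing has nothing to return except the placeholder itself.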
But it doesn't really match the behavior of
|
What specifically? |
After reading the documentation again, I realize it might also be compatible with returning the path to a file itself when called on a file, although this seems awkward to me. |
I agree that it would be good to sketch out all the possible cases, and then write a bunch of tests which can be applied to any filesystem and thus ensure consistency (much as was done for get/put/rm). This is a fair amount of work! |
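As a starting point, one such filesystem-agnostic check could look like the sketch below. It assumes only that the object has `info(path)` returning a dict with a `"type"` key and `isdir(path)` returning a bool, as fsspec filesystems do. The function name and the `InconsistentFS` stub are hypothetical; the stub exists only so the snippet runs without network access.

```python
def check_isdir_matches_info(fs, paths):
    """Return (path, is_dir, info_type) for every path where the two disagree."""
    mismatches = []
    for path in paths:
        info_type = fs.info(path)["type"]
        is_dir = fs.isdir(path)
        if is_dir != (info_type == "directory"):
            mismatches.append((path, is_dir, info_type))
    return mismatches


class InconsistentFS:
    """Hypothetical stand-in that mimics the symptom reported in this issue."""

    def info(self, path):
        # Always claims "file", even for directory-like paths.
        return {"name": path, "type": "file"}

    def isdir(self, path):
        return path.endswith("dir")


print(check_isdir_matches_info(InconsistentFS(), ["a/dir", "a/file.txt"]))
```

The same function could be pointed at a real filesystem instance and a list of known paths; an empty result means `isdir` and `info` agree for every path checked.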
+1 to what @hmeyer said. GCS and the local file system should be consistent! And also consistent with the rest of the Python ecosystem:
empty_folder = '/.../empty_folder/'
assert os.listdir(empty_folder) == []
assert list(pathlib.Path(empty_folder).iterdir()) == []
assert local_fs.ls(empty_folder) == []
assert gcs_fs.ls(empty_folder) == ['/.../empty_folder/'] # <<<< Inconsistent and very surprising |
Let me point out, again, that in local filesystems, having a file and a directory with the same name is not allowed, and having a file with "/" as the last character is not allowed. This is a real difference we can't pretend doesn't exist. So, gcsfs is not wrong in what it says. Since batch operations (e.g., |
Actually it is perfectly ok to create a file with a trailing "/" locally:
And I think, following the principle of least surprise, it would be good if gcsfs behaved similarly to |
Am I being dim, or is this not the path you had asked for? |
It is the path I asked for, but curiously |
Hello,
I've found strange behavior in the isdir method (and, digging deeper, also in the info method). It returns incorrect values, seemingly at random.
I use Python 3.10.12 and I've tested this behavior on gcsfs=2022.3.0 and the latest version, gcsfs=2023.6.0.
I've prepared a helper function to show what is happening here:
Problem example
An exemplary run:
Results:
As you can see, some directories are incorrectly treated as files. Moreover, the values returned by the info and isdir methods are inconsistent with each other.
Another insight
Changing the order of calling these methods, like in the snippet below:
makes is_dir contain a correct value, but info_type an incorrect one. Like here:
Workaround
For now, my workaround is to run the isdir method two times:
It works:
But I want to work with this library without such a workaround ;)
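For anyone landing here, a minimal sketch of what the double-call workaround might look like as a wrapper. The name `isdir_twice` and the `FlakyFS` stub are hypothetical; the stub only imitates the symptom (a wrong first answer, a correct second one) so the snippet runs offline. Whether the second call is always correct on a real bucket is an assumption based on the observations above, not something this snippet proves.

```python
def isdir_twice(fs, path):
    """Call fs.isdir(path) twice and return the second answer."""
    fs.isdir(path)  # first call; its result is discarded
    return fs.isdir(path)


class FlakyFS:
    """Hypothetical stub: answers False on the first call, True afterwards."""

    def __init__(self):
        self.calls = 0

    def isdir(self, path):
        self.calls += 1
        return self.calls > 1


fs = FlakyFS()
print(isdir_twice(fs, "bucket/some_folder"))
```

Keeping the hack in one small function at least makes it easy to remove once the underlying behavior is fixed.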