PHP DirectoryIterator returns no files and triggers a "readdirplus failed" error #477

Closed
ale-rinaldi opened this issue Aug 23, 2023 · 2 comments · Fixed by #581
@ale-rinaldi

Mountpoint for Amazon S3 version

mount-s3 1.0.0

AWS Region

eu-west-1

Describe the running environment

Running in EC2 on Ubuntu, with a static access key and secret key configured via awscli

What happened?

  • I created a test mount folder: mkdir /home/ubuntu/mount
  • I mounted a bucket with some files in it: mount-s3 my-test-bucket /home/ubuntu/mount
  • I ensured my files show on ls: ls /home/ubuntu/mount (list of files appears)
  • I installed PHP cli: sudo apt install php-cli
  • I created this PHP script:
<?php
$count = 0;
foreach (new \DirectoryIterator("/home/ubuntu/mount") as $fileInfo) {
    echo "Found file: " . $fileInfo->getFilename() . "\n";
    $count++;
}
echo "$count files found\n";
  • I launched the script: php test.php

I expected the list and count of files, but I get "0 files found" instead.

When I run the script, a warning is logged to syslog (pasted below). I tried multiple times with the same result; only the "req=" and "fh=" values change.

Relevant log output

mount-s3[8770]: [WARN] readdirplus{req=30 ino=1 fh=3 offset=0}: mountpoint_s3::fuse: readdirplus failed: offset mismatch, expected=8, actual=0
@ale-rinaldi ale-rinaldi added the bug Something isn't working label Aug 23, 2023
@ale-rinaldi ale-rinaldi changed the title PHP DirectoryIterator returns a "readdirplus failed" error PHP DirectoryIterator returns no files and triggers a "readdirplus failed" error Aug 24, 2023
@ahmarsuhail
Contributor

Looking at the debug logs, DirectoryIterator sends an initial READDIRPLUS request (fh FileHandle(1), offset 0, size 4096), and after that request completes, its next request is also READDIRPLUS with fh FileHandle(1), offset 0, size 4096. That is, the second request repeats offset 0, which causes the error. For the second request, the offset should be > 0, depending on how many entries were read in the first request.
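To make the failure mode concrete, here is a minimal Python sketch of a forward-only directory handle that rejects any offset other than the one it expects next. It is only an illustrative model, not Mountpoint's actual code: the class name `StrictDirHandle`, the `max_entries` parameter (the real request carries a byte size of 4096, not an entry count), and the eight sample entries are all assumptions chosen to reproduce the `expected=8, actual=0` message from the log.

```python
class OffsetMismatch(Exception):
    """Raised when a request's offset is not where the stream currently is."""


class StrictDirHandle:
    """Hypothetical model of a directory handle that can only move forward."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._next_offset = 0  # the only offset the next request may carry

    def readdir(self, offset, max_entries):
        # Reject anything that is not a continuation of the previous page.
        if offset != self._next_offset:
            raise OffsetMismatch(
                f"offset mismatch, expected={self._next_offset}, actual={offset}"
            )
        page = self._entries[offset:offset + max_entries]
        self._next_offset += len(page)
        return page


# Assumed directory with 8 entries, all of which fit in the first page.
handle = StrictDirHandle([f"file{i}" for i in range(8)])
first_page = handle.readdir(offset=0, max_entries=4096)

# The caller then repeats offset 0 instead of continuing from 8.
try:
    handle.readdir(offset=0, max_entries=4096)
    error = None
except OffsetMismatch as exc:
    error = str(exc)

print(error)
```

In this model the first request succeeds and the repeated request fails, which matches the symptom in the report: the caller sees an error on its second read and ends up with no files.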

I tested with the following script:

<?php
$path = "/home/ubuntu/mnt";
$count = 0;
if ($handle = opendir($path)) {
    while (false !== ($file = readdir($handle))) {
        echo "Found file: " . $file . "\n";
        $count++;
    }
    echo "$count files found\n";
    closedir($handle);
}

This works fine, and the offsets are set correctly. Could you use this as a workaround?

@ale-rinaldi
Author

Hello,

thanks for your answer.

Yes, this is a viable workaround since it's code I have control over, thank you.

However, I can see that DirectoryIterator usually works, even with other S3 mount software. Maybe it's the response that Mountpoint gives to readdirplus that makes DirectoryIterator issue subsequent calls with the wrong offset? I think it's still worth investigating, since DirectoryIterator could still appear in vendored code, and that would be a mess :)

Thanks!

jamesbornholt added a commit to jamesbornholt/mountpoint-s3 that referenced this issue Oct 27, 2023
POSIX allows seeking an open directory handle, which in FUSE means the
`offset` can be any offset we've previously returned. This is pretty
annoying for us to implement since we're streaming directory entries
from S3 with ListObjects, which can't resume from an arbitrary index,
and can't fit its continuation tokens into a 64-bit offset anyway. So
we're probably never going to truly support seeking a directory handle.

But there's a special case we've seen come up a couple of times (awslabs#477, awslabs#520):
some applications read one page of directory entries and then seek back
to 0 and do it again. I don't fully understand _why_ they do this, but
it's common enough that it's worth special casing.

This change makes open directory handles remember their most recent
response so that they can repeat it if asked for the same offset again.
It's not too complicated other than needing to make sure we do
readdirplus correctly (managing the lookup counts for entries that are
being returned a second time).

I've tested this by running the PHP example from awslabs#477, which now works.

Signed-off-by: James Bornholt <[email protected]>
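
The approach described in this commit message can be sketched in Python as a handle that keeps its most recent page and replays it when the same offset is requested again. This is a simplified model under stated assumptions, not the actual Rust implementation: `ReplayingDirHandle`, `max_entries`, and the `lookup_counts` counter are hypothetical names, and the counter only approximates the readdirplus bookkeeping the message mentions (each returned entry, including a replayed one, is assumed to add one lookup reference).

```python
from collections import Counter


class ReplayingDirHandle:
    """Hypothetical forward-only handle that can repeat its last response."""

    def __init__(self, entries):
        self._entries = list(entries)
        self._next_offset = 0     # offset of the next fresh page
        self._last_offset = None  # offset that produced the cached page
        self._last_page = None    # most recent response, kept for replay
        self.lookup_counts = Counter()  # assumed per-entry lookup references

    def _account(self, page):
        # Assumption: every entry handed back counts as one more lookup,
        # whether it is returned for the first time or replayed.
        for name in page:
            self.lookup_counts[name] += 1
        return list(page)

    def readdir(self, offset, max_entries):
        if offset == self._next_offset:
            # Normal case: continue streaming from where we stopped.
            page = self._entries[offset:offset + max_entries]
            self._last_offset, self._last_page = offset, page
            self._next_offset += len(page)
            return self._account(page)
        if offset == self._last_offset:
            # Special case: same offset as the previous request, so replay.
            return self._account(self._last_page)
        # Arbitrary seeks remain unsupported in this model.
        raise ValueError(
            f"offset mismatch, expected={self._next_offset}, actual={offset}"
        )


handle = ReplayingDirHandle(["a", "b", "c", "d", "e"])
first = handle.readdir(0, 3)   # fresh page
again = handle.readdir(0, 3)   # repeated offset: replayed from the cache
rest = handle.readdir(3, 3)    # continues from the expected offset

try:
    handle.readdir(0, 3)       # offset 0 is no longer the cached page
    seek_supported = True
except ValueError:
    seek_supported = False
```

Only the single most recent page is cached here, so memory use stays bounded and seeking to any older offset still fails, consistent with the message's point that general directory seeking is not supported.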
github-merge-queue bot pushed a commit that referenced this issue Oct 27, 2023
* Allow repeated readdir offsets

* PR feedback

Signed-off-by: James Bornholt <[email protected]>

* Changelog and docs

Signed-off-by: James Bornholt <[email protected]>

---------

Signed-off-by: James Bornholt <[email protected]>