
Mac: sporadic vnode I/O errors #328

Open

pmj opened this issue Oct 3, 2018 · 18 comments

@pmj
Contributor

pmj commented Oct 3, 2018

There seem to be a number of places in the kext's vnode/fileop handlers where I/O (reading attributes, etc.) occasionally fails. In a few places there are asserts that even take down the system on such failures. We need to track down all the failure cases and their common root causes, and handle them correctly and gracefully in the kext instead of crashing the system.

Known failure cases:

Reports of more cases appreciated.
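
For illustration, here's a minimal sketch of the kind of graceful handling this issue is asking for: check the error from the attribute read, log it, and fall back to a conservative answer instead of asserting. The names are hypothetical (loosely modeled on the ReadVNodeFileFlags/FileIsFlaggedAsInRoot symbols seen in the panic traces below), not the kext's actual implementation:

#include <libkern/libkern.h>
#include <sys/stat.h>
#include <sys/vnode.h>

// Hypothetical helper: read the BSD file flags (va_flags) of a vnode.
// Returns 0 on success or an errno value on failure.
static int TryReadFileFlags(vnode_t vnode, vfs_context_t context, uint32_t* flags)
{
    struct vnode_attr attrs;
    VATTR_INIT(&attrs);
    VATTR_WANTED(&attrs, va_flags);

    int error = vnode_getattr(vnode, &attrs, context);
    if (error != 0)
    {
        return error;
    }

    *flags = attrs.va_flags;
    return 0;
}

// Instead of asserting when the read fails, log the error and fall back to the
// conservative answer (treat the file as not in a virtualization root).
static bool FileIsFlaggedAsInRootSketch(vnode_t vnode, vfs_context_t context)
{
    uint32_t flags = 0;
    int error = TryReadFileFlags(vnode, context, &flags);
    if (error != 0)
    {
        printf("PrjFSKext: reading file flags failed with error %d\n", error);
        return false;
    }

    return (flags & UF_TRACKED) != 0; // placeholder bit; not necessarily the flag the kext uses
}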

@derrickstolee
Contributor

@pmj Is the following Mac functional test failure related? Or should I create a separate issue?

1) Failed : GVFS.FunctionalTests.Tests.EnlistmentPerFixture.MultithreadedReadWriteTests.CanReadVirtualFileInParallel()
  At least one of the reads failed
  Expected: null
  But was:  <System.IO.IOException: The process cannot access the file '/GVFS.FT/test/a6a14cb56232452c8d68/src/GVFS/GVFS.FunctionalTests/Tests/LongRunningEnlistment/GitMoveRenameTests.cs' because it is being used by another process.
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirectory, Func`2 errorRewriter)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.StreamReader..ctor(String path, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize)
   at System.IO.File.InternalReadAllText(String path, Encoding encoding)
   at System.IO.File.ReadAllText(String path)
   at GVFS.FunctionalTests.FileSystemRunners.SystemIORunner.ReadAllText(String path) in /Users/vsts/agent/2.140.2/work/1/s/GVFS/GVFS.FunctionalTests/FileSystemRunners/SystemIORunner.cs:line 70
   at GVFS.FunctionalTests.Tests.EnlistmentPerFixture.MultithreadedReadWriteTests.<>c__DisplayClass0_0.<CanReadVirtualFileInParallel>b__0() in /Users/vsts/agent/2.140.2/work/1/s/GVFS/GVFS.FunctionalTests/Tests/EnlistmentPerFixture/MultithreadedReadWriteTests.cs:line 36>
   at GVFS.Tests.Should.ValueShouldExtensions.ShouldBeNull[T](T obj, String message) in /Users/vsts/agent/2.140.2/work/1/s/GVFS/GVFS.Tests/Should/ValueShouldExtensions.cs:line 76
   at GVFS.FunctionalTests.Tests.EnlistmentPerFixture.MultithreadedReadWriteTests.CanReadVirtualFileInParallel() in /Users/vsts/agent/2.140.2/work/1/s/GVFS/GVFS.FunctionalTests/Tests/EnlistmentPerFixture/MultithreadedReadWriteTests.cs:line 52

@sanoursa
Contributor

sanoursa commented Oct 3, 2018

@derrickstolee that one is covered by #315

@derrickstolee
Contributor

Thanks for the help! Sorry for the noise.

@nickgra
Member

nickgra commented Oct 4, 2018

I think the two failure cases that we're describing are the same thing. I've only ever seen this callstack (line numbers will be off for your branches/HEAD):

0xffffffa3e114bae0 : 0xffffff7f8bb544c9 io.gvfs.PrjFSKext : __ZL18ReadVNodeFileFlagsP5vnodeP11vfs_context + 0xa9
0xffffffa3e114bcc0 : 0xffffff7f8bb5496d io.gvfs.PrjFSKext : __ZL21FileIsFlaggedAsInRootP5vnodeP11vfs_context + 0x1d
0xffffffa3e114bcf0 : 0xffffff7f8bb5381a io.gvfs.PrjFSKext : __ZL21HandleFileOpOperationP5ucredPvimmmm + 0x22a

The line number for HandleFileOpOperation will show that we're in the file handle close path.

I hit this all the time when working on large builds. At least a 50% hit rate during the most parallel parts of the build. One thought that comes to mind is that, because there's so much I/O flying around in the system, we hit a vnode that is incomplete and missing attributes (which would be an OS bug, IMO).

Alternatively, some system process produces a VNodeType that we're not supporting, and since we intercept every file handle close, we're bound to hit this eventually. I've found machines that I've left idle for days panicked on this call stack, so that might be the better explanation.

I think we need to take some fix for this particular panic soon regardless. I know @wilbaker has some opinions about 'hiding' the problem as your current PR does; I'm curious to see what others think.

@nickgra
Member

nickgra commented Oct 6, 2018

On certain Mac Pros (but not the ones sitting on my desk), I can repro this 100% of the time. Having the kext loaded and then running our PrepFunctionalTests.sh script (which first mounts a .dmg and runs a .pkg in it) will hit this panic.

Just another data point: it doesn't change my explanation of what's going on here; it just proves that I/O on disk image volumes is triggering interactions in our kext when it shouldn't be.

Edit: For closure, I figured out what's different. Someone installed System Center Endpoint Protection on these machines, which brings its own kext. So the root cause here might be similar to the issue with the Google Drive kext.

@pmj
Contributor Author

pmj commented Oct 25, 2018

I've just hit this condition during development/testing, with the same trace as @nickgra's. In my case, it failed with err == 9 (EBADF, Bad file descriptor), and vn_getpath failed on the vnode. I'm going to put some extra logging in my local version that prints more diagnostic data about the vnode, plus the path passed in rather than trying to generate it. I suspect this is somehow a vnode that ends up deleted/recycled before we're done with it.

I've not got any AV installed on this test machine. (Quad core 2012 Mac Mini)
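
For reference, the shape of the extra logging I have in mind; a sketch only, with printf standing in for the kext's logging and LogCloseDiagnostics being a hypothetical helper:

#include <libkern/libkern.h>
#include <sys/vnode.h>

// Sketch of the extra diagnostics, called from the KAUTH_FILEOP_CLOSE case of
// the fileop handler; there, arg0 is the vnode and arg1 is the path the kernel
// already generated, so we can log that path instead of calling vn_getpath()
// ourselves (which is what failed above).
static void LogCloseDiagnostics(vnode_t vnode, const char* pathFromKernel)
{
    printf("PrjFSKext: close: vnode %p, type %d, recycled %d, vid %u, path '%s'\n",
           vnode,
           vnode_vtype(vnode),
           vnode_isrecycled(vnode),
           vnode_vid(vnode),
           (pathFromKernel != nullptr) ? pathFromKernel : "(null)");
}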

@pmj
Contributor Author

pmj commented Nov 28, 2018

I've run into cases where mac_vnop_getxattr() returns 1 (EPERM) on resource fork vnodes (path/to/some/file/..namedfork/rsrc). These must be happening on a FILEOP code path, as that doesn't seem to check for named forks, whereas the VNODE listener handler does. I'll investigate further and put together a fix as appropriate.
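
If that's what's going on, the fix is probably just the same early-out the VNODE handler has. A sketch, assuming vnode_isnamedstream() is available to the kext and is the appropriate check:

#include <sys/vnode.h>

// Sketch: skip resource fork / named-stream vnodes
// (path/to/some/file/..namedfork/rsrc) in the FILEOP path before attempting
// any xattr I/O on them, mirroring the check the vnode listener already does.
static bool VnodeIsResourceFork(vnode_t vnode)
{
    return vnode_isnamedstream(vnode) != 0;
}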

@pmj
Contributor Author

pmj commented Nov 28, 2018

I've just run into another error; this time, reading the file flags in the KAUTH_FILEOP_CLOSE handler is failing. The vnode in question is already being recycled, and oddly enough, even the path passed to the callback is an empty string. I guess the kernel generates it by calling vn_getpath() too late before it calls our callback. With the vnode cache I guess we would be able to identify what file the vnode originally referred to in this case, but other than that I don't think there's much we can do here - except perhaps file a bug with Apple, because passing a recycled vnode and an empty path here seems counterproductive.

@wilbaker
Member

The vnode in question is already being recycled, and oddly enough, even the path passed to the callback is an empty string. I guess the kernel generates it by calling vn_getpath() too late before it calls our callback.

What's the best way (assuming there is a way) to programmatically check that a vnode has been recycled?

With the vnode cache I guess we would be able to identify what file the vnode originally referred to in this case, but other than that I don't think there's much we can do here.

With the current vnode cache proposal we don't cache anything about the file itself. Even if we did (e.g. cache fsid and inode) I think the cache would only help us if the vnode in the cache happened to have the same vid.

Did you have any ideas in mind for something more the cache could do?

I don't think there's much we can do here - except perhaps file a bug with Apple, because passing a recycled vnode and an empty path here seems counterproductive.

Although (I believe) it's asynchronous, I wonder if the File Events API would be more reliable than listening to FileOps in the kext. Any thoughts on that approach?

Another approach might be to rely on the vnode operations rather than KAUTH_FILEOP_CLOSE (for the events that we're currently generating from KAUTH_FILEOP_CLOSE).

If we wanted to stick with the current approach, are you aware of any machine/macOS settings we could tweak to make this less likely to occur?

@pmj
Contributor Author

pmj commented Nov 29, 2018

What's the best way (assuming there is a way) to programmatically check that a vnode has been recycled?

  • The check for whether it is being recycled is vnode_isrecycled(). (This is what I used in KAUTH_FILEOP_CLOSE.)
  • The only way you'd know if it has already been recycled is by comparing vid before & after.

As long as there are references (vnode_ref()) to a vnode, it can end up dead (recycling) but shouldn't be reused for a different file until all references are dropped. I'll need to check, but I strongly suspect that the KAUTH_FILEOP_CLOSE path does indeed hold some kind of reference to the vnode until it's completely done. So I don't think it should change identity under our noses.

With the current vnode cache proposal we don't cache anything about the file itself. Even if we did (e.g. cache fsid and inode) I think the cache would only help us if the vnode in the cache happened to have the same vid.
Did you have any ideas in mind for something more the cache could do?

I don't think the vid will still match once the recycling process has begun; it should have been incremented once at that point. We can technically detect that, but it's an implementation detail rather than something that's guaranteed. However, it's vanishingly unlikely that, by the time we get a KAUTH_FILEOP_CLOSE event, we haven't already seen that vnode with its pre-recycled identity. Before a process closes a handle to it, it must surely have opened it (or inherited the fd from a process that did), for example.

Although (I believe) it's asynchronous, I wonder if the File Events API would be more reliable than listening to FileOps in the kext. Any thoughts on that approach?

Yeah, fsevents would work for simply detecting changes to files. The async nature is fairly irrelevant, as our kext to provider messaging is also currently async.

Another approach might be to rely on the vnode operations rather than KAUTH_FILEOP_CLOSE (for the events that we're currently generating from KAUTH_FILEOP_CLOSE).

Yes, I think that's probably not a bad idea anyway; I'm not sure KAUTH_FILEOP_CLOSE will necessarily catch all modifications. Typical dev tools probably use fairly pedestrian access patterns, but I would not be surprised if we could dream up some edge cases where KAUTH_FILEOP_CLOSE either doesn't fire or KAUTH_FILEOP_CLOSE_MODIFIED doesn't get set.

I assume it's not a problem if we end up with false positives for modified/created messages? What about timing? Are these notifications used lazily on the next git status and similar or are they acted upon immediately? If it's lazy, we could send the messages on receiving a KAUTH_VNODE_WRITE_DATA notification; at that point, the write might still be denied (false positive) or happen in the future, but it should be more reliable than KAUTH_FILEOP_CLOSE. If this leads to too many messages, we could set a flag in the vnode cache when we send a message which inhibits further messages, and the provider could reset that flag if it wants to see the message again on future changes.

(If false positives are OK, we can send a modified message from KAUTH_FILEOP_CLOSE even if the vnode's identity isn't 100% clear, the original problem I described. We'd need to track its identity in the vnode cache as you say.)
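
To make that inhibition idea concrete, roughly the following; a sketch only, assuming a hypothetical VnodeCache_* / Message_* API, since the real cache design is still being worked out:

#include <sys/kauth.h>
#include <sys/vnode.h>

// Hypothetical cache/messaging API, for the purpose of this sketch only:
bool VnodeCache_TestAndSetModifiedFlag(vnode_t vnode, uint32_t vid);
void Message_SendFileModified(vnode_t vnode);

// On KAUTH_VNODE_WRITE_DATA, notify the provider once per vnode; further
// messages are inhibited until the provider clears the cached flag because it
// wants to hear about the next modification.
static void HandleWriteDataAuthorization(vnode_t vnode, kauth_action_t action)
{
    if ((action & KAUTH_VNODE_WRITE_DATA) == 0)
    {
        return;
    }

    if (VnodeCache_TestAndSetModifiedFlag(vnode, vnode_vid(vnode)))
    {
        return; // already sent for this vnode; don't flood the provider
    }

    Message_SendFileModified(vnode);
}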

If we wanted to stick with the current approach, are you aware of any machine/macOS settings we could tweak to make this less likely to occur?

No, I don't think this is something we can influence via settings.

@pmj
Contributor Author

pmj commented Nov 29, 2018

I'll need to check, but I strongly suspect that the KAUTH_FILEOP_CLOSE path does indeed hold some kind of reference to the vnode until it's completely done. So I don't think it should change identity under our noses.

I've looked this up in the xnu source now: the caller of the KAUTH_FILEOP_CLOSE notification, close_internal_locked() in bsd/kern/kern_descrip.c, surrounds the kauth call with vnode_getwithref()/vnode_put(), so there's definitely a persistent reference there - the vnode will not have changed identity; it may just be recycled by the time we get to it in the kauth listener handler.
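
For reference, roughly the structure of that call site (a paraphrase under that reading of the source, not the literal xnu code; vp, fileCredential, and fileWasWritten stand in for the file descriptor state that close_internal_locked() has on hand):

#include <sys/kauth.h>
#include <sys/vnode.h>

// The kauth notification is bracketed by vnode_getwithref()/vnode_put(), so
// the vnode cannot be reused for a different file while our listener runs,
// even though it may already be dead by then.
static void NotifyFileOpClose(vnode_t vp, kauth_cred_t fileCredential, bool fileWasWritten)
{
    if (vnode_getwithref(vp) == 0)
    {
        uintptr_t closeFlags = fileWasWritten ? KAUTH_FILEOP_CLOSE_MODIFIED : 0;

        kauth_authorize_fileop(fileCredential, KAUTH_FILEOP_CLOSE,
                               reinterpret_cast<uintptr_t>(vp), closeFlags);

        vnode_put(vp);
    }
}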

@wilbaker
Member

The check for whether it is being recycled is vnode_isrecycled(). (This is what I used in KAUTH_FILEOP_CLOSE.)

Thanks for pointing me to your changes, I clearly need to catch up on your PRs!

I've looked this up in the xnu source now: the caller of the KAUTH_FILEOP_CLOSE notification, close_internal_locked() in bsd/kern/kern_descrip.c, surrounds the kauth call with vnode_getwithref()/vnode_put(), so there's definitely a persistent reference there - the vnode will not have changed identity; it may just be recycled by the time we get to it in the kauth listener handler.

I checked Apple's documentation for vnode_getwithref(), and if it succeeded I'm not sure why the vnode would have been recycled.

From Apple docs (emphasis mine):

Increase the iocount on a vnode on which a usecount (persistent reference) is held.
If vnode_getwithref() succeeds, the resulting io-reference must be dropped with vnode_put(). vnode_getwithref() will succeed on dead vnodes; it should fail with ENOENT on vnodes which are in the process of being reclaimed. Because it is only called with a usecount on the vnode, the caller is guaranteed that the vnode has not been reused for a different file, though it may now be dead and have deadfs vnops (which return errors like EIO, ENXIO, ENOTDIR). On success, vnode_getwithref() returns with an iocount held on the vnode; this type of reference is intended to be held only for short periods of time (e.g. across a function call) and provides a strong guarantee about the life of the vnode. vnodes with positive iocounts cannot be recycled. An iocount is required for any operation on a vnode.

If vnode_getwithref() was called on a live vnode, my reading of the documentation is that it should not be recycled. Do you think the vnode was dead when it was called, or are there other scenarios/details I might be missing?

I don't think the vid will still match once the recycling process has begun; it should have been incremented once at that point. We can technically detect that, but it's an implementation detail rather than something that's guaranteed. However, it's vanishingly unlikely that, by the time we get a KAUTH_FILEOP_CLOSE event, we haven't already seen that vnode with its pre-recycled identity. Before a process closes a handle to it, it must surely have opened it (or inherited the fd from a process that did), for example.

If I'm following you correctly I think you're suggesting something like this:

if (vnode_isrecycled(fileop_vnode) &&
    cached_vnode->vid + 1 == vnode_vid(fileop_vnode))
{
    // Cached vnode match (the vid is bumped once when recycling begins)
}

Is that correct?

I do share your concern that if we went with this approach we'd be relying on an undocumented implementation detail (and based on the documentation of vnode_getwithref() it's still not clear to me why this vnode was recycled).

Yeah, fsevents would work for simply detecting changes to files. The async nature is fairly irrelevant, as our kext to provider messaging is also currently async.

Last time I checked the source, from the comments/naming it sounded like KAUTH_FILEOP_CLOSE is delivered to our kext synchronously as the file is being closed. Do you know if that's accurate?

I ask because there is a subtle difference when it comes to how we are asynchronous vs fsevents being asynchronous. Although it's true that our user-mode process will pick up the message from the kext asynchronously, the kext will block the FileOp callback until it receives a response from user-mode (making the kext's handling of the event synchronous from the perspective of the caller).

I assume it's not a problem if we end up with false positives for modified/created messages?

False positives are not a problem as long as we don't end up with too many of them (especially when it comes to modified messages). We're able to keep git commands quick by restricting the set of files that git considers, and each false positive means there's one more file that git needs to consider. If this is infrequent then there's no problem, but we don't want git to end up checking all of the files in a repo (unless they've all actually been modified).

On the Windows platform we aggressively add files to the set that git considers and this has not caused issues (files are added when a handle is opened with write access, and writes may never occur).

What about timing? Are these notifications used lazily on the next git status and similar or are they acted upon immediately?

VFS4G will immediately add the modified/created files to a list that it maintains. Whenever a git command runs, VFS4G provides this list of files to git so that it knows what files it should consider.

So, if a user has a script like this:

echo extraContent > testdata.txt
git status

We want to make sure that VFS4G has been notified that testdata.txt has been modified before it provides git with the list of files that git should consider during git status.

If it's lazy, we could send the messages on receiving a KAUTH_VNODE_WRITE_DATA notification; at that point, the write might still be denied (false positive) or happen in the future, but it should be more reliable than KAUTH_FILEOP_CLOSE.

That sounds like a good option to me. I think we'll simply need to give it a try and see how frequent false positives are. My hope is that we can sufficiently reduce false positives by only notifying VFS4G when the KAUTH_VNODE_ACCESS bit is not set. (My understanding is that there will be a KAUTH_VNODE_WRITE_DATA event without the KAUTH_VNODE_ACCESS bit set when a real write is about to occur)
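
In code form, what I'm hoping for is roughly this; a sketch only, with NotifyProviderFileWillBeModified as a hypothetical stand-in for however we end up messaging VFS4G:

#include <sys/kauth.h>
#include <sys/vnode.h>

// Hypothetical stand-in for messaging the provider:
void NotifyProviderFileWillBeModified(vnode_t vnode);

// KAUTH_VNODE_ACCESS marks advisory checks (e.g. access(2)); when it is not
// set, the authorization precedes an operation that is actually about to be
// attempted, so only those WRITE_DATA events would be forwarded to VFS4G.
static void MaybeNotifyOnWriteData(vnode_t vnode, kauth_action_t action)
{
    bool isAdvisoryCheckOnly = (action & KAUTH_VNODE_ACCESS) != 0;
    bool isWriteData = (action & KAUTH_VNODE_WRITE_DATA) != 0;

    if (isWriteData && !isAdvisoryCheckOnly)
    {
        NotifyProviderFileWillBeModified(vnode);
    }
}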

If false positives are OK, we can send a modified message from KAUTH_FILEOP_CLOSE even if the vnode's identity isn't 100% clear, the original problem I described. We'd need to track its identity in the vnode cache as you say.

False positives are okay, but I think the bigger issue/concern is completely missing events for recycled vnodes that we're unable to reason about.

@wilbaker
Member

That sounds like a good option to me. I think we'll simply need to give it a try and see how frequent false positives are.

We already had #209 (partially) open for this change. I've updated it with the details from this discussion.

@pmj
Contributor Author

pmj commented Nov 29, 2018

Thanks for pointing me to your changes, I clearly need to catch up on your PRs!

I'm not really expecting anyone to review #555. :-)

I checked Apple's documentation for vnode_getwithref(), and if it succeeded I'm not sure why the vnode would have been recycled.
[…]
If vnode_getwithref() was called on a live vnode, my reading of the documentation is that it should not be recycled. Do you think the vnode was dead when it was called, or are there other scenarios/details I might be missing?

As so often, I think the terminology is used inconsistently, so that might be what's causing confusion: vnode_isrecycled() returns true if a vnode is dead, i.e. ready for recycling. Once it's been recycled, it's got a proper identity again, and is very much not dead, and indeed vnode_isrecycled() returns false once this has happened. So yeah, misnamed function.

In this specific case, the vnode won't die between vnode_getwithref() and vnode_put(), but it may well die before vnode_getwithref(). It will however not be fully recycled (get a new identity) until the usecount hits 0. (The fact that vnode_getwithref() is used implies there is an active usecount which must be released with vnode_rele().)

If I'm following you correctly I think you're suggesting something like this:
[…]
Is that correct?

Precisely.

Last time I checked the source, from the comments/naming it sounded like KAUTH_FILEOP_CLOSE is delivered to our kext synchronously as the file is being closed. Do you know if that's accurate?

Ah, indeed, I forgot that the kext waits for a response for the modified messages too. Your example with the script makes sense. So KAUTH_VNODE_WRITE_DATA would still be a decent option in that case.

That sounds like a good option to me. I think we'll simply need to give it a try and see how frequent false positives are. My hope is that we can sufficiently reduce false positives by only notifying VFS4G when the KAUTH_VNODE_ACCESS bit is not set. (My understanding is that there will be a KAUTH_VNODE_WRITE_DATA event without the KAUTH_VNODE_ACCESS bit set when a real write is about to occur)

Yeah; I seem to remember having some issues with the auth cache in that context, where ACCESS results were cached and I never got a non-ACCESS callback. I might be misremembering (this was on a project in 2012/2013) but as we've disabled the cache, it shouldn't affect us either way.

Of course, we or another vnode listener kext could return DENY from the callback, in which case KAUTH_VNODE_WRITE_DATA does not precede an actual write, but those cases should be pretty rare for the most part. I don't expect we'll be generating a deluge of false positives with this.

@pmj
Contributor Author

pmj commented Dec 4, 2018

Following out-of-band discussion, I'm going to wrap up most of the error logging I added with #555 into a PR we can merge into master, so that we can get a better overall picture of these errors and so that they are logged automatically on end user machines (c.f. #396) as they may help with diagnosing problems.

Additionally, we want to start mitigating the known problems, particularly the name cache misses (with a view to implementing something along the lines of #338).

@pmj
Contributor Author

pmj commented Dec 18, 2018

Note to self: add functional tests CI step that scans the kext log for any errors and fails the job if there were errors.

@nickgra
Member

nickgra commented Jan 10, 2019

Not sure if this is of interest, but I had the logger running to test some theories, and while I was not interacting with my virtualization root, we hit some of the new errors from #643:

(0x1903:   135:   138.699) Error: ReadVNodeFileFlags: GetVNodeAttributes failed with error 9; vnode type: 8, recycled: yes (vnode path: '')
(0x1903:   136:   138.699) Error: KAUTH_FILEOP_CLOSE: checking file flags failed. Path = '' (vnode name: '[NULL]', type: 8, recycling: yes, mount point mounted at path '')

@pmj
Contributor Author

pmj commented Jan 11, 2019

Yeah, this has turned up before - it seems that by the time we get the KAUTH_FILEOP_CLOSE event (presumably on a file that was deleted before the last process closed its handle), the vnode can already be in the process of being recycled. I believe this behaviour is part of what needs to be solved in #209 so I'll leave the error log in there until that issue has been resolved - also in case we run into a different type of error in the same location. Still - thanks for reporting!

@jrbriggs jrbriggs added the priority: early-adopters (Issues blocking macOS EA) label Apr 2, 2019
@jrbriggs jrbriggs modified the milestones: macOS Early Adopters, M151 Apr 2, 2019
@jrbriggs jrbriggs modified the milestones: M151, M152 Apr 22, 2019
@jrbriggs jrbriggs modified the milestones: M152, M153 May 22, 2019
@jrbriggs jrbriggs removed this from the M153 milestone Jun 5, 2019
@wilbaker wilbaker added the pri2 label Jul 26, 2019
@wilbaker wilbaker removed the pri2 label Nov 14, 2019