Mac kext: Virtualisation root refactoring & array resizing #322

pmj · 2018-10-02T16:29:54Z

The way the VirtualizationRoot struct has been used in the ProjFS kext has been rather messy so far - sometimes by index, sometimes by pointer, often non-threadsafe (although the risks have been fairly small).

The ultimate goals of this PR are:

Lift the limit on number of virtualisation roots.
Fix the thread safety issues.

The reason I made the virtualisation root array a fixed size initially was because allowing resizes means the pointers will become invalid unless you introduce some fairly strict locking requirements. Locking the root array for near enough the entire duration of a vnode/fileop callback could cause some significant contention though and would furthermore raise issues with performing vnode I/O and acquiring other locks while holding the vroot lock.

This PR approaches the issue a different way: consistently only expose vroot indices in the API, never pointers to the struct itself, and provide accessor functions which ensure thread safety as needed. As long as indices stay linked to root identities, it's safe to drop the lock while holding a root index.

To get there, I made a bunch of other smaller preparatory and in some cases only tangentially related changes; all the changes are in logically distinct git commits, so when reviewing you may wish to review one commit at a time - you'll have an easier ride that way.

Quick commit summary:

Don't assert/KP on attribute reading failure. The assertion on getting vnode attributes successfully failed a few times during development & testing. It's not entirely clear why yet, but the assert doesn't give much clue as to the reason, so I've changed this to a soft failure with some extra logging. If you encounter instances of this error while testing etc., please do send me --reports.
Handle named streams. Relatedly, I've logged I/O errors on a bunch of other operations, and quite a number of them originated in vnodes corresponding to resource forks. (You'll recognise these by their path, which looks something like /path/to/a/regular/file/..namedfork/rsrc.) These aren't standalone files, but alternate data streams/forks on a regular file. For the most part, they are uninteresting to VFS for Git, but they behave oddly when querying things like xattrs, and we really should be applying the logic according to their underlying file. So this changes the vnode op callback handler to operate on that underlying file when encountering a named stream.
Replaces some C-style casts with C++ style. We've been fairly consistent with using C++ style casts (static_cast, const_cast, reinterpret_cast) in the kext, but I stumbled across some C-style casts that crept in. This changes them to C++ style.
Enables warnings for shadowed variables. Activates another compiler warning - shadowing variables is rarely what we want, and more frequently is a bug. (I made just such a mistake while refactoring the VirtualizationRoot API and spent half an hour chasing it, if you must know.)
Adds typed array memory allocation function. This adds a helper function that wraps our memory allocator with a typed array-based one, which make for slightly more readable and less error prone code. (Particularly with regard to overflow on size calculations.)
Use only VirtualizationRoot indices, not direct pointers. This rejigs the VirtualizationRoot.hpp API to never leak pointers, and indeed moves the struct definition to the .cpp file. It also adds accessor functions for situations where previously pointers were dereferenced, and wraps them with locks so access is threadsafe.
Dynamically alloc virtualisation root array & resize when full. This finally gets rid of the nasty assertion failure when our virtualisation root array is full and grows it instead!

pmj · 2018-10-02T16:55:23Z

I've been undecided whether to submit 1-5 and 6-7 as 2 separate PRs, so if review finds that 1-5 are uncontroversial while 6&7 need more work, I'd like to merge that first lot on its own rather than having to keep rebasing on master.

Not addressed by this PR, to be handled in a future change:

GC of dead roots. (e.g. if the user unmounts a root and deletes it with rm -r) We should recycle the array slot in this case, which is currently not done.
Linear searches on the root array. Searching by vnode or inode should be done via a hash table or similarly efficient data structure.

nickgra · 2018-10-02T18:46:35Z

Don't assert/KP on attribute reading failure. The assertion on getting vnode attributes successfully failed a few times during development & testing. It's not entirely clear why yet, but the assert doesn't give much clue as to the reason, so I've changed this to a soft failure with some extra logging. If you encounter instances of this error while testing etc., please do send me reports.

I've been encountering this all the time locally, I suspect the problem is that we're trying to operate on a vnode type that doesn't support us reading attributes.
Since we're handling every fileop close event, we catch some events like this. Pulling ShouldIgnoreVnodeType out from ShouldHandleFileOpEvent before FileIsFlaggedAsInRoot (the offending callsite according to the panic report) fixes the panics for me, but seems to have a larger performance hit than I would like.

@@ -462,10 +462,16 @@ static int HandleFileOpOperation(
         if (vnode_isdir(currentVnode))
         {
             goto CleanupAndReturn;
         }
 
+        vtype vnodeType = vnode_vtype(currentVnode);
+        if (ShouldIgnoreVnodeType(vnodeType, currentVnode))
+        {
+            goto CleanupAndReturn;
+        }
+        
         bool fileFlaggedInRoot = FileIsFlaggedAsInRoot(currentVnode, context);
         if (fileFlaggedInRoot && KAUTH_FILEOP_CLOSE_MODIFIED != closeFlags)
         {
             goto CleanupAndReturn;
         }

I haven't done any formal perf testing, but your solution seems like it'll be much cheaper if this actually fixes the panic. I'm happy to take your fix if you think it's safe with this context :)

nickgra

All of these commits make sense to me, I think it's reasonable to take all of them together instead of chunking 6 and 7 off.

nickgra · 2018-10-02T18:48:43Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

-    *root = VirtualizationRoots_FindForVnode(vnode);
+    *rootIndex = VirtualizationRoot_FindForVnode(vnode);
+
+    // This VNode is in the temp directory, do not act on it.


Is this leftover from when you were looking into the atomic placeholder/temp directory work?

Yeah, I was unsure whether to remove it - it's also in your atomic enumeration branch. I definitely wanted to pull in the enum with the "special" root indices and finally get rid of the -1s everywhere. It seemed silly to remove the temp directory enum value, and we're presumably going to need this condition here, even if it's not going to kick in in this version of the code. I can remove it though.

That makes sense, we can leave it for now.

sanoursa

I haven't done a thorough review of VirtualizationRoots.cpp yet, but all the other changes make sense. My biggest feedback so far is to keep the knowledge of "indexes" private to the VirtualizationRoots code, and make it opaque to all callers.

I'll make a second pass this afternoon to go over the meat of the changes.

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.hpp

sanoursa · 2018-10-02T19:11:49Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

+    // occurring on the file itself. (/path/to/file/..namedfork/rsrc)
+    if (vnode_isnamedstream(currentVnode))
+    {
+        vnode_t mainFileFork = vnode_getparent(currentVnode);


What should we do if this call fails and returns NULLVP? We now know that currentVnode is referring to a named stream and that we failed to resolve to the actual file, so should we give up?

Related: is there any way to guarantee that we can find the parent? The vnode_getparent function is racy since it only resolves vnodes that are still in the name cache. Is there an alternative?

I haven't had a chance to investigate the vnode_parent issues in depth yet, so for now I can't really answer that question. In the named stream case, I don't see how the main fork vnode could possibly disappear, but at the same time I have no evidence that we're guaranteed to be able to grab it.

I'm happy to leave the bigger investigation for a separate change, but I think we should either assert or return a failure here if this call fails. Otherwise we know we will misbehave downstream of this code.

OK, an assertion sounds the most sensible approach here until we get a chance to look into it. If the assert starts firing, we'll know it's a pressing issue.

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

pmj · 2018-10-03T11:19:10Z

Pulling ShouldIgnoreVnodeType out from ShouldHandleFileOpEvent before FileIsFlaggedAsInRoot (the offending callsite according to the panic report) fixes the panics for me, but seems to have a larger performance hit than I would like.

Hmm. vnode_vtype is just a pointer dereferencing/struct member read, and the switch block in ShouldIgnoreVnodeType() ought to compile down to something minimal, so I'd be extremely surprised if there was any significant overhead introduced by that extra check. I feel like we're getting sidetracked by the vnode I/O failure issue - I'll open a github issue to track the various different instances of the problem, because there do seem to be a few pieces to it.

pmj · 2018-10-03T14:09:23Z

I've just pushed an update that fixes the formatting issues, and which changes the terminology to "handles" outside of VirtualizationRoots.cpp, including a typedef as suggested. I've kept the index terminology inside the .cpp for now as they are actually known to be indices there.

pmj · 2018-10-03T14:39:31Z

vnode I/O errors now tracked as issue #328.

wilbaker · 2018-10-03T16:16:07Z

@pmj I'm reviewing these changes now, please hold off on merging until I've posted comments. Thanks!

wilbaker

Overall changes look good to me, left some minor comments\questions

wilbaker · 2018-10-03T16:09:40Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.hpp


-int16_t VirtualizationRoots_LookupVnode(vnode_t vnode, vfs_context_t context);
+int16_t VirtualizationRoots_FindRootAtVnode(vnode_t vnode, vfs_context_t context);


Return VirtualizationRootHandle instead of int16_t

Can the declaration of this function be moved to VirtualizationRoots.cpp? I don't think we have a use case for exposing this right now.

Thinking about this some more, in addition to making it a static function, I'm renaming it to FindOrDetectRootAtVnode to distinguish it from FindRootAtVnode_Locked. The latter just checks against known roots, while this function subsequently checks for xattrs to see if it's a root that's previously not been detected yet.

wilbaker · 2018-10-03T16:20:24Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

@@ -63,7 +61,7 @@ static bool ShouldHandleVnodeOpEvent(
    kauth_action_t action,

    // Out params:
-    VirtualizationRoot** root,
+    int16_t* root,


Change int16_t* to VirtualizationRootHandle*

wilbaker · 2018-10-03T17:17:26Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

+        KextLog_FileError(vn, "ReadVNodeFileFlags failed with error %d (0x%x)", err, err);
+        return 0;
+    }
+
    assert(0 == err);


Should this assert be removed with the 0 != err check that's been added above?

wilbaker · 2018-10-03T17:18:38Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

@@ -803,6 +821,12 @@ static uint32_t ReadVNodeFileFlags(vnode_t vn, vfs_context_t context)
    struct vnode_attr attributes = {};
    errno_t err = GetVNodeAttributes(vn, context, &attributes);
    // TODO: May fail on some file system types? Perhaps we should early-out depending on mount point anyway.
+    if (0 != err)
+    {
+        KextLog_FileError(vn, "ReadVNodeFileFlags failed with error %d (0x%x)", err, err);


Although the kernel panics are annoying, they do make it obvious that we've hit this issue.

With this change will we determine that we've hit this issue when we see failures or unexpected behavior (should we wait on this until we have something in place to log these errors to a file)?

CC @sanoursa for his thoughts as well

Yea I don't think we can just log and ignore. We need to either assert, or else change this function signature so we can return an error and fail the IO.

OK, I'm going to cull that patch from this PR. It's tangential to the core functionality, and too important to piggy-back on it.

wilbaker · 2018-10-03T17:25:57Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

@@ -37,9 +93,16 @@ kern_return_t VirtualizationRoots_Init()
        return KERN_FAILURE;
    }

-    for (uint32_t i = 0; i < MaxVirtualizationRoots; ++i)
+    s_maxVirtualizationRoots = 128;
+    s_virtualizationRoots = Memory_AllocArray<VirtualizationRoot>(s_maxVirtualizationRoots);


Should we be holding s_rwLock here?

Although extremely unlikely, without the lock I'm not sure that we can guarantee that a read on different thread will see the values that we've set here.

There can't be any other threads accessing the data structures at this point yet, so a full lock is technically not needed, although you're right that other threads will need a memory barrier to guarantee propagation. How do you feel about an explicit atomic_thread_fence(memory_order_seq_cst); at the end of this init function?

How do you feel about an explicit atomic_thread_fence(memory_order_seq_cst); at the end of this init function?

That sounds good to me, thanks!

wilbaker · 2018-10-03T17:58:39Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp


-static VirtualizationRoot s_virtualizationRoots[MaxVirtualizationRoots] = {};
+bool VirtualizationRoot_PIDMatchesProvider(VirtualizationRootHandle rootIndex, pid_t pid)


Minor naming suggestion: rename rootIndex to rootHandle

wilbaker · 2018-10-03T18:03:32Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp


    vnode_get(vnode);
    // Search up the tree until we hit a known virtualization root or THE root of the file system
-    while (nullptr == root && NULLVP != vnode && !vnode_isvroot(vnode))
+    while (rootIndex == RootHandle_None && NULLVP != vnode && !vnode_isvroot(vnode))


@sanoursa, we have a mix right now with our checks, sometimes it's VALUE == var and othertimes it's var == VALUE. Did we decide on preferring one over the other, and what is the rule for !=?

(No need to clean this up in this PR, I'm just wondering what our style guideline is for this)

I definitely prefer yoda-speak in C/C++ code to eliminate any risk of accidental assignments, but we have not enforced it consistently.

wilbaker · 2018-10-03T18:04:59Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

-        int16_t rootIndex = VirtualizationRoots_LookupVnode(vnode, nullptr);
-        if (rootIndex >= 0)
+        rootIndex = VirtualizationRoots_FindRootAtVnode(vnode, nullptr);
+        if (rootIndex != RootHandle_None)


Would it be safer to check !VirtualizationRoot_IsValidRootHandle(rootIndex)?

(Or is part of the contract with VirtualizationRoots_FindRootAtVnode that it will only return RootHandle_None or a valid index?)

VirtualizationRoots_FindRootAtVnode() (via FindRootAtVnode_Locked()) will eventually be able to return RootHandle_ProviderTemporaryDirectory, in which case we've actually found what we're looking for, want to stop the search, and return that special handle from this function. This is evidently not obvious, so I'll put in a comment to that effect.

wilbaker · 2018-10-03T18:07:41Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp


-                if (rootIndex < 0)
+                if (rootIndex == RootHandle_None)


Similar comment\question on checking None vs < 0 here.

When should we be checking against None and when should be be checking VirtualizationRoot_IsValidRootHandle?

Yeah, same answer as above - a result of RootHandle_ProviderTemporaryDirectory means we don't need to drop to file I/O level as we've found what we're looking for.

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

sanoursa

I have not had a test to thoroughly test this yet, but the code changes look good to me.

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

sanoursa · 2018-10-04T23:08:33Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

+    s_virtualizationRoots = Memory_AllocArray<VirtualizationRoot>(s_maxVirtualizationRoots);
+    if (nullptr == s_virtualizationRoots)
+    {
+        return KERN_RESOURCE_SHORTAGE;


Everywhere else we return KERN_FAILURE if we fail to allocate memory. Should we update those to return this error code instead?

Given that the return value is only checked in KauthHandler_Init() as a boolean, it doesn't really matter much I suppose. If I was designing it from scratch, I'd probably either pass the full return value back out from the kext start function, or simply return booleans from the various init functions. I don't have sufficiently strong feelings either way to actively go in and change it; I can also change this to KERN_FAILURE if preferred. (If anything, the part that I dislike the most is that we're implicitly testing for KERN_SUCCESS by assuming it's 0, as opposed to e.g. if (KERN_SUCCESS == VirtualizationRoots_Init()))

sanoursa · 2018-10-04T23:09:20Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp


    vnode_get(vnode);
    // Search up the tree until we hit a known virtualization root or THE root of the file system
-    while (nullptr == root && NULLVP != vnode && !vnode_isvroot(vnode))
+    while (rootIndex == RootHandle_None && NULLVP != vnode && !vnode_isvroot(vnode))


I definitely prefer yoda-speak in C/C++ code to eliminate any risk of accidental assignments, but we have not enforced it consistently.

nickgra · 2018-10-05T06:24:57Z

@pmj I ran a few large repo build tests earlier today on this branch and nothing unexpected broke. If you rebase this/merge it in, I'm happy to revalidate that this change doesn't regress anything.

Occasionally, the vnode auth handler is called on a vnode representing a named stream/named fork. (Typically, the resource fork.) Forks don’t have their own xattrs, etc. so some of the logic we apply fails. We don’t care specifically about named forks, so treat any access to them as an access to the main file instead. Note: we need to balance the vnode_getparent() call with vnode_put()

Shadowing variables frequently is a source of bugs, so this enables the corresponding compiler warning.

This change moves the VirtualizationRoot structure definition out of the header file, and unifies the API to only use integer handles, which internally are array indices for non-negative values, and negative values from an enum of special cases. Where indices were previously used outside the VirtualizationRoot implementation functions, the terminology has been changed to "handles." An immediate advantage of the move away from pointers is an improvement in the ability to control thread safety of the root structures; additionally, unlike pointers, handles remain valid outside of held locks, so in a follow-on change, we can reallocate the array for resizing.

…ull.

pmj requested review from sanoursa and nickgra October 2, 2018 16:29

nickgra approved these changes Oct 2, 2018

View reviewed changes

sanoursa reviewed Oct 2, 2018

View reviewed changes

pmj force-pushed the virt-root-refactor branch 2 times, most recently from 9726b6c to 867f547 Compare October 3, 2018 13:56

pmj mentioned this pull request Oct 3, 2018

Mac: sporadic vnode I/O errors #328

Open

wilbaker self-requested a review October 3, 2018 16:05

wilbaker reviewed Oct 3, 2018

View reviewed changes

sanoursa approved these changes Oct 4, 2018

View reviewed changes

pmj added 6 commits October 5, 2018 15:17

Mac kext: Replaces some C-style casts with C++ style.

f147f8c

Mac kext: Enables warning for shadowed variables.

d21d165

Shadowing variables frequently is a source of bugs, so this enables the corresponding compiler warning.

Mac kext: Adds typed array memory allocation function.

e50e488

Mac kext: Dynamically alloc virtualisation root array & resize when f…

0a0ac88

…ull.

pmj force-pushed the virt-root-refactor branch from 867f547 to 0a0ac88 Compare October 5, 2018 14:15

pmj merged commit 5747bf8 into microsoft:master Oct 6, 2018

pmj deleted the virt-root-refactor branch October 6, 2018 07:33

jrbriggs mentioned this pull request Oct 12, 2018

Merge milestones/M142 to releases/shipped #381

Merged


		int16_t VirtualizationRoots_LookupVnode(vnode_t vnode, vfs_context_t context);
		int16_t VirtualizationRoots_FindRootAtVnode(vnode_t vnode, vfs_context_t context);


		static VirtualizationRoot s_virtualizationRoots[MaxVirtualizationRoots] = {};
		bool VirtualizationRoot_PIDMatchesProvider(VirtualizationRootHandle rootIndex, pid_t pid)

Mac kext: Virtualisation root refactoring & array resizing #322

Mac kext: Virtualisation root refactoring & array resizing #322

Conversation

pmj commented Oct 2, 2018 • edited Loading

pmj commented Oct 2, 2018

nickgra commented Oct 2, 2018

nickgra left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sanoursa left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

pmj commented Oct 3, 2018

pmj commented Oct 3, 2018

pmj commented Oct 3, 2018

wilbaker commented Oct 3, 2018

wilbaker left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sanoursa left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nickgra commented Oct 5, 2018

pmj commented Oct 2, 2018 •

edited

Loading

pmj Oct 5, 2018 •

edited

Loading

pmj Oct 5, 2018 •

edited

Loading

pmj Oct 5, 2018 •

edited

Loading