Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mac kext: Virtualisation root refactoring & array resizing #322

Merged
merged 6 commits into from
Oct 6, 2018

Conversation

pmj
Copy link
Contributor

@pmj pmj commented Oct 2, 2018

The way the VirtualizationRoot struct has been used in the ProjFS kext has been rather messy so far - sometimes by index, sometimes by pointer, often non-threadsafe (although the risks have been fairly small).

The ultimate goals of this PR are:

  • Lift the limit on number of virtualisation roots.
  • Fix the thread safety issues.

The reason I made the virtualisation root array a fixed size initially was because allowing resizes means the pointers will become invalid unless you introduce some fairly strict locking requirements. Locking the root array for near enough the entire duration of a vnode/fileop callback could cause some significant contention though and would furthermore raise issues with performing vnode I/O and acquiring other locks while holding the vroot lock.

This PR approaches the issue a different way: consistently only expose vroot indices in the API, never pointers to the struct itself, and provide accessor functions which ensure thread safety as needed. As long as indices stay linked to root identities, it's safe to drop the lock while holding a root index.

To get there, I made a bunch of other smaller preparatory and in some cases only tangentially related changes; all the changes are in logically distinct git commits, so when reviewing you may wish to review one commit at a time - you'll have an easier ride that way.

Quick commit summary:

  1. Don't assert/KP on attribute reading failure. The assertion on getting vnode attributes successfully failed a few times during development & testing. It's not entirely clear why yet, but the assert doesn't give much clue as to the reason, so I've changed this to a soft failure with some extra logging. If you encounter instances of this error while testing etc., please do send me --reports.
  2. Handle named streams. Relatedly, I've logged I/O errors on a bunch of other operations, and quite a number of them originated in vnodes corresponding to resource forks. (You'll recognise these by their path, which looks something like /path/to/a/regular/file/..namedfork/rsrc.) These aren't standalone files, but alternate data streams/forks on a regular file. For the most part, they are uninteresting to VFS for Git, but they behave oddly when querying things like xattrs, and we really should be applying the logic according to their underlying file. So this changes the vnode op callback handler to operate on that underlying file when encountering a named stream.
  3. Replaces some C-style casts with C++ style. We've been fairly consistent with using C++ style casts (static_cast, const_cast, reinterpret_cast) in the kext, but I stumbled across some C-style casts that crept in. This changes them to C++ style.
  4. Enables warnings for shadowed variables. Activates another compiler warning - shadowing variables is rarely what we want, and more frequently is a bug. (I made just such a mistake while refactoring the VirtualizationRoot API and spent half an hour chasing it, if you must know.)
  5. Adds typed array memory allocation function. This adds a helper function that wraps our memory allocator with a typed array-based one, which make for slightly more readable and less error prone code. (Particularly with regard to overflow on size calculations.)
  6. Use only VirtualizationRoot indices, not direct pointers. This rejigs the VirtualizationRoot.hpp API to never leak pointers, and indeed moves the struct definition to the .cpp file. It also adds accessor functions for situations where previously pointers were dereferenced, and wraps them with locks so access is threadsafe.
  7. Dynamically alloc virtualisation root array & resize when full. This finally gets rid of the nasty assertion failure when our virtualisation root array is full and grows it instead!

@pmj pmj requested review from sanoursa and nickgra October 2, 2018 16:29
@pmj
Copy link
Contributor Author

pmj commented Oct 2, 2018

I've been undecided whether to submit 1-5 and 6-7 as 2 separate PRs, so if review finds that 1-5 are uncontroversial while 6&7 need more work, I'd like to merge that first lot on its own rather than having to keep rebasing on master.

Not addressed by this PR, to be handled in a future change:

  • GC of dead roots. (e.g. if the user unmounts a root and deletes it with rm -r) We should recycle the array slot in this case, which is currently not done.
  • Linear searches on the root array. Searching by vnode or inode should be done via a hash table or similarly efficient data structure.

@nickgra
Copy link
Member

nickgra commented Oct 2, 2018

Don't assert/KP on attribute reading failure. The assertion on getting vnode attributes successfully failed a few times during development & testing. It's not entirely clear why yet, but the assert doesn't give much clue as to the reason, so I've changed this to a soft failure with some extra logging. If you encounter instances of this error while testing etc., please do send me reports.

I've been encountering this all the time locally, I suspect the problem is that we're trying to operate on a vnode type that doesn't support us reading attributes.
Since we're handling every fileop close event, we catch some events like this. Pulling ShouldIgnoreVnodeType out from ShouldHandleFileOpEvent before FileIsFlaggedAsInRoot (the offending callsite according to the panic report) fixes the panics for me, but seems to have a larger performance hit than I would like.

@@ -462,10 +462,16 @@ static int HandleFileOpOperation(
         if (vnode_isdir(currentVnode))
         {
             goto CleanupAndReturn;
         }
 
+        vtype vnodeType = vnode_vtype(currentVnode);
+        if (ShouldIgnoreVnodeType(vnodeType, currentVnode))
+        {
+            goto CleanupAndReturn;
+        }
+        
         bool fileFlaggedInRoot = FileIsFlaggedAsInRoot(currentVnode, context);
         if (fileFlaggedInRoot && KAUTH_FILEOP_CLOSE_MODIFIED != closeFlags)
         {
             goto CleanupAndReturn;
         }

I haven't done any formal perf testing, but your solution seems like it'll be much cheaper if this actually fixes the panic. I'm happy to take your fix if you think it's safe with this context :)

Copy link
Member

@nickgra nickgra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these commits make sense to me, I think it's reasonable to take all of them together instead of chunking 6 and 7 off.

*root = VirtualizationRoots_FindForVnode(vnode);
*rootIndex = VirtualizationRoot_FindForVnode(vnode);

// This VNode is in the temp directory, do not act on it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this leftover from when you were looking into the atomic placeholder/temp directory work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was unsure whether to remove it - it's also in your atomic enumeration branch. I definitely wanted to pull in the enum with the "special" root indices and finally get rid of the -1s everywhere. It seemed silly to remove the temp directory enum value, and we're presumably going to need this condition here, even if it's not going to kick in in this version of the code. I can remove it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, we can leave it for now.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done a thorough review of VirtualizationRoots.cpp yet, but all the other changes make sense. My biggest feedback so far is to keep the knowledge of "indexes" private to the VirtualizationRoots code, and make it opaque to all callers.

I'll make a second pass this afternoon to go over the meat of the changes.

// occurring on the file itself. (/path/to/file/..namedfork/rsrc)
if (vnode_isnamedstream(currentVnode))
{
vnode_t mainFileFork = vnode_getparent(currentVnode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do if this call fails and returns NULLVP? We now know that currentVnode is referring to a named stream and that we failed to resolve to the actual file, so should we give up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related: is there any way to guarantee that we can find the parent? The vnode_getparent function is racy since it only resolves vnodes that are still in the name cache. Is there an alternative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to investigate the vnode_parent issues in depth yet, so for now I can't really answer that question. In the named stream case, I don't see how the main fork vnode could possibly disappear, but at the same time I have no evidence that we're guaranteed to be able to grab it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to leave the bigger investigation for a separate change, but I think we should either assert or return a failure here if this call fails. Otherwise we know we will misbehave downstream of this code.

Copy link
Contributor Author

@pmj pmj Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, an assertion sounds the most sensible approach here until we get a chance to look into it. If the assert starts firing, we'll know it's a pressing issue.

@pmj
Copy link
Contributor Author

pmj commented Oct 3, 2018

Pulling ShouldIgnoreVnodeType out from ShouldHandleFileOpEvent before FileIsFlaggedAsInRoot (the offending callsite according to the panic report) fixes the panics for me, but seems to have a larger performance hit than I would like.

Hmm. vnode_vtype is just a pointer dereferencing/struct member read, and the switch block in ShouldIgnoreVnodeType() ought to compile down to something minimal, so I'd be extremely surprised if there was any significant overhead introduced by that extra check. I feel like we're getting sidetracked by the vnode I/O failure issue - I'll open a github issue to track the various different instances of the problem, because there do seem to be a few pieces to it.

@pmj pmj force-pushed the virt-root-refactor branch 2 times, most recently from 9726b6c to 867f547 Compare October 3, 2018 13:56
@pmj
Copy link
Contributor Author

pmj commented Oct 3, 2018

I've just pushed an update that fixes the formatting issues, and which changes the terminology to "handles" outside of VirtualizationRoots.cpp, including a typedef as suggested. I've kept the index terminology inside the .cpp for now as they are actually known to be indices there.

@pmj
Copy link
Contributor Author

pmj commented Oct 3, 2018

vnode I/O errors now tracked as issue #328.

@wilbaker wilbaker self-requested a review October 3, 2018 16:05
@wilbaker
Copy link
Member

wilbaker commented Oct 3, 2018

@pmj I'm reviewing these changes now, please hold off on merging until I've posted comments. Thanks!

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall changes look good to me, left some minor comments\questions


int16_t VirtualizationRoots_LookupVnode(vnode_t vnode, vfs_context_t context);
int16_t VirtualizationRoots_FindRootAtVnode(vnode_t vnode, vfs_context_t context);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Return VirtualizationRootHandle instead of int16_t
  • Can the declaration of this function be moved to VirtualizationRoots.cpp? I don't think we have a use case for exposing this right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this some more, in addition to making it a static function, I'm renaming it to FindOrDetectRootAtVnode to distinguish it from FindRootAtVnode_Locked. The latter just checks against known roots, while this function subsequently checks for xattrs to see if it's a root that's previously not been detected yet.

@@ -63,7 +61,7 @@ static bool ShouldHandleVnodeOpEvent(
kauth_action_t action,

// Out params:
VirtualizationRoot** root,
int16_t* root,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change int16_t* to VirtualizationRootHandle*

KextLog_FileError(vn, "ReadVNodeFileFlags failed with error %d (0x%x)", err, err);
return 0;
}

assert(0 == err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this assert be removed with the 0 != err check that's been added above?

@@ -803,6 +821,12 @@ static uint32_t ReadVNodeFileFlags(vnode_t vn, vfs_context_t context)
struct vnode_attr attributes = {};
errno_t err = GetVNodeAttributes(vn, context, &attributes);
// TODO: May fail on some file system types? Perhaps we should early-out depending on mount point anyway.
if (0 != err)
{
KextLog_FileError(vn, "ReadVNodeFileFlags failed with error %d (0x%x)", err, err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the kernel panics are annoying, they do make it obvious that we've hit this issue.

With this change will we determine that we've hit this issue when we see failures or unexpected behavior (should we wait on this until we have something in place to log these errors to a file)?

CC @sanoursa for his thoughts as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I don't think we can just log and ignore. We need to either assert, or else change this function signature so we can return an error and fail the IO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm going to cull that patch from this PR. It's tangential to the core functionality, and too important to piggy-back on it.

@@ -37,9 +93,16 @@ kern_return_t VirtualizationRoots_Init()
return KERN_FAILURE;
}

for (uint32_t i = 0; i < MaxVirtualizationRoots; ++i)
s_maxVirtualizationRoots = 128;
s_virtualizationRoots = Memory_AllocArray<VirtualizationRoot>(s_maxVirtualizationRoots);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be holding s_rwLock here?

Although extremely unlikely, without the lock I'm not sure that we can guarantee that a read on different thread will see the values that we've set here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can't be any other threads accessing the data structures at this point yet, so a full lock is technically not needed, although you're right that other threads will need a memory barrier to guarantee propagation. How do you feel about an explicit atomic_thread_fence(memory_order_seq_cst); at the end of this init function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about an explicit atomic_thread_fence(memory_order_seq_cst); at the end of this init function?

That sounds good to me, thanks!


static VirtualizationRoot s_virtualizationRoots[MaxVirtualizationRoots] = {};
bool VirtualizationRoot_PIDMatchesProvider(VirtualizationRootHandle rootIndex, pid_t pid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor naming suggestion: rename rootIndex to rootHandle


vnode_get(vnode);
// Search up the tree until we hit a known virtualization root or THE root of the file system
while (nullptr == root && NULLVP != vnode && !vnode_isvroot(vnode))
while (rootIndex == RootHandle_None && NULLVP != vnode && !vnode_isvroot(vnode))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanoursa, we have a mix right now with our checks, sometimes it's VALUE == var and othertimes it's var == VALUE. Did we decide on preferring one over the other, and what is the rule for !=?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No need to clean this up in this PR, I'm just wondering what our style guideline is for this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely prefer yoda-speak in C/C++ code to eliminate any risk of accidental assignments, but we have not enforced it consistently.

int16_t rootIndex = VirtualizationRoots_LookupVnode(vnode, nullptr);
if (rootIndex >= 0)
rootIndex = VirtualizationRoots_FindRootAtVnode(vnode, nullptr);
if (rootIndex != RootHandle_None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be safer to check !VirtualizationRoot_IsValidRootHandle(rootIndex)?

(Or is part of the contract with VirtualizationRoots_FindRootAtVnode that it will only return RootHandle_None or a valid index?)

Copy link
Contributor Author

@pmj pmj Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VirtualizationRoots_FindRootAtVnode() (via FindRootAtVnode_Locked()) will eventually be able to return RootHandle_ProviderTemporaryDirectory, in which case we've actually found what we're looking for, want to stop the search, and return that special handle from this function. This is evidently not obvious, so I'll put in a comment to that effect.


if (rootIndex < 0)
if (rootIndex == RootHandle_None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment\question on checking None vs < 0 here.

When should we be checking against None and when should be be checking VirtualizationRoot_IsValidRootHandle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, same answer as above - a result of RootHandle_ProviderTemporaryDirectory means we don't need to drop to file I/O level as we've found what we're looking for.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not had a test to thoroughly test this yet, but the code changes look good to me.

s_virtualizationRoots = Memory_AllocArray<VirtualizationRoot>(s_maxVirtualizationRoots);
if (nullptr == s_virtualizationRoots)
{
return KERN_RESOURCE_SHORTAGE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere else we return KERN_FAILURE if we fail to allocate memory. Should we update those to return this error code instead?

Copy link
Contributor Author

@pmj pmj Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the return value is only checked in KauthHandler_Init() as a boolean, it doesn't really matter much I suppose. If I was designing it from scratch, I'd probably either pass the full return value back out from the kext start function, or simply return booleans from the various init functions. I don't have sufficiently strong feelings either way to actively go in and change it; I can also change this to KERN_FAILURE if preferred. (If anything, the part that I dislike the most is that we're implicitly testing for KERN_SUCCESS by assuming it's 0, as opposed to e.g. if (KERN_SUCCESS == VirtualizationRoots_Init()))


vnode_get(vnode);
// Search up the tree until we hit a known virtualization root or THE root of the file system
while (nullptr == root && NULLVP != vnode && !vnode_isvroot(vnode))
while (rootIndex == RootHandle_None && NULLVP != vnode && !vnode_isvroot(vnode))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely prefer yoda-speak in C/C++ code to eliminate any risk of accidental assignments, but we have not enforced it consistently.

@nickgra
Copy link
Member

nickgra commented Oct 5, 2018

@pmj I ran a few large repo build tests earlier today on this branch and nothing unexpected broke. If you rebase this/merge it in, I'm happy to revalidate that this change doesn't regress anything.

pmj added 6 commits October 5, 2018 15:17
Occasionally, the vnode auth handler is called on a vnode representing a named stream/named fork. (Typically, the resource fork.) Forks don’t have their own xattrs, etc. so some of the logic we apply fails. We don’t care specifically about named forks, so treat any access to them as an access to the main file instead.

Note: we need to balance the vnode_getparent() call with vnode_put()
Shadowing variables frequently is a source of bugs, so this enables the corresponding compiler warning.
This change moves the VirtualizationRoot structure definition out of the header file, and unifies the API to only use integer handles, which internally are array indices for non-negative values, and negative values from an enum of special cases. Where indices were previously used outside the VirtualizationRoot implementation functions, the terminology has been changed to "handles."

An immediate advantage of the move away from pointers is an improvement in the ability to control thread safety of the root structures; additionally, unlike pointers, handles remain valid outside of held locks, so in a follow-on change, we can reallocate the array for resizing.
@pmj pmj force-pushed the virt-root-refactor branch from 867f547 to 0a0ac88 Compare October 5, 2018 14:15
@pmj pmj merged commit 5747bf8 into microsoft:master Oct 6, 2018
@pmj pmj deleted the virt-root-refactor branch October 6, 2018 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants