Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add symbolic link APIs #54253

Merged
merged 51 commits into from
Jul 9, 2021
Merged

Add symbolic link APIs #54253

merged 51 commits into from
Jul 9, 2021

Conversation

carlossanlop
Copy link
Member

Fixes #24271

Co-authored with @jozkee

PTAL @iSazonov @mklement0 @tmds @carlreinke

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Jun 15, 2021

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #24271

Co-authored with @jozkee

PTAL @iSazonov @mklement0 @tmds @carlreinke

Author: carlossanlop
Assignees: carlossanlop, Jozkee
Labels:

area-System.IO

Milestone: 6.0.0

/// <exception cref="ArgumentNullException"><paramref name="path"/> or <paramref name="pathToTarget"/> is <see langword="null"/>.</exception>
/// <exception cref="ArgumentException"><paramref name="path"/> or <paramref name="pathToTarget"/> is empty.
/// -or-
/// <paramref name="path"/> is not an absolute path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from File.Create, which allows path to be relative. I'm not necessarily opposed to requiring an absolute path, but it seems like it could be a gotcha.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation wise we do support relative paths, similarly to any other API in System.IO. This is just a mistake in the documentation. Thanks for pointing this out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I initially wrote the empty APIs with docs on top of them assuming some things, but didn't adjust them after writing the code. Thanks for catching it, @carlreinke. We'll make sure to double check the docs.

/// -or-
/// <paramref name="path"/> is not an absolute path.
/// -or-
/// <paramref name="path"/> or <paramref name="pathToTarget"/> contains invalid path characters.</exception>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouping path and pathToTarget here makes it sound like they will both be validated against the same set of invalid characters. pathToTarget allows almost anything, including most of Path.InvalidPathChars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the target is not the regular path?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlreinke, a symlink points to another file-system entry using a regular path, so it seems sensible to validate the target path formally like any other path (not in terms of existence; although, given that symlinks can span file-systems, conceivably they have different sets of valid characters - how does FileSystem.VerifyValidPath handle this currently?). Why do you think pathToTarget allows (should allow) almost anything?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mklement0 pathToTarget allows almost anything because mklink (and thus the underlying OS API) allows almost anything in the target on Windows.

C:\temp>mklink foo "<>:/\|?*"
symbolic link created for foo <<===>> <>:/\|?*

C:\temp>dir | findstr foo
2021-06-16  23:08    <SYMLINK>      foo [<>:/\|?*]

(On Linux the only invalid path character is '\0', so it's obvious that it will allow almost anything there.)

See this comment for a use case for storing arbitrary data in a symlink.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlreinke:

Wow - I certainly did not expect mklink foo "<>:/\|?*" to work, and you have to wonder why that is permitted. The implementation via reparse points means that arbitrary data can be stored, so one can see how it's technically possible, but to what end? Probably not in anticipation of "off-label" uses such as as a lock on Unix, per the comment you link to.

I guess that disallowing '\0' is then the only check we can perform without making underlying capabilities inaccessible (I haven't tried the APIs directly, but at least mklink / ln don't support '\0'.)

The only caveat is that we must make sure that other .NET APIs then don't run into trouble when they encounter a FileSystemInfo instance with such a misshapen path obtained via .ResolveLinkTarget() - note that
new FileInfo("<>:/\|?*") does not work, for instance.

And, of course, this permissiveness should be clearly documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

De-facto only OS file system driver can do full path validation. In the library it makes sense only to do simple checks for null/\0.

/// <param name="linkPath">The path of the file link.</param>
/// <param name="returnFinalTarget"><see langword="true"/> to follow links to the final target; <see langword="false"/> to return the immediate next link.</param>
/// <returns>A <see cref="FileInfo"/> instance if <paramref name="linkPath"/> exists, independently if the target exists or not. <see langword="null"/> if <paramref name="linkPath"/> does not exist.</returns>
public static System.IO.FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are at least a couple <exception/> cases that should be listed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+        /// <exception cref="ArgumentNullException"><paramref name="linkPath"/> is <see langword="null"/>.</exception>
+        /// <exception cref="ArgumentException"><paramref name="linkPath"/> is empty.
+        /// -or-
+        /// <paramref name="linkPath"/> contains a null character.</exception>

{
// Unix max paths are typically 1K or 4K UTF-8 bytes, 256 should handle the majority of paths
// without putting too much pressure on the stack.
internal const int DefaultPathBufferSize = 256;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the team has taken any measurements on this? What is typical path size?
I would choose half of the maximum i.e. 512 so as to always have no more than one fallback (bufferSize *= 2).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I don't know if we have data around this. Since this number is used in some other APIs (and this const is now consumed in those places) I would rather not make that change now. Feel free to open an issue though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This will give us a new const for Unix, while we already have Interop.Kernel32.MAX_PATH for Windows. It would be nice to have one const for both platforms.

Comment on lines 23 to 19
/// Allows creation of symbolic links when the process is not elevated. Starting with Windows 10 Insiders build 14972.
/// Developer Mode must first be enabled on the machine before this option will function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 16 to 33
// https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
// We don't need all the struct fields; omitting the rest.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct REPARSE_DATA_BUFFER
{
public uint ReparseTag;
public ushort ReparseDataLength;
public ushort Reserved;
public SymbolicLinkReparseBuffer ReparseBufferSymbolicLink;

[StructLayout(LayoutKind.Sequential)]
public struct SymbolicLinkReparseBuffer
{
public ushort SubstituteNameOffset;
public ushort SubstituteNameLength;
public ushort PrintNameOffset;
public ushort PrintNameLength;
public uint Flags;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since C# doesn't support C++ UNION I think it is better to define flat struct with REPARSE_DATA_BUFFER_SYMBOLICLINK name.

Suggested change
// https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
// We don't need all the struct fields; omitting the rest.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct REPARSE_DATA_BUFFER
{
public uint ReparseTag;
public ushort ReparseDataLength;
public ushort Reserved;
public SymbolicLinkReparseBuffer ReparseBufferSymbolicLink;
[StructLayout(LayoutKind.Sequential)]
public struct SymbolicLinkReparseBuffer
{
public ushort SubstituteNameOffset;
public ushort SubstituteNameLength;
public ushort PrintNameOffset;
public ushort PrintNameLength;
public uint Flags;
// https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
// We don't need all the struct fields; omitting the rest.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct REPARSE_DATA_BUFFER
{
public uint ReparseTag;
public ushort ReparseDataLength;
public ushort Reserved;
public ushort SubstituteNameOffset;
public ushort SubstituteNameLength;
public ushort PrintNameOffset;
public ushort PrintNameLength;
public uint Flags;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nested struct was added for clarity and easier manipulation.

Do you have any concerns around having a nested struct, as opposed to having it all flatly defined in the top struct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only concern is that we will have to change this and define some structs if we will want to support other reparse points. (mount points and others). This is that we have in PowerShell code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, the C++ struct containing nested union structs can define the data for a symbolic link, for a mount point and the generic type:

typedef struct _REPARSE_DATA_BUFFER {
  ULONG  ReparseTag;
  USHORT ReparseDataLength;
  USHORT Reserved;
  union {
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      ULONG  Flags;
      WCHAR  PathBuffer[1];
    } SymbolicLinkReparseBuffer;
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      WCHAR  PathBuffer[1];
    } MountPointReparseBuffer;
    struct {
      UCHAR DataBuffer[1];
    } GenericReparseBuffer;
  } DUMMYUNIONNAME;
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;

Correct me if I'm wrong, but I think the code can be preserved as it is right now, and when the time comes to support junctions and others, we add the extra structs as needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we add the extra structs as needed.

I mean we can not use the same name REPARSE_DATA_BUFFER for all the structs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we can not use the same name REPARSE_DATA_BUFFER for all the structs.

Why not? My idea was to use REPARSE_DATA_BUFFER for all the pinvokes related to reparse points. Similar to what we have for symbolic links here:
https://github.com/dotnet/runtime/blob/55c1c67ad9653e82ca5d63f6bc47721321891af8/src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.Windows.cs#L503

For mount points we would do rdb.MountPointReparseBuffer.SubstituteNameOffset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the same name for different purposes will be misleading. For example, there will be references with the same name in searches. This will make it difficult to understand and research the code.

GetFinalLinkTarget(linkPath, isDirectory) :
GetImmediateLinkTarget(linkPath, isDirectory, throwOnNotFound: true, normalize: true);

return targetPath == null ? null :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return targetPath == null ? null :
return targetPath is null ? null :

Win32Marshal.GetExceptionForLastWin32Error(linkPath);
}

char* buffer = stackalloc char[Interop.Kernel32.MAX_PATH];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be long path?

Copy link
Member Author

@carlossanlop carlossanlop Jun 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with @jozkee. We should verify what happens if we pass a path that's larger than MAX_PATH. if the API can fail and tell us that the buffer was not large enough, then we could apply similar logic to what we added in the Unix side.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iSazonov : It can.

@carlossanlop
Copy link
Member Author

@dotnet/runtime-infrastructure the failed CI legs had a problem with Docker:

Docker pull failed with exit code 1

Value cannot be null. (Parameter 'ContainerId')

I tried restarting them but they failed again. Any clues why this is happening?

@hoyosjs
Copy link
Member

hoyosjs commented Jun 21, 2021

This is only for the linker tests. And it's because the container failed to initialize it looks like, it seems to be unable to download the image, but I can fetch it locally. Seems like a timeout because it's a huge image (3.4 gb) and it timed out during link verification. The other legs are seeing some instability in using AzDO NuGet feeds.

@jozkee jozkee requested a review from adamsitnik July 6, 2021 23:37
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but:

  • System.IO.Tests.Directory_SymbolicLinks.ResolveLinkTarget_ReturnFinalTarget_MaxFollowedLinks test is failing in one of the Windows CI legs. Please fix it before merging
  • If we don't know what should be the default value of bool returnFinalTarget (Add symbolic link APIs #54253 (comment)) IMO we should not provide default value and the users should provide it in explicit way.

@jeffhandley
Copy link
Member

  • If we don't know what should be the default value of bool returnFinalTarget (Add symbolic link APIs #54253 (comment)) IMO we should not provide default value and the users should provide it in explicit way.

That sounds like a good approach to me.

@jozkee
Copy link
Member

jozkee commented Jul 8, 2021

System.IO.Tests.Directory_SymbolicLinks.ResolveLinkTarget_ReturnFinalTarget_MaxFollowedLinks test is failing in one of the Windows CI legs. Please fix it before merging

This was due Windows not being consistent with the reparse point limit specified in the docs. I've loosen the validation so we now check it completes successfully with small chains of links (1, 10, 40) and check that a nice exception is thrown with large ones (100).

If we don't know what should be the default value of bool returnFinalTarget (#54253 (comment)) IMO we should not provide default value and the users should provide it in explicit way.

Done but this is an API change. cc @bartonjs @terrajobst

@terrajobst
Copy link
Member

terrajobst commented Jul 8, 2021

@adamsitnik @jeffhandley

  • If we don't know what should be the default value of bool returnFinalTarget (Add symbolic link APIs #54253 (comment)) IMO we should not provide default value and the users should provide it in explicit way.

That sounds like a good approach to me.

I'm not opposed to removing the default but I generally don't like "we can't agree so we make the user choose"-line of thinking. The reason being that we're the experts and most users won't be.

There are cases where the purpose of the parameter is immediately clear (e.g. bool recursive for Directory.Delete()). I'm not sure that this is the case here, so the result might be a coin flip.

Curious what @bartonjs thinks.

@iSazonov
Copy link
Contributor

iSazonov commented Jul 8, 2021

If we would have public class SymbolicLink it would be clear we process the entity itself and the default should be false. Here we enhance existing API and the default would rather be true as the API already partially does.

@bartonjs
Copy link
Member

bartonjs commented Jul 8, 2021

If we don't know what should be the default value of bool returnFinalTarget ...

I'm not opposed to removing the default but I generally don't like "we can't agree so we make the user choose"-line of thinking. The reason being that we're the experts and most users won't be.
...
Curious what @bartonjs thinks.

My standard starting position is to ask the FDG what they have to say about it:

DO provide good defaults for all properties and parameters (using convenience overloads), if possible.

OK, so if it's possible we should provide a good default. So, what makes a good default in this case? I think "predictability".

Let's imagine that we didn't have the parameter at all... so it's just ResolveTarget(), what would that do? I feel like it would return the target value of the link... so non-recursive/non-final. Now, in a future version, someone says "it'd be nice if I didn't have to write the while loop myself, especially because I keep forgetting to put in an infinite loop detector", so we add an overload. The zero-arg version is still doing a single target resolution, because that's the method name. The parameter then can control an "actually, I want to keep going" behavior.

So now lets just collapse that and look at it with a default parameter. The name of the method is ResolveTarget. Resolve is, admittedly, slightly ambiguous, so let's pretend it's named GetTarget. Well, GetTarget is obviously "give me the target value of this symlink", so link.GetTarget() would be non-recursive. To me, a similar thing applies when we let Get be called Resolve. If we think that's where the "but Resolve feels like it should keep going until final resolution", then perhaps the better change is s/ResolveTarget/GetTarget, and now "once" is the "obvious" behavior, and "keep going" is the opt-in behavior.

The problem with recursive being the default is that it has unexpected behaviors. If we have no work limiter, then mutually referential symlinks cause an infinite loop:

$ ls -al foo bar
lrwxrwxrwx 1 jbarton jbarton 3 Jul  8 09:38 bar -> foo
lrwxrwxrwx 1 jbarton jbarton 3 Jul  8 09:38 foo -> bar

If we do have a work limiter then mutually referential symlinks will.. do something. Throw, return the last one it saw when it exhausted the work factor, something. Whatever it does, you probably didn't expect it. Sure, the docs'll mention it, but no one reads the docs until something goes wrong.

To me, from a logical perspective (of how I think about things), "once" is the correct default. To me, from a security perspective (not "hiding" unexpected behaviors in server products), "once" is the correct default. So, to me, we should just figure out what makes us happy with that choice (e.g. renaming Resolve to Get).

@iSazonov
Copy link
Contributor

iSazonov commented Jul 8, 2021

To me, from a logical perspective (of how I think about things), "once" is the correct default. To me, from a security perspective (not "hiding" unexpected behaviors in server products), "once" is the correct default. So, to me, we should just figure out what makes us happy with that choice (e.g. renaming Resolve to Get).

From a logical perspective wouldn't it make more sense to have `public class Symbolic Link'?
In this case it will be clear to everyone that all actions refer to this entity (symbolic link), while if we add new extensions to the existing API it causes difficulties to understand (and implement) on every step of the way.

@jozkee
Copy link
Member

jozkee commented Jul 8, 2021

@iSazonov it might be but being so close to the release, I don't think its a good idea to try to do another API from scratch. We can evaluate the public class LinkInfo : FileSystemInfo on the next releases. Shipping this in 6 can also give us an insight of how good we did in satisfying the main scenarios, if we did poorly (I hope not), we could address that in said new type.

@adamsitnik adamsitnik merged commit dcce0f5 into dotnet:main Jul 9, 2021
@iSazonov
Copy link
Contributor

iSazonov commented Jul 9, 2021

@jozkee @carlossanlop @adamsitnik Thanks for great work!

@jozkee I fully trust MSFT team experience. Nevertheless I am a bit nervous about link APIs based on experiences we got in the PR from discussions of details. :-) I hope we will continue discussions in proper issue - what is the issue?

@carlossanlop carlossanlop deleted the symlinks branch July 9, 2021 18:33
@jozkee
Copy link
Member

jozkee commented Jul 9, 2021

@iSazonov #51605 is the uber issue, you can comment in there or you can create a sub-issue.

@jeffhandley
Copy link
Member

@iSazonov Thanks for all of your input along the way, and we certainly want and invite more input and feedback. By getting this merged in, we will be able to include these APIs in .NET 6.0 Preview 7, and solicit feedback during RC & RC2.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposed API for symbolic links