Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CommandLineStringSplitter does not conform to "standard" quote escaping rules #1758

Open
1 of 2 tasks
jonsequitur opened this issue Jun 9, 2022 · 7 comments
Open
1 of 2 tasks
Assignees
Labels
Area-Parser and Binder bug Something isn't working
Milestone

Comments

@jonsequitur
Copy link
Contributor

jonsequitur commented Jun 9, 2022

The CommandLineStringSplitter provides a way to split a string into a string[] and is intended to conform to the way that strings are split on the command line.

It has two primary use cases:

  • Testing. Testing your parser using Parse(string[] args) or Invoke(string[] args) can be error prone because it requires people to make assumptions about how a command line will actually be split when your app is started, and the actual rules are not always intuitive.
  • Completions. CommandLineStringSplitter is critical for completions, since the only way to capture the text on the command line from within the launched .NET application is to pass it in as a complete string from the shim shell script and then split it into the equivalent string[] args array that Main receives.

Various shells will exhibit different behaviors here due to different quote escaping rules, so the user's experience of this splitting logic might not match up with a single implementation of CommandLineStringSplitter. Because of this, and to eventually provide higher fidelity for the completions experience on different shells in the presence of escapes on the command line, it might be useful to provide multiple implementations over time and we should consider anticipating that need in the API design now.

Resources

Related


For the purposes of illustration in examples below, I'll be using the following program to confirm behaviors that differ among various shells:

Console.WriteLine("---- DEFAULT SPLIT            ---");

foreach (var arg in args)
{
    Console.WriteLine(arg);
}

Console.WriteLine("---- Environment.CommandLine ----");

Console.WriteLine(Environment.CommandLine);
@jonsequitur
Copy link
Contributor Author

jonsequitur commented Jun 29, 2022

Copied from @ptr727's comment on #1781:

Problem with the string[] args is that it breaks when using e.g. "D:\", where the " was treated as an embedded quote, not a terminator quote.

Also, the vanilla args[] does not (did not) support unicode or UTF8 characters (critical when piping between processes), thus one was forced to use the W/unicode API to get the original commandline, then parse by hand.

With the introduction of .NET Core, I asked, and the team did not want to change the behavior from how WIN32 in C++ works, even if broken.

The McMaster, gsscoder, and can't remember the other CLI library I used before did not want to include the "correct" parser as I referenced on stackoverflow, thus myself and others just created the manual parsing code, and ignored the broken args[] behavior.
Found one: gsscoder/commandline#473

Now, things may have changed, maybe .NET Core now correctly parses in args[], but that may be something you could verify, I've given up, or more I can't be bothered to try to get a team to fix something I think is/was broken.

And yep, #1758 sorta describes the issues with not correctly parsing.
See: https://github.com/ptr727/Utilities/blob/main/UtilitiesTests/CommandLineTests.cs
Also the stackoverflow article has grown to be very complete.

@iSazonov
Copy link

iSazonov commented Oct 5, 2022

I hope great @mklement0 review PowerShell/PowerShell#15143 and his module helps to resolve the issue.

@jozkee jozkee self-assigned this Nov 2, 2022
@jozkee
Copy link
Member

jozkee commented Nov 4, 2022

This won't work without breaking supported scenarios.

@treenewlyn is asking for a way to escape the quotes in his command, and the existing way of doing that is with backslash, which makes sense.

But we currently support tokens that are quoted and have an ending backslash, one example is a windows path that ends in separator. if we introduce the escaping logic that @treenewlyn is asking for, those cases would consider the "ending separator + quote" as a quote and fail.

[Fact]
public void It_does_not_break_up_double_quote_delimited_values()
{
var commandLine = @"rm -r ""c:\temp files\""";
_splitter.Split(commandLine)
.Should()
.BeEquivalentSequenceTo("rm", "-r", @"c:\temp files\");
}

You can't have both, and since we already have tests showing that windows paths with ending separator are used, I don't know what's the right approach to this. Either do nothing, or take the breaking change.

I already tried a couple of times to reconcile both scenarios, but failed.

For reference, this is what CMD does with dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false:
You can see it works as OP is asking for.

C:\my-app>dotnet run "dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false"
foreach (string arg in args): dotnet publish "xxx.csproj" -c Release -o "./bin/latest" -r linux-x64 --self-contained false

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "dotnet publish \"xxx.csproj\" -c Release -o \"./bin/latest\" -r linux-x64 --self-contained false"

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,dotnet publish "xxx.csproj" -c Release -o "./bin/latest" -r linux-x64 --self-contained false]

And this with rm -r "c:\temp files\", if you don't escape the quotes:
It concats "rm -r "c:\temp and then splits files\".

C:\my-app>dotnet run "rm -r "c:\temp files\""
foreach (string arg in args): rm -r c:\temp,files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "rm -r c:\temp" "files\""

Environment.GetCommandLineArgs(3): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,rm -r c:\temp,files"]

And if I escape the quotes rm -r \"c:\temp files\\":
Is surprisingly deleting the ending \, not even using it as escape character for ".

C:\my-app>dotnet run "rm -r \"c:\temp files\""
foreach (string arg in args): rm -r "c:\temp files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "rm -r \"c:\temp files\""

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,rm -r "c:\temp files"]

Since -r is meaningful for dotnet cli, I also tried the following dotnet run "c:\temp files\":
In this one you can see that the quote is preserved due to the ending \ acting as a escape character, which is inconsistent with the previous example.

C:\my-app>dotnet run "c:\temp files\"
foreach (string arg in args): c:\temp files"

Environment.CommandLine: C:\my-app\bin\Debug\net7.0\consoleapp.dll "c:\temp files\""

Environment.GetCommandLineArgs(2): [C:\my-app\bin\Debug\net7.0\consoleapp.dll,c:\temp files"]

If CMD uses CommandLineToArgvW under the covers, Then we are already doing something different to that function, so I wonder if we should keep striving to behave like it or not?

@jozkee
Copy link
Member

jozkee commented Nov 4, 2022

my-app's code looks as the following:

Console.Write("foreach (string arg in args): ");
for (int i = 0; i < args.Length; i++)
{
    Console.Write(args[i]);
    if (i < args.Length - 1)
        Console.Write(",");
}

Console.WriteLine();
Console.WriteLine();

Console.Write("Environment.CommandLine: ");
Console.Write(Environment.CommandLine);

Console.WriteLine();
Console.WriteLine();

var cmdLineArgs = Environment.GetCommandLineArgs();
Console.Write($"Environment.GetCommandLineArgs({cmdLineArgs.Length}): ");
Console.Write('[' + string.Join(',', cmdLineArgs) + "]");
Console.WriteLine();

@KalleOlaviNiemitalo
Copy link

If CMD uses CommandLineToArgvW under the covers

.NET Runtime can call CommandLineToArgvW here https://github.com/dotnet/runtime/blob/aaf9c8a7702920c4999e0535939e981038bbf42c/src/libraries/System.Private.CoreLib/src/System/Environment.Windows.cs#L220

CreateProcess in Windows takes a command-line string and the new process decides whether and how to split it to arguments. IIRC, some C library passes a list of argument strings via the cbReserved2 and lpReserved2, but I don't know whether .NET Runtime reads those at all.

@KalleOlaviNiemitalo
Copy link

I feel the completion function in the shell should parse quotation marks etc. as needed, and pass to dotnet-suggest a list of strings, the position of the string that contains the insertion point, and the position of the insertion point within that string. It should then read the resulting completions and add any shell-dependent backslashes or quotation marks. That way, dotnet-suggest and the completion sources would not need to support the syntaxes of individual shells.

OTOH, if the shell scripting makes this too slow, then it can be reimplemented in dotnet-suggest -- still not in the completion sources. And this should be an optional optimisation; the argv array mechanism should still be available for shells that dotnet-suggest does not explicitly support.

@jonsequitur
Copy link
Contributor Author

Pushing these responsibilities to the shell is ideal, but it has a longer timeline. Implementing per-shell solutions should be more performant (and shouldn't even require dotnet-suggest or any other intermediate .NET-based process), but it's a lot more work and opens up a large test matrix. A design where System.CommandLine can handle this parsing will remain a useful fallback as support for specific shells comes in.

Other scenarios this method also addresses:

  • Testing. People often think in terms of complete command lines, not arrays, and forcing people to do their own split in their tests often leads to mistakes.
  • Editor tooling, e.g. for writing scripts, or the magic command system in Polyglot Notebooks.
  • Starting an external process based on input from sources other than CLI entry points.

These are a few examples of places where a command line parser is useful outside of the Main method in a CLI app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Parser and Binder bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants