Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argument parsing #19

Closed
boneskull opened this issue Feb 25, 2019 · 36 comments
Closed

argument parsing #19

boneskull opened this issue Feb 25, 2019 · 36 comments

Comments

@boneskull
Copy link
Collaborator

Node.js can do better than process.argv.slice(2).

I'd like to discuss what the scope of something like this should be. Just a couple notes from my head:

  • People have opinions and different needs around how arguments should be parsed, so we're not likely to please everybody with an implementation, which is fine; those who want something different don't need to use it.
  • My original intuition here was that automated help output would be overreaching. But, seeing as how help output is literally in every single command-line tool (failing that, a man page), we should strongly consider it.
  • The API should be familiar to Node.js users.
  • As this would likely require a new module name, I'm unsure what to call it to avoid potential collisions (assuming there's not going to be a namespace to put it in)
  • It could be a "blessed" Node.js org project, and eventually make its way into core.

Prior art:

(team: feel free to add more links to examples)

@guybedford
Copy link

Another example - https://www.npmjs.com/package/arg

@iansu
Copy link
Contributor

iansu commented Feb 26, 2019

Continuing the discussion from the Tooling WG meeting, I think there are two main usecases we should try to address:

  1. Provide an API that libraries like yargs, commander, etc. could use to simplify their implementation. For example, parsing arguments into a key/value structure.
  2. Provide something that end users could use to do simple argument parsing. Again, maybe just turning args in keys and values and maybe something to generate basic help/usage output.

@vweevers
Copy link

Are there meeting notes available that explain the problem you're aiming to solve? Because even parsing arguments into a key/value structure is opinionated - see e.g. subarg - and although "those who want something different don't need to use it" answers that, I'm left wondering what node (core) could do better than userland.

@boneskull
Copy link
Collaborator Author

Yes, I stopped taking notes when I started talking about it, unfortunately.

@sam-github
Copy link

I'm also wondering what node core could do better than userland.

Node.js can do better than process.argv.slice(2).

There are lots of modules that do better than that already, and there has been continuous innovation/evolution in those modules over the last years. If node blessed a version, it would threaten to suck the oxygen out of the ecosystem, which looks (to me) to be meeting this need pretty well already.

I hope we don't have stdlib envy, because while "language X has an options parser in their stdlib" is true for many X, most of the options parsers in language standard libs are ossified. They are there, but wise developers use better versions, like the ones linked to in the original post under "standard art".

@refack
Copy link

refack commented Feb 26, 2019

👍
I had a similar thought while working with python on GYP.

I hope we don't have stdlib envy
...
They are there, but wise developers use better versions, like the ones linked to in the original post under "standard art".

Yes, I've got stdlib envy WRT to tooling enablement. @sam-github, you raise good points (e.g. ossification, and ecosystem stifling), but in the big picture discussion I'm happy to add enthusiasm to the "pro" column.

I was thinking about creative possible solution to this ambivalence...
One idea was modular stdlib, for instance have configuration for tooling, and one for web-servers.
Another could be a curated meta-package, like for example Debian's build-essentials.

@sam-github
Copy link

I've had the opportunity of watching some of those stdlibs (specifically C, C++, perl, ruby, python, and lua, in that order) go from "new and wonderful" to "what the h**l were they thinking?", and I personally love that npm allows a micro choice of the APIs I want to use.

Node's EventEmitter & stream style APIs were recently called "the laughing stock of the internet because they don't do promises" in a conversation I saw. Fair enough, they don't do promises, but they PREDATE promises. The less you build in, the more easily you can push things aside as they become dated so new devs don't pick it by accident.

C/python/etc. getopt, I'm looking at you.

And optparse was hot for a while, but then argparse became the cool thing.

There are radically different opinions on how to do opts parsing, looks like shark-infested waters to me.

@ljharb
Copy link
Member

ljharb commented Feb 26, 2019

As much as I'd like my preferred argument parsing pattern to be in node, I would be a thousand times more horrified to have one of the other argument parsing patterns in node :-/

@vweevers
Copy link

I use different parsers at different times, they are all great and I don't want any of them in node 😄

@boneskull
Copy link
Collaborator Author

I'm feeling like we're a little too worried about what people will think of the API without actually proposing anything.

I think we need to roll it back a bit...

If you're commenting on this issue, I'm going to assume:

  1. You care about enabling authors of CLI tools to have a better developer experience.
  2. Node.js has much room for improvement in terms of features and support for tooling authors.
  3. You realize that to do this, Node.js will necessarily need to add features where and when it makes sense to correct for this.
  4. Handling command-line arguments is fundamental to writing useful CLI tools.

Right? Great!

My thesis, then, is this:

The lack of an argument-parsing API in Node.js negatively impacts the developer experience. This is why:

  • Robust argument-parsing (via any strategy) is nontrivial to implement.
  • process.argv.slice(2) likely sparks a "WTF moment" for those new to Node.js. This--the current API for dealing with arguments, which can be summarized as "an array of whitespace-trimmed strings, of which the first two items are generally ignored"--is not intuitive nor obvious. Yet, it's boilerplate for whoever wants to work with them.
  • Anecdotally, I've seen nearly as many module authors hand-roll similar argument-parsing implementations as those who pick up a third-party module to do so. I've done this myself; it's perhaps more likely when you aren't initially expecting to accept arguments (node server.js, anyone?).
  • While userland can make up for Node.js' shortcomings ("JavaScript finds a way"), in certain cases, it should not have to.

I don't expect everybody to agree on the thesis, but it's where I'm coming from.

So, instead of a snowball fight over an API for declaring types, defaults, "options" vs. "commands", help text, etc., I'm hoping we can move the conversation in this direction: let's identify tasks common to all (mostly all?) argument-parsing implementations.

Off the top of my head:

  1. Consumers declare a set of "keys", defined using strings, which will be considered "input" to the program.
  2. The parser implementation iterates over an Array (generally process.argv.slice(2)) and recognizes its items as keys and/or corresponding values.
  3. Unknown items are considered: put into a bucket, discarded, or disallowed.
  4. The implementation outputs the key/value pairs in a more convenient/appropriate data structure, such as an Object or Map.

The above parser would be useful for simple parsing, or to build on top of. Given the reaction to this issue, I'm happy to rein in the ambition, and avoid opinionated strategies as much as possible. Even the smallest set of functionality would be better than what we currently have!

@sam-github
Copy link

I think some of the comments here aren't about the API per se, its about the thesis. I don't agree with it, sorry, but maybe other people do. Not sure.

Even the smallest set of functionality would be better than what we currently have!

And I absolutely don't agree with this. Node.js clearly has no options parser, so CLI authors look for them in npm, and find a number of fine ones, and some terrible ones. However, there might be vehement disagreement about which ones are fine and which ones are terrible.

A small set of options parsing functionality would be worse, because it would delay people going to npm to get a good options parser, and open up the Node.js API for criticism because it has a bad options parser (and given how opinionated options parsing is, its likely to be considered bad by a fair chunk of people).

If this thread was full of unhappy users of options parsers, bemoaning their terrible state and how it wasn't possible to build a decent CLI in node because they were all crap, I'd be right on board with doing it right. But that seems pretty far from the case.

So, I remain uncertain about the problem that is being solved.

If the thesis of the tooling WG is "it should be possible to build high quality CLIs using just the node stdlib", then I missed that. I don't agree, but I will stop commenting, because I'd know I'm working at cross purposes, and that's not helpful.

Btw, process.argv didn't inspire WTF in me, it made me think of ARGV in ruby, of sys.argv in python, in char* argv[] in C, of .... you get the idea.

@mcollina
Copy link
Member

I share the same concerns of @sam-github.

At this point of Node.js history, it's probably better to have a piece of docs on the nodejs website that points to popular modules in npm.

@arcanis
Copy link

arcanis commented Feb 28, 2019

Some data point - I took the repository from a known open-source project, and its resolution table contains:

  • 7 different versions of Yargs
  • 3 different versions of Minimist
  • 2 different versions of Commander

All those have different majors that package managers aren't allowed to optimize. It's not a huge issue in the grand scheme of things, and a Node api won't make them disappear overnight, but the lack of it clearly has an impact on our projects.

@vweevers
Copy link

@arcanis In any case that has to be solved in that project, by consolidating everything to use the same parser. Whether that parser is yargs, minimist or a node core API, the effort would be the same.

@arcanis
Copy link

arcanis commented Feb 28, 2019

@vweevers The problem isn't caused by this project in particular, but by its dependencies. My point is that babel, eslint, webpack, prettier, jest, lerna, uglify, ... all those use different CLI parsing libraries which duplicate a lot of logic for very little reason.

In this context maybe a unified Node API would increase the incentive to use a standardized logic (or maybe they would continue doing it for the sake of customizing their CLI, it's hard to tell).

@sam-github
Copy link

Whether #19 (comment) is a problem or not is a matter of debate, but it is not specific to args parsing. I'd say your stats show how often the community arg parsers are used, and that people have chosen to use different ones, with features specific to their liking.

Also, you could do the same analysis for lodash, debug, request, or for many other sets of popularly depended upon modules. Its gotten better with npm hoisting of deps, but still happens a lot. Its also possible to get many copies of sub-deps with identical versions, depending on exact form of the dep tree.

We get "dependency hell" in some languages, where direct dependencies want conflicting versions of their sub-deps and we can't install. With node/npm, we get multiple versions of sub-deps. They both have down-sides, but I'll take npm's approach over dependency hell.

@boneskull
Copy link
Collaborator Author

If the thesis of the tooling WG is "it should be possible to build high quality CLIs using just the node stdlib", then I missed that. I don't agree, but I will stop commenting, because I'd know I'm working at cross purposes, and that's not helpful.

I'm going to guess "it should be possible to build high quality CLIs using just the node stdlib" means something different to you than it does to me.

"Coercing arguments into a more appropriate data structure"--which was my "barebones" suggestion--is not enough to build a "high-quality" CLI, IMO.

It does, however, offer a minimal set of functionality that many users will be able to consume directly and enables creation of more and better higher-level CLI libs than already exist in userland.

By adding the basics to core, we would make the simple case easy to implement without having to pull in userland modules (which may be further inconvenienced by Enterprise Process). And we'd enable those seeking to create their own higher-level libraries by reducing overhead, boilerplate, and lowering the bar.

At this point of Node.js history, it's probably better to have a piece of docs on the nodejs website that points to popular modules in npm.

@mcollina Can you clarify this? Unsure if you're talking about this particular issue or more generally.

@boneskull
Copy link
Collaborator Author

boneskull commented Apr 3, 2019

I've done some initial comparison research on the strategies used by "popular" argument parsers.

I'm not drawing any conclusions from this right now, but here it is:


Argument Parser Analysis

Modules by Popularity

Popularity is defined by npms.io. I used the search term arguments to find most of these, then tried options when I realized commander did not appear in my first search.

  1. yargs-parser
  2. commander
  3. minimist
  4. argparse
  5. coa
  6. command-line-args
  7. arg
  8. mri
  9. bossy

At some point, I decided other modules were not popular enough, and stopped looking.

Excluded:

Comparsion Matrix

All modules evaluated:

  • Return parsed arguments as a JavaScript object
  • Support some notion of "types"
  • Allow arguments to be supplied using the prefixes - or --

= means that the module supports the form --foo=bar.

Module Positional Commands Aliases Defaults Required =
yargs-parser 1 1 1 1 1 1
commander 1 1 1 1 1 1
minimist 1 0 1 1 0 1
argparse 1 1 1 1 1 1
coa 1 1 0 1 1
command-line-args 0 1 1 1 0
arg 1 0 1 0 1
mri 0 1 1 0 1
bossy 0 0 1 1

α: Supports one alias
β: Via user-supplied handler function
γ: After --, e.g., ./executable --foo bar -- something
δ: Unclear

Types Comparison

All modules evaluated support these types:

  • Boolean/flag
  • String
Module Numeric Count Variadic
yargs-parser 1 1 1
commander
minimist 0 0 0
argparse 1 1
coa 0 0 1
command-line-args 1 1 1
arg 1 1 1
mri 0 0 1
bossy 1 0 0

α: Discrete types for "float" and "integer"
β: Via user-supplied handler function

@boneskull
Copy link
Collaborator Author

Corrections appreciated 😄

@sam-github
Copy link

sam-github commented Apr 3, 2019

https://www.npmjs.com/package/posix-getopt is my favourite. I started using it after reviewing a number of the above, and found them way too heavyweight (commander), or flat out incorrect.

Every one that doesn't accept configuration as to whether an option (long or short) takes an arg or not is incapable of parsing a command line correctly, because it can't tell if a - is the start of a new option/switch, or the argument to the last (EDIT: or whether the option takes an arg). I suggest adding this to your criteria.

minimist is one of the sinners: check out https://github.com/nodejs/branch-diff , see its docs, try its CLI

% branch-diff --simple master pr-22894
/home/sam/.nvm/versions/node/v11.12.0/lib/node_modules/branch-diff/branch-diff.js:194
      throw err
      ^

Error: Must supply two branch names to compare

WTH? Its because https://github.com/nodejs/branch-diff/blob/2d81c5a18b1e5d2a48dec85a7a739c3a14534e5b/branch-diff.js#L151 zero config means it assumes master is the argument to --simple. No zero-config cli parser can correctly parse an ARGV.

You might also add to your matrix whether the library properly allows short options to be combined: -a -b -c file same as -abc file, and whether it supports -- meaning end-of-options.

@boneskull
Copy link
Collaborator Author

I’m not sure I understand; minimist allows the consumer to declare whether an option should be considered a flag or it expects a value.

From what I could tell, few of them support combined short options. This suggests that most CLI authors aren’t using/expecting this functionality.

I think some context was missing here. The aim of this comparison is to discover what popular modules do in order to have a precedent for constraining the feature set.

@boneskull
Copy link
Collaborator Author

But I can go back and add whatever feature people think they want to see. It’s possible I missed something widely implemented!

@sam-github
Copy link

minimist allows the consumer to declare whether an option should be considered a flag or it expects a value.

I'm happy to be proven wrong, but I don't see such an option:

https://github.com/substack/minimist#var-argv--parseargsargs-opts

@sam-github
Copy link

From what I could tell, few of them support combined short options.

Maybe its irrelevant here, I don't mean to side-track this, but be aware that if they can't combine short options, they are incapable of implementing POSIX CLI standards: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

Windows conventions aren't relevant, I've never seen a node tool that supports /h for help.

@boneskull
Copy link
Collaborator Author

yeah... it's not too relevant to the aim of the comparison, which was to determine what these libraries typically do support.

@ruyadorno
Copy link
Member

This landed in nodejs/node#42675

First released as an experimental API in Node.js 18.3.0 (Current) and on Node.js 16.17.0 (LTS).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests