src: command line option to preload modules #881

ofrobots · 2015-02-18T18:36:44Z

Here's a proposal, along with a PR, I would like to get feedback from the community on.

It would be nice if there was a standard mechanism to allow vendors, such as a cloud host, to customize startup of node applications.

Basically, I want to be able to inject functionality into a node application without requiring an explicit require. This could be useful to automatically enable, e.g., vendor specific tracing and application performance monitoring APIs when applications are deployed.

There are a few ways this could be done:
0. (Status quo) Ask the developers to modify their applications and add an explicit require of a vendor provided npm module. This adds friction to the deployment experience and possibly prevents people from being able to easily try different cloud vendors.

The vendor can maintain their own fork of node.js/io.js but with extra functionality added. This package would be offered instead of the standard node.js/io.js packages. This is potentially high overhead for the vendor. Effort may be duplicated by different vendors.
Build a standard mechanism into node.js/io.js that allows vendors to inject some functionality into the node startup process.

Here's a proposal on how to do option 2 (EDIT: updated as per suggestions by @piscisaureus):

~~On startup, node checks if process.env.NODE_VENDOR_STARTUP is set. If so, the file node_prefix + '/lib/vendor/index.js' is loaded (if it exists).~~
~~Vendors could package up their distributions of node/io by adding lib/vendor.~~
~~Another alternative would be for NODE_VENDOR_STARTUP to be a path. It gives more flexibility, but is probably not as secure.~~

EDIT: 20150313: change to command line option.

iojs -r foo -r /bar/baz.js -r quux

translates to:

require('foo');
require('/bar/baz.js');
require('quux');

Thoughts?

cjihrig · 2015-02-18T18:42:21Z

Isn't that the purpose of _third_party_main?

EDIT: Although that appears to be for compile time.

sam-github · 2015-02-18T18:43:58Z

I've wanted a feature like this, too. Some node core features (such as cluster) basically do this, but they do it by using fixed env variables and check for them and trigger specific code. So in some sense its a "feature" available to core, but no way has been found to expose it to user-land. In order to factor cluster out of core, something like this would be needed.

I don't think this implementation is great, particularly the hard-coded path.

What would make more sense to me is an env variable like NODE_PRELOAD which can be set to any path, or perhaps a node --preload=FILE path, and then a NODE_OPT variable that could be set to any of the command line options.

ofrobots · 2015-02-18T19:45:41Z

@sam-github: NODE_PRELOAD is a much better name for this, thanks. I can re-work the PR to make it an arbitrary path instead of hard-coded. I still like the idea of the env. var. more, but if there is strong appetite for a command line options, perhaps it can be in-addition-to the env. var. with the semantics that the command-line option overrides the env. var.

@cjihrig: _third_party_main lets you build stand-alone executables of your javascript program, provided as _third_party_main at node compile time. The output isn't node but rather an executable for your javascript program.

piscisaureus · 2015-02-18T21:16:45Z

On startup, node checks if process.env.NODE_VENDOR_STARTUP is set. If so, the file node_prefix + '/lib/vendor/index.js' is loaded (if it exists).

The node_prefix plus something hard-coded logic seems illogical to me. Something I'd find more appealing:

export NODE_PRELOAD=foo,/bar/baz.js,quux

translates to

require('foo');
require('/bar/baz.js');
require('quux');

ofrobots · 2015-02-18T23:10:45Z

@piscisaureus Updated as per your suggestion.

meandmycode · 2015-02-19T07:03:09Z

As a vendor, if you have enough control to change the command line then why not just node invoke to a proxy script that invokes the original script?

bnoordhuis · 2015-02-19T08:34:56Z

@meandmycode asks a good question.

As to the implementation, loading arbitrary code based on an environment variable is insecure when the iojs binary is setuid root.

On the C++ side, secure_getenv() in src/node.cc protects against that. There is no JS equivalent at the moment though, due to there being no geteuid() and getegid() bindings.

ofrobots · 2015-02-19T16:21:57Z

@meandmycode I want the preload modules to be loaded uniformly across all uses of node, including node -e 'console.log("hello world");' and uses of the repl. I think the proxy script for node/iojs would end up duplicating a lot of the command line parsing logic from node. IMO, my current patch, ignoring whitespace, is more elegant at 4-5 lines of code.

Also, I think the NODE_PRELOAD mechanism is going to be more generally useful beyond the vendor tracing/apm use-case that I describe. For example, JVM offers JVM Tool Interface (JVMTI) as a generally useful mechanism to people to implement various functionality on top of the JVM. It would be nice to be able to extend node beyond the status quo of explicitly adding code into the user's application.

See also the use-case @sam-github describes above.

@bnoordhuis Yes, you're right that setuid is a concrete problem. I had vague feelings of security issues which is why I had originally proposed a hard-coded path within the installation directory. I can look into secure_getenv and what might be involved in adding geteuid() and getegid() bindings.

meandmycode · 2015-02-19T16:35:31Z

Right, but how will node -e 'console.log("hello world");' know to do this if you don't pass a node_preload? are are you expecting the node_preload command to be run before hand to persist configuration permanently?

ofrobots · 2015-02-19T17:10:22Z

NODE_PRELOAD would be an exported environment variable.

meandmycode · 2015-02-19T17:18:11Z

Ah sorry, I was confused on that aspect and the preload parameter. I feel like environmental variables are a bad story.

How do you target a specific io.js/node? How would it affect things like atom etc? looking adjacent to the exec path seems like a safer option but I'm obviously not as informed in this space.

ofrobots · 2015-02-19T17:32:34Z

Environment variables are the easiest starting point for this feature, but a well rounded implementation would probably offer both environment variables and a --node-preload command line option as I proposed above.

sam-github · 2015-02-19T17:56:50Z

I think there are some legitimate concerns, I'm not totally sold on this, but I do think its worth considering hard.

A couple comments:

just use a proxy script: currently doing so requires using undocumented node APIs, though I guess you could call that different problem.
env vs. command line: modifying environment allows changing the behaviour of node instances that are NOT directly started by you. This is pretty much the point of configuration via environment variables! Many of the things that can be done with a env variable can be done via command line, but we still have env vars.
security, setuid bits, and env: I'd be interested in hearing more about what makes node unique in this regard, and hope not to hear that its because people are regularly making it setuid! In particular, NODE_PRELOAD=X is equivalent to RUBYOPT=-rX. Ruby inherited this feature from perl, see PERL5OPT, and the various perl flags that can be passed. So, its possible these interpreters have a security hole that we shouldn't emulate... or that this isn't considered a security hole. Also, note that NODE_PATH already allows some control over the modules loaded by node via environment.

/cc @rmg

rmg · 2015-02-19T18:59:27Z

@sam-github thanks for the mention!

I have generally been in favour of the proxy command route, but it definitely doesn't mix well with some really useful use-cases, like -e, -p, and the REPL. Someone should probably document that module loading API..

I also question the setuid case. If you've done that to your node binary then how is loading flags from an environment variable increasing the risk when you've already decided to let people run arbitrary code with those privileges? Maybe I'm missing something.

Another option, which could be more broadly useful, is to make -e/-p/<path..> NOT mutually exclusive. The "right" way to do this would probably be adding an additional flag similar to -e that evaluates the inline script in the same context as the script being run or the REPL being presented.

For the sake of examples, I'll say -E is that -e++ flag:

pre-load wrapper module: node -E "require('wrapper')" my_app.js
super-powered repl: node -E "require('shelljs/global')"
browser-ish REPL: node -E 'global.window = global'

After typing up those examples, it definitely feels clunky if pre-loading modules is the problem being solved.. that clunkiness might discourage over-use, which is good or bad depending on your preferences. Maybe a -r/--require flag would be better. It is certainly useful in Ruby!

Something like a NODE_OPTS environment variably would be really useful for doing things like modifying the node/v8 flags for globally installed npm modules NODE_OPTS=--trace-deprecations foo.

trevnorris · 2015-02-19T20:53:42Z

+1 for NODE_PRELOAD.

EDIT: Conceptual +1. Of course have to work through the practical issues, like security.

rvagg · 2015-02-19T22:46:02Z

-1 to using an environment variable to inject functionality into the trusted core, think through for a moment where those variables could come from and how unlikely you are to ever check that NODE_PRELOAD exists before running the process

sam-github · 2015-02-20T03:59:55Z

/etc/profile
~/.bashrc, ~/.bash_profile, etc. (or zsh, etc., )
typed on the command line...

You control your environment... you better, if not, someone can just set your PATH to their version of node, or ls, right?

Again, this is something ruby and perl have, so its hard to call an all-caps security flaw.

meandmycode · 2015-02-20T07:11:46Z

I think those scenarios seem fine, although it feels super hacky, almost asking for a backdoor in because it's convenient for you as a host of what is someone else's app most likely. My only concern is third parties using (embedding) iojs- I don't think this behavior should exist for those instances of iojs.

ofrobots · 2015-02-20T11:59:05Z

@rmg I like the ideas about -r and NODE_OPTS – those are indeed useful on ruby. -E sounds more flexible, but you can always create a one-liner file and -r require that instead. I guess the question is whether the most common realistic use-cases are going to one-liners (-E is better) or not (-r is better)

@rvagg I can already inject functionality into the trusted core using LD_PRELOAD. Plus what @sam-github said.

@meandmycode If the embedding scenario is an important concern, it can be dealt with a --disable-preload configure flag. As a user, though, maybe I do want to debug an embedded node.. it would be nice if I could do it without having direct access to the command line being passed to the embedded node.

Environment variables are a different kind of a tool than a command line option. Both have their valid use-cases.

trevnorris · 2015-02-20T17:42:43Z

@ofrobots I believe the difference between NODE_PRELOAD and LD_PRELOAD is that the latter only injects the functions (so even if a bad function is loaded it won't necessarily be run). While the former would require actually executing JS every time. More easily allowing for exploit.

rmg · 2015-02-21T19:10:02Z

@trevnorris on systems that use ld for binary loading LD_PRELOAD lets you pre-resolve any function normally loaded from a dynamic library, so it's pretty easy to inject a function that is almost guaranteed to be called. This is how valgrind tracks memory leaks by replacing malloc, free, etc. Other environments have similar means of injecting dynamic libraries.

That said, NODE_PRELOAD would lower the barrier to entry for writing such malware by removing the binary requirement as well as making the attack vector cross platform.

rvagg · 2015-02-22T00:12:00Z

As a compromise can I suggest just sticking with a commandline flag instead of env var for now, perhaps extend it later? Otherwise we might get bogged down arguing about this. I like the feature, I just don't like magic environment variables.

cjihrig · 2015-02-22T00:34:15Z

+1 for an explicit command line argument.

trevnorris · 2015-02-22T09:39:07Z

I'm cool with a command line argument.

ofrobots · 2015-02-22T20:58:05Z

Sounds good to me. I will summarize the discussion and adjust the PR in a little bit (need to be away from the 'net for a couple of days).

sam-github · 2015-02-23T19:08:21Z

I'm fine with a command line argument, as well, though it doesn't address the use-case of effecting node child invocations transparently. Still, we can try it out a bit and get feedback.

ofrobots · 2015-02-23T21:14:22Z

@sam-github Indeed. However, we don't necessarily have to conflate the need to effect node child invocations with the need to preload modules. I think an independent separate argument can be made for a more general NODE_OPT as you and @rmg have proposed.

ofrobots · 2015-02-27T15:29:31Z

I have updated the pull request to implement preloading modules via a -r/--require command line similar to Ruby's -r. The option takes a path-delimiter-separated list of modules that are preloaded before the main script starts.

The general consensus in the discussion above seems to be that the ability to preload modules is a useful feature. There was however debate on the topic of whether this should be done via command-line options or via environment variables. Upon reflection, it seems environment variables are an orthogonal concern. It make sense to decouple these discussions. Since some valid use-cases for environment variables came up, an independent issue can be be opened on the topic of a more generic NODE_OPT environment variable.

We also discussed a non-exclusive variant of the -e option, say -E. I didn't implement it this way as, IMO, -r is a better fit if the primary use-case is going to be preloading functionality that's more than a one-liner.

sam-github · 2015-02-27T23:12:56Z

src/node.cc

@@ -2935,6 +2942,12 @@ static void PrintHelp() {
         "  -p, --print          evaluate script and print result\n"
         "  -i, --interactive    always enter the REPL even if stdin\n"
         "                       does not appear to be a terminal\n"
+#ifdef _WIN32
+         "  -r, --require        ';'-separated list of modules to"


I don't think we need this... you can use : in unix or win32 package names (unadvised, though, and if it fails to work your bug reports will be laughed at). Stick with :. Thoughts, @piscisaureus, or @orangemocha, am I too unix-centric? :-)

I was following what we do for NODE_PATH. The other alternative would be to use a comma, but I think that's even more likely to be in a package name. I'm open to suggestions.

Using a colon for this doesn't work on windows.
+1 to what the PR makes of it.

sam-github · 2015-02-27T23:14:48Z

Looks OK to me as a feature, I didn't read the code.

sam-github reviewed Feb 27, 2015
View reviewed changes

sam-github mentioned this pull request Oct 21, 2016

Add NODE_PRESERVE_SYMLINKS environment variable #8749

Closed

3 tasks

blakeembrey mentioned this pull request Dec 29, 2016

--require flag does not load from node_modules TypeStrong/ts-node#257

Closed

ofrobots mentioned this pull request Mar 17, 2017

module: check env NODE_PRELOAD for preload modules #11888

Closed

4 tasks

This was referenced Mar 23, 2017

Proposal: --config=file command line argument #11997

Closed

src: allow CLI args in env with NODE_OPTIONS #12028

Merged

sam-github mentioned this pull request Jul 5, 2017

fixed TLSSocket documentation error from Issue #3963. #14062

Closed

2 tasks

leonardoadame mentioned this pull request Nov 28, 2023

[Snyk] Security upgrade nyc from 11.9.0 to 13.0.1 leonardoadame/node#182

Open

aliscco mentioned this pull request Nov 28, 2023

[Snyk] Fix for 3 vulnerabilities aliscco/alisco-node#685

Open

leonardoadame mentioned this pull request Nov 29, 2023

[Snyk] Fix for 10 vulnerabilities leonardoadame/node#229

Open

This was referenced Nov 29, 2023

[Snyk] Security upgrade nyc from 11.9.0 to 13.0.1 aliscco/alisco-node#726

Open

[Snyk] Fix for 4 vulnerabilities aliscco/alisco-node#746

Open

This was referenced Nov 30, 2023

[Snyk] Fix for 5 vulnerabilities aliscco/alisco-node#813

Open

[Snyk] Fix for 5 vulnerabilities aliscco/alisco-node#822

Open

[Snyk] Security upgrade nyc from 11.9.0 to 13.0.1 aliscco/alisco-node#829

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src: command line option to preload modules #881

src: command line option to preload modules #881

ofrobots commented Feb 18, 2015

cjihrig commented Feb 18, 2015

sam-github commented Feb 18, 2015

ofrobots commented Feb 18, 2015

piscisaureus commented Feb 18, 2015

ofrobots commented Feb 18, 2015

meandmycode commented Feb 19, 2015

bnoordhuis commented Feb 19, 2015

ofrobots commented Feb 19, 2015

meandmycode commented Feb 19, 2015

ofrobots commented Feb 19, 2015

meandmycode commented Feb 19, 2015

ofrobots commented Feb 19, 2015

sam-github commented Feb 19, 2015

rmg commented Feb 19, 2015

trevnorris commented Feb 19, 2015

rvagg commented Feb 19, 2015

sam-github commented Feb 20, 2015

meandmycode commented Feb 20, 2015

ofrobots commented Feb 20, 2015

trevnorris commented Feb 20, 2015

rmg commented Feb 21, 2015

rvagg commented Feb 22, 2015

cjihrig commented Feb 22, 2015

trevnorris commented Feb 22, 2015

ofrobots commented Feb 22, 2015

sam-github commented Feb 23, 2015

ofrobots commented Feb 23, 2015

ofrobots commented Feb 27, 2015

sam-github Feb 27, 2015

ofrobots Feb 28, 2015

piscisaureus Feb 28, 2015

sam-github commented Feb 27, 2015

src: command line option to preload modules #881

src: command line option to preload modules #881

Conversation

ofrobots commented Feb 18, 2015

cjihrig commented Feb 18, 2015

sam-github commented Feb 18, 2015

ofrobots commented Feb 18, 2015

piscisaureus commented Feb 18, 2015

ofrobots commented Feb 18, 2015

meandmycode commented Feb 19, 2015

bnoordhuis commented Feb 19, 2015

ofrobots commented Feb 19, 2015

meandmycode commented Feb 19, 2015

ofrobots commented Feb 19, 2015

meandmycode commented Feb 19, 2015

ofrobots commented Feb 19, 2015

sam-github commented Feb 19, 2015

rmg commented Feb 19, 2015

trevnorris commented Feb 19, 2015

rvagg commented Feb 19, 2015

sam-github commented Feb 20, 2015

meandmycode commented Feb 20, 2015

ofrobots commented Feb 20, 2015

trevnorris commented Feb 20, 2015

rmg commented Feb 21, 2015

rvagg commented Feb 22, 2015

cjihrig commented Feb 22, 2015

trevnorris commented Feb 22, 2015

ofrobots commented Feb 22, 2015

sam-github commented Feb 23, 2015

ofrobots commented Feb 23, 2015

ofrobots commented Feb 27, 2015

sam-github Feb 27, 2015

Choose a reason for hiding this comment

ofrobots Feb 28, 2015

Choose a reason for hiding this comment

piscisaureus Feb 28, 2015

Choose a reason for hiding this comment

sam-github commented Feb 27, 2015