Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESM imports escape the sandbox & runfiles #362

Open
gregmagolan opened this issue Aug 5, 2022 · 17 comments
Open

ESM imports escape the sandbox & runfiles #362

gregmagolan opened this issue Aug 5, 2022 · 17 comments
Labels
bug Something isn't working

Comments

@gregmagolan
Copy link
Member

gregmagolan commented Aug 5, 2022

The relevent code in node is in resolve.js where there is a realPathSync call

https://github.com/nodejs/node/blob/4c5b96b376aa778cbe651362c29a26e0d4c32ccd/lib/internal/modules/esm/resolve.js#L310

This realPathSycn call does not get the fs monkey paches while the cjs require loader's realPathSync call does. It is unclear why.

This is the underlying reason why we current need --preserve-symlinks-main on by default in js_library so the .mjs entry points don't escape their runfiles.

It is also the reason mocha was observed to escape the sandbox in the repro #353 (comment) and likely related to #347.

This affects .mjs entry points and programs that use esm imports. For example, mocha uses the import() built-in that is affected.

@gregmagolan
Copy link
Member Author

We suspect the underlying reason why the fs patches apply to the commonjs path and not the esm path in node is the fs import style in https://github.com/nodejs/node/blob/4c5b96b376aa778cbe651362c29a26e0d4c32ccd/lib/internal/modules/esm/resolve.js

const {
  realpathSync,
  statSync,
  Stats,
} = require('fs');

while the commonjs import is

const fs = require('fs');

Presumably both modules are evaluated before our fs patches, which are done in a --require from the command line. Our patches would apply only to the commonjs path since it has a reference to the fs module which we monkey patch while the esm path has references to the fs functions it needs directly which are not monkey patched.

@cgrindel cgrindel added the funding needed Contribute to https://opencollective.com/aspect-build label Sep 23, 2022
@gregmagolan
Copy link
Member Author

I tested the changed needed in Node.js to resolve this and it works as expected. The change is just aspect-forks/node@6616ddd.

@gregmagolan
Copy link
Member Author

gregmagolan commented Nov 4, 2022

@gregmagolan
Copy link
Member Author

Best known solution seems like maintaining a fork of node and updating the toolchain to use it.

@mkg20001
Copy link

@mkg20001
Copy link

mkg20001 commented Mar 28, 2023

you can use experimental-loaders to insert the monkey patch as a global prelude, etc https://nodejs.org/api/esm.html#globalpreload

this would run before anything else (note that experimental loaders is in another thread, you need globalPrelude to run stuff on the main thread)

you could also see if you can just use experimental-loaders custom resolve function instead of patching fs (it works starting from node 12, with just slight api name changes)

edit: there's an option like --require but for esm, but it's wip/not yet added

@gregmagolan
Copy link
Member Author

https://nodejs.org/api/module.html#modulesyncbuiltinesmexports

Does this fix the issue?

It doesn't. This is being called already after the monkey patches are made,

module.syncBuiltinESMExports()
. I don't remember the context anymore however. @jbedard was working on that one.

@mkg20001
Copy link

mkg20001 commented Apr 1, 2023

I think that patching fs prob won't affect ESM loading as it happens in a seperate thread and you should use experimental-loader resolve in that case (or try the fs patch in a file loaded via --experimental-loader)

@aspect-build aspect-build deleted a comment from cgrindel Apr 1, 2023
@gregmagolan
Copy link
Member Author

Thanks for the suggestion @mkg20001 . I'll spend a bit of time today looking into that.

@gregmagolan
Copy link
Member Author

gregmagolan commented Apr 1, 2023

@alexeagle For reference, there were a few attempts to make the ESM loader's fs calls monkey patchable but they were rejected by the node maintainers:

nodejs/node#39711: module: allow monkey-patching internal fs binding
nodejs/node#39513: module: use monkey-patchable readFileSync

Other (possibly) related PRs & issues:

nodejs/node#41076: Research the feasibility to providing a VFS extension mechanism to fs, require(), child_process, etc

@gregmagolan
Copy link
Member Author

gregmagolan commented Apr 1, 2023

@mkg20001 I tested monkey patching the fs module in a globalPreload hook via --experimental-loaders with no luck. The require('fs') called in https://github.com/nodejs/node/blob/4c5b96b376aa778cbe651362c29a26e0d4c32ccd/lib/internal/modules/esm/resolve.js#L30 appears to still happen before the globalPreload so the patch of realPathSync doesn't take effect there.

@mkg20001
Copy link

mkg20001 commented Apr 2, 2023 via email

@Aghassi
Copy link
Collaborator

Aghassi commented Apr 3, 2023

The custom resolve function on a loader sounds promising if it works and prevents nodejs forking which would be not ideal if it can be avoided.

@gregmagolan gregmagolan removed this from the Performance & Docs milestone Apr 16, 2024
@matthewjh
Copy link

matthewjh commented Apr 26, 2024

@gregmagolan did you have any further luck with the loader approach?

I am currently debugging some sources of sandbox escape with Node.js v18, and this is with the patches to the ESM resolver.js. As you well know, it is tricky business. There are some places that use internalModuleStat rather than fs.statSync, which is not patchable. I noticed in Node v21 this API is now used instead of fs.statSync in the esm loader files, so it appears this problem is likely to get worse rather than better:

https://github.com/nodejs/node/blob/v21.0.0/lib/internal/modules/esm/resolve.js

Luckily, within the context of build and test actions, --experimental_use_hermetic_linux_sandbox seems to be extremely effective at ensuring hermetic JS execution thanks to using hardlinks (so no more symlinks to follow) and chroot (so no walking parent dirs up to the root looking for competing node_modules dirs). Reflecting on this, I do wonder whether aspects of this approach could be emulated to make bazel run of js_binary targets more robust. My main concern here is developers running tools and binaries locally - not deployed binaries, where we should be able to rely on the FS not being polluted, e.g. because they run in docker containers via js_image_layer. Obviously, it is annoying when a developer misses a dep and, rather than manifesting in a "module not found" error, there are subtle bugs due to node loading separate instances of the same module because one was resolved inside runfiles, and one under the bindir (etc.).

I wonder whether this "developer running a js_binary through bazel run" case could be handled specifically by a wrapper script crawling the runfiles dir for all the node_module and application files pertinent to the binary (as rules_js would own this code, we can be sure to stay within the runfiles dir), and then creating a mirroring hardlink tree under /tmp with the node_modules and application files, and then launching node within this /tmp hardlink tree. This, like I say, would apply the benefits seen with the hardlink sandbox to bazel run for js_binary. There is clearly some overlap here with js_devserver.

What do you think?

@matthewjh
Copy link

matthewjh commented Apr 26, 2024

Other things to look at:

  1. Whether the fs guards could better handle Node.js "misbehaviour" here and redirect sandbox escapes back within the sandbox. I think I noticed realPathSync(path) returning path without modification if path is already outside of runfiles. In this case, returning the path inside runfiles that links onto path might be better, as then subsequent fs calls that base themselves on path will fall into the expected pathways.

  2. The CJS loader uses this Module._nodeModulePaths function to get all possible node_modules directories from the current location up to the root...
    e.g.

/my/bazel/path/execroot/bin/mybin.runfiles/app/node_modules
/my/bazel/path/execroot/bin/mybin.runfiles/node_modules
/my/bazel/path/execroot/bin/node_modules
/my/bazel/path/execroot/node_modules
/my/bazel/path/node_modules
/my/bazel/node_modules
/my/node_modules
/node_modules

These then feed into downstream operations, some of which use internalModuleStat. This is not good and I'm pretty sure accounts for some of the escapes I'm seeing.

Luckily, it looks like Module._nodeModulePaths is patchable, meaning rules_js could conceivably drop all roots outside runfiles and hide these from Node.js:

https://github.com/ilearnio/module-alias/blob/dev/index.js#L17

@gregmagolan
Copy link
Member Author

I haven't looked further into the loader approach since the spike a while ago.

I came across this proposal however that looks interesting and could open a path for a solution: nodejs/node#52219. Haven't had time yet to look in detail but at face value it sounds promising.

@Aghassi
Copy link
Collaborator

Aghassi commented Nov 22, 2024

I had a passing thought since we tripped on this again. rules_js maybe could try and use a register from node to rewrite all fs imports it sees (or node:fs) to be it's monkey patched version before node loads the file. Then you could universally patch esm possibly? I don't know, but worth mentioning in case anyone wants to give it ago.

For now, we just disable the sandbox on macOS and leave it on during remote execution with linux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: On Deck
Development

No branches or pull requests

6 participants