Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch/Instrument a module #49452

Closed
Flarna opened this issue Jun 17, 2019 · 24 comments
Closed

Patch/Instrument a module #49452

Flarna opened this issue Jun 17, 2019 · 24 comments

Comments

@Flarna
Copy link
Member

Flarna commented Jun 17, 2019

Maybe the information I'm looking for is already somewhere but I'm not able to find it. I'm happy if I get redirected.
I'm also quite new in the ECMAModules area so I may simply don't know significant details so please excuse if my question is inaccurate or improper.

There are quite some tools (Transformers, APMs,....) out which monkey patch module._load() and/or module.prototype.require() to hook into loading of CJS module loading to modify their content. For APMs the modification is usually just to wrap a handful of the exported functions but keep the rest as it is.

I tried to find the corresponding solution for ECMAModuls but I failed. I found nodejs/modules#98 which also describes this requirement and it links to https://github.com/bmeck/node-apm-loader-example but this sample seems to no longer work (useing 12.4.0). I get following error (after fixing a few nits):

const exports = Object.create(fs, {
                              ^

ReferenceError: Cannot access 'fs' before initialization
    at file:///C:/workspaces/GitHubForks/node-apm-loader-example/overloads/fs.mjs:4:31
    at ModuleJob.run (internal/modules/esm/module_job.js:111:37)
    at async Loader.import (internal/modules/esm/loader.js:128:24)

Even if it would work I have the impression that this solution actually changes the URL in module map for a patched module, e.g. fs would be then URL to ./fs.mjs. For an user importing fs this is don't care but if there are more hooks listening on fs only the first one would be applied. At least in CJS I have seen several times that more then one APM/Transformer is in use within one application.

Besides that this sample is in my opinion unneeded complicated for the common usecase to just wrap a few exported functions but keep the remaining stuff as it is. Would it be possible to create a simpler hook for this?

@jkrems
Copy link
Contributor

jkrems commented Jun 17, 2019

First of all - thanks for reaching out! One quick pitch before I get into this: we're currently missing APM representation in the modules working group (afaik). So, if you can join yourself or know somebody who could, that could help us make sure our solutions keep that perspective in mind. We are aware of this use case but there's always a lot of gray space between "can be done" and "is nice to work with".

The important piece is "loaders aren't done". There is a loader implementation, there are examples of using it, but there's still varying opinions on how close it is to a final loader hooks API.

The way multiple loaders would (may?) work is the following, assuming loaders A, B were registered in this order:

  1. B resolves fs, yields to previous (A) for originalURL. Then it returns file:///path/to/apm/fs.mjs?original=${originalURL}.
  2. A resolves fs, yields to previous (<native>) for originalURL. Then it returns file:///path/to/dev-tracer/fs.mjs?original=${originalURL}.
  3. The <native > loader returns nodejs:fs (placeholder).

The important part is that A would need a way to prevent its own eventual import of the original nodejs:fs. It's easy to prevent that inside of A but a lot harder from inside of B (since it doesn't know that A's code is privileged and should be granted access to the original fs module).

Another issue for random 3rd party modules is also the "intercept and re-exports is hard" problem. export * from 'original' works but won't include the default export. And it's impossible to mutate the exports directly from the outside without wrapping.

The short version: This hasn't actually been figured out yet and that's true for loader hooks in general afaik.

@bmacnaughton
Copy link
Contributor

This is my primary concern as well. I work in the APM area and have been trying to keep an eye on loaders but I haven't seen much activity.

Can you point me at any discussions other than nodejs/modules#98 above?

The way our patching works is to replace the require function with our own. Each module that is patched effectively replaces the original module. The way you describe looks like we will need to manifest each patched module with a file that somehow requests/requires the original/native module and implements our patching. Am I understanding it correctly?

Let me know if/how I can help here. I cannot devote full time to this but will do what I can.

@Flarna
Copy link
Member Author

Flarna commented Jun 17, 2019

Thanks for the info. I will take a closer look into the current codebase to get deeper into this. Would be nice to get hints if there are any docs/minutes/... somewhere describing the proposed solutions and ideas tried till now.

I'm fine with being include in this group. Can't promise any specific number of hours/days/... to work on this but for sure I can help with usecases we have and what we have seen in the wild in this area.

Usually the tools hooking into the loaders are not aware of each other. A typical example is that some developer uses a transpiler like babel and the operations team in his company decides to add an APM tool in production. The developer is maybe not even aware of this as it happens via some command line option (e.g preloading via -r,...) during deployment. The same app may have several additionally indirect dependencies to CLS like modules which often works also via monkey patching in a loader hook.

So in short it's usually not possible to assume that one loader hook knows about the other and new hooks may jump in/out just because of some change in an indirect dependency. Even now it happens that one breaks the other (see https://www.npmjs.com/package/pirates for a solution for one of these conflicts). Most of the time it works fine as all of them just wrap APIs without modifying any directly visible characteristic like name/caching/...

If ECMAModule hooks require a dependency between each other it should be reflected in the hook API to ensure that the complete info arrives at all loaders in the same way and sequence is don't care (as long as the hooks just wrap APIs and don't patch away exports).

@bmacnaughton
Copy link
Contributor

@Flarna - pirates was very different than I was thinking - we patch the compiled module, not the source code itself. In doing so we always keep the API unmodified so that we don't break the code that uses it. I had never seen patching work at the source level before. Is this how you have patched code?

@Flarna
Copy link
Member Author

Flarna commented Jun 17, 2019

We patch/wrap just the exports and share this approach with other APMs and at least CLS like modules.

But I think the loader hooks should be not limited to this use case therefore I referenced also the others.
In general both variants tend to keep the structure/API of the exports to don't break anything, modifying code is more intrusive but clearly offers more possibilities.

Not sure if both usecase should be supported by the same loader hook.

@bizob2828
Copy link

@jkrems I’d like to help bring representation to instrumentation API or hooks. I am an observer but have missed the last few. I’m more than happy to explain my use cases and what I’d like to see from ESM that we didn’t get get with CJS. How can I bring this to the table?

@GeoffreyBooth GeoffreyBooth added the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Nov 22, 2019
@GeoffreyBooth
Copy link
Member

@bizob2828 and the others on this thread, can you make the next meeting? I’ve added the modules-agenda label so that we discuss loaders at the next meeting (2019-12-04 at noon Pacific). If that doesn’t work for you please create a Doodle for an out-of-band loaders meeting and we can find a time that works for the most interested parties. Loaders is probably our biggest outstanding feature right now, so it’s something we’d like to get some attention and dev effort focused on.

@bizob2828
Copy link

@GeoffreyBooth I can. It's very intimidating to talk in these meetings so are there use cases or code samples I can create to seed this discussion? Also would this just be scoped to some instrumentation/hooks of ESM or would it also apply to CJS? I get confused when this is referred as loaders. If you can share any design docs or examples of proposed APIs it would help.

@GeoffreyBooth
Copy link
Member

@GeoffreyBooth I can. It's very intimidating to talk in these meetings so are there use cases or code samples I can create to seed this discussion? Also would this just be scoped to some instrumentation/hooks of ESM or would it also apply to CJS? I get confused when this is referred as loaders. If you can share any design docs or examples of proposed APIs it would help.

@bizob2828 I’m sorry to hear that you feel that the meetings are intimidating. It’s been a concern of mine for awhile that this group can come across as hostile to outsiders, as there are many members of the group with very strong opinions and itchy GitHub trigger fingers. It’s something we need to discuss and improve.

At least for now I think this would be scoped to ESM, as the story for instrumentation of CommonJS is already pretty well worked out; and ESM is a very different case as unlike CommonJS, ESM has defined stages before code evaluation, and the instrumentation would need to go in one of those earlier stages. That’s what the loaders are for, to let users inject custom code during the “pre-evaluation” phases. There is an experimental loader API already, that you can read about at https://nodejs.org/api/esm.html#esm_experimental_loader_hooks, but it’s far from complete and definitely needs much more development. The idea is that we want to expand this API to cover all these use cases like instrumentation, transpilation and so on.

So I guess if you don’t mind reading that, and maybe listing some of your use cases and also whether you think the current API can support them? If you have time to try to write any loaders for the current API to demonstrate whether or not it handles your use case or not, that would be illuminating. And then your thoughts on what the loader API needs to add to satisfy your needs.

@bizob2828
Copy link

@GeoffreyBooth the intimidation is mostly on me and my lack of confidence speaking with folks who may know more on the subject. I will read this stuff and provide some use cases. I hope to get to it before next meeting but with work schedules and travel for holiday I'm not sure

@bizob2828
Copy link

bizob2828 commented Dec 5, 2019

@GeoffreyBooth I took a look at the loader hooks and unless I'm missing something I don't see a way to get access to the compiled code. My use cases may be very similar to APM tools or something like istanbul:

  • keep track of which modules have been imported(by name and url, which I see I can already do with the example in the docs)
  • access to compiled source code to rewrite(both user and core nodejs code)
  • monkey patch a module at import time(both user and core nodejs code)

The way we're currently doing 2nd bullet is just patching _compile like this

module.exports = function hookModCompile() {
  var mod = Object.getPrototypeOf(module);
  mod.__compile = mod._compile;
  mod._compile = function(content, filename) {
    var newContent = rewrite(content, filename);
    var compiled;
    try {
      compiled = this.__compile.apply(this, [newContent, filename]);
    } catch (err) {
      if (err instanceof SyntaxError) {
       console.log(`failed to compile rewritten code ${err.message}`);
       // re-compile original
       compiled = this.__compile.apply(this, [content, filename]);
      } else {
        throw err;
      }
    }

    return compiled;
  };
};

The way we're currently doing the 3nd bullet is just patching require like this

  const origModRequire = Module.prototype.require;
    Module.prototype.require = function(name) {
      let nodule = origModRequire.call(this, name);
      // do some stuff with required module
    }

The reason why I asked if these loader hooks apply to both cjs and mjs is because our current application is a loader. We need to rewrite the main file but if we just did what APM tools did by requiring in the main, it'd be too late. It appears my ask is like nodejs/modules#6

@GeoffreyBooth
Copy link
Member

@bizob2828 Yes as far as I can tell there’s no way to transform source code at the moment. I tried to write a CoffeeScript loader as a proof of concept and I couldn’t do it, as the current hooks just don’t provide a way to override the content of the loaded file. That’s high on the to-do list for loaders, I think.

cc @a-lxe @bmeck

@Cyclonick
Copy link

@DerekNonGeneric Thanks for the update and progress on this! On behalf of @Flarna I will have a look into this and see how far I can get. I will keep you posted as soon as I have some results.

@MylesBorins MylesBorins added modules-agenda Issues and PRs to discuss during the meetings of the Modules team. and removed modules-agenda Issues and PRs to discuss during the meetings of the Modules team. labels Feb 12, 2020
@MylesBorins MylesBorins removed the modules-agenda Issues and PRs to discuss during the meetings of the Modules team. label Mar 11, 2020
@MylesBorins
Copy link
Contributor

Removing label for now, we can bring it back when there is more to discuss

@bmacnaughton
Copy link
Contributor

@DerekNonGeneric - the link https://github.com/DerekNonGeneric/loader339 is no longer valid - is your loader still available somewhere?

also, while this seems so straightforward that there must be some obvious problems, is there a reason that a transformExports hook, similar to the transformSource hook, isn't a viable solution for APM monkey-patching and similar needs?

@bmacnaughton
Copy link
Contributor

you ok with providing me access? i'll provide feedback; i'm not very sophisticated in terms of node's broad needs in this area but i do develop an APM solution for work. my main interest is in making that work as seamlessly as possible.

@bmacnaughton
Copy link
Contributor

@DerekNonGeneric - spent a little time getting bmeck's original module patching code working with the node api changes for the resolve() hook and i've sent a hangouts invite. thanks for the work you've done; i haven't looked at this since last november and a fair amount has happened.

@bmacnaughton
Copy link
Contributor

@DerekNonGeneric - when you accept my hangouts invite i have a few questions i'd like to explore.

@bmacnaughton
Copy link
Contributor

bmacnaughton commented Jul 16, 2020

@MylesBorins - it seems that the exports section will make it impossible for APM to patch live code if it is not listed in the exports section. that is a common need for APM. is this a good place to discuss it or should i open a new issue?

guy bedford informed me that the resolve hook will be called on files that aren't exported so this won't be an issue. you can ignore this. thanks.

@MylesBorins
Copy link
Contributor

@bmacnaughton I think this is a fine place to comment. I think that APMs should still be able to patch any file via loaders

@GeoffreyBooth GeoffreyBooth transferred this issue from nodejs/modules Sep 1, 2023
@bnoordhuis
Copy link
Member

If I understand the ask correctly, this is covered by the V8 inspector's LiveEdit functionality (the Debugger.setScriptSource command, essentially patch-and-continue.)

It's not very well documented but a suitably motivated individual / APM vendor can piece it together by looking at deps/v8/include/js_protocol.pdl and V8's test suite.

I'm going to close this but LMK if there is reason to reopen.

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Sep 3, 2023
@GeoffreyBooth
Copy link
Member

If I understand the ask correctly, this is covered by the V8 inspector’s LiveEdit functionality (the Debugger.setScriptSource command, essentially patch-and-continue.)

If I remember correctly, that API doesn’t support ESM, only classic scripts. And regardless, it’s not exposed to end users as a Node API, so even if it did what we wanted the feature request would remain.

Reopening because I don’t think this is a solved question for ESM. I know Vite does a clever workaround by creating a wrapper that it swaps the body out of, but I think it has limitations (can you change the named exports?) and cache-busting solutions all leak memory. Certainly there’s no solution as easy as require.cache, and I think the user request is to get something that’s on par with that or close to it.

@GeoffreyBooth GeoffreyBooth reopened this Sep 4, 2023
@bnoordhuis
Copy link
Member

that API doesn’t support ESM, only classic scripts

It supports both. WASM too, for that matter.

It won't let you fudge with imports or exports but that's a use case covered by custom loaders.

it’s not exposed to end users as a Node API, so even if it did what we wanted the feature request would remain

You state that as a fact but it's really just your opinion. My opinion is that node shouldn't duplicate functionality that V8 provides out of the box and that's a policy node by and large has followed over the years (in fairness, largely because of my opinion :-))

Certainly there’s no solution as easy as require.cache, and I think the user request is to get something that’s on par with that or close to it

Maybe, but "it's hard" isn't a compelling argument by itself. The target audience here is APM vendors and it's okay if they have to work for it. (I say that as an ex-APM vendor myself.)

The litmus test for keeping this feature request around is: can you already do this with node today? If the answer is yes (which I believe it is), please close it.

@GeoffreyBooth
Copy link
Member

Sorry the comment about setScriptSource got me thinking this was about hot module reload, which I consider unsolved, not instrumentation which is in a better place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants