-
Notifications
You must be signed in to change notification settings - Fork 43
--loader hooks for require() #6
Comments
I think this is a must have. We should provide a simple enough mechanism from APMs to hook in both esm and cjs. cc @watson |
I think if it’s important enough to warrant first-class support in ESM, it’s important enough to have it in CJS. I also think the inverse is true. |
@ljharb within reason, ESM was not made with CJS interop as a first class citizen. I would prefer if something is only viable in CJS and not ESM it not be put as a priority. |
The migration path is CJS => ESM. Whatever needs to be implemented to ease this transition is important. In this case, implementing loader hooks will help moving code from ESM => CJS. Neat, but not terribly important, IMO. I would prefer using our resources elsewhere. Actually, having loader hooks only in ESM may make the transition go faster! (people will move to ESM just to have all the goodies that loader hooks will bring...) |
I do not see a short-term future where a Node.js application will not a have a commonjs module in its dependency tree. Having support for cjs loader hooks will provide a safe transition to ESM. |
@mcollina - wouldn't the existing solutions they have work with their existing CJS modules? If they do, and they should, why bother migrating to another, albeit newer and better, solution? Having said that, it would be nice to have a solution that works similarly and is configured similarly. But I would say that it's a nice to have, not a must have. If, in terms of code, it's not a big thing, then, yes. But if it requires significant work, I would prefer investing energy elsewhere. Also, could you explain what "APM" means? 😳 |
@giltayar APM = Application performance management, mostly hooking into node core to time stuff and catch errors |
There are also other common workflows like Mocking or testing that use similar workflows of replacing a module with a virtual/temporary one. |
It's absolutely essential to any adoption of ESM that things like Supporting this natively in node for ESM is a must; if we can also support it in a similar fashion for CJS, that's very valuable for a migration path - iow, if half my modules are CJS and the other half ESM, i'd want to mock out a |
@ljharb that seems fine, but various things like setters/getters on the bindings are not possible still even though CJS mocking commonly does that to properties off |
ftr, I'm totally fine with getters and setters on |
@bmeck for my own benefit, this is similar to/drawing from npm's |
@zackschuster this existed prior to theirs so idk which should be seen as drawing from which. |
@bmeck ah, i didn't know that. it sounded similar to their proposal so i wanted to make sure 😄 thanks! |
@zackschuster theirs is basically an exact replica of Node's, there are some changes to discuss around problems with the hooks and APIs. In particular Node's intentionally does not require/allow runtime mutation of the loader semantics except by the loader instrumenting it themselves. |
@bmeck ah, hence the APM references. this is making more sense to me, thank you for taking the time to explain things. |
I see two glaring omission in the recommended implementation for loader hooks
This assumes that the package boundaries will be exposed by each import statement. ESM doesn't support this functionality due to the nature of facades. Use Case 1: rxjs/operators/index.js export { flat } from './flat';
export { flatMap } from './flatMap'; rxjs/index.js export { flat, flatMap } from 'rxjs/operators'; app.js import { flat, flatMap } from 'rxjs'; Which would fire, entry, root, or operators hook; if more than one fire, then in what order? Use Case 2: At some point, ESM will have full native support in browsers and Node, eliminating the need to transpile down to ES5. Despite that fact, bundling will still be in wide use where code size and number of requests matters. Take for instance the following article: It's not uncommon for users of NPM packages to express their disdain for the overly bloated and unnecessarily deep dependency trees that come with some packages. One solution to this approach is for a lib to provide optimized production bundles that tree shake any and all unnecessary code. If/when ESM is reaches full compatibility between Node and browsers, it'll be more common to see FE workflows bleed into universal modules. Assuming a directory/package structure will remain static may work for CJS workflows but fails to take into consideration how ESM will be used. I don't really have any solutions to these problems. ESM supports some features that favor new approaches that aren't backward-compatible with CJS. From what I've seen in the front-end ecosystem, the facade design pattern provides a lot of benefits for library development (ie abstracting API from structure, hiding internals) over the traditional way of doing things (ie deep linking). It's already starting to show up in commonly used libraries. |
@evanplaice I don't really understand your first point since it is laid out depending on package boundaries, but no
This appears to be having some names that are missing in the example given. I'm going to rewrite the question as "What hooks fire when, assuming all the given files have a package boundary in the directory they are located in, and
I assume ends up resolving to
I assume this leaves the path unchanged. That ends up the first resolve and now the loader gets to loading
I assume ends up resolving to
I assume this leaves the path unchanged. That ends up the second resolve and now the loader gets to loading
If this doesn't cross a package boundary so is only handled by this loader.
If this doesn't cross a package boundary so is only handled by this loader. I'm still unclear on the question and example, but hopefully that gives a clear idea of the order of operations. For point 2, that article is satire but true. I'm not sure I understand how this relates to supporting |
I have two use cases for loader hooks that covers CJS modules and would be nice to have for ESMs. Per "Dynamic Bundling", I'd like to be able to write (parent, child) on every load to build a module dependency graph during testing which I later prune and use to generate bundle metadata. In production, I use the derived metadata to whitelist loads and check resource integrity. Right now, I'm hacking Both steps require some configuration, so it'd be nice to be able to provision a loader with flags or a pointer to a config file. I'd also like to experiment with ways to allow restricting access to modules with APIs that are prone to misuse. In other contexts, we've had success by adding static checks in linkers and build systems to enable a workflow where we carefully review and whitelist the wrapper modules to make sure they use the misuse-prone APIs correctly, then disallow direct access by other modules. I'd like to do a POC in a node context. I believe it would be sufficient to
(1) might be sufficient because there are tricks we can do with proxy objects and stack introspection but each of those have their own problems. |
@mikesamuel can you clarify
Say I have a loader that wants to do this behavior. It is given ~ Currently loaders do not automatically ever call the "parent" loader. So if this is the only loader, it could cause import to fail by:
|
If we put the loaders which are already somewhat isolated into a thread we could use a single async resolve for both module systems. The hooks would need to define what mechanism is being used to load things however so that it can switch on various differences like if the specifier is a URL component for |
I'm still trying to wrap my head around all the things that 6.1.2 lets you do. How does the lack of implicit parent loader chaining affect things? Not chaining on a vetoed resolve seems like it would fail safe, but if an error from a loader
I would be happy with either of those or even just
I think any solution needs to prevent:
I'll defer to people who better understand the loading process on whether the runtime should cache the fact that a module load failed and which solution achieves the right balance. Logs need not reflect multiple attempts to load the same module identifier. |
@mikesamuel I think the problem might be that you linked to a spec that is no longer being developed to land in browsers or node. Node hooks are minimal and documented at https://nodejs.org/dist/latest-v9.x/docs/api/esm.html |
@bmeck Sorry, I'm an idiot. That node hook doc makes no mention of "parent loaders." Should I be looking at your "RFC: Per Package Loader Hooks #18233"?
|
@mikesamuel you can look at that or nodejs/node#18914 |
Thanks. I see the loop at https://github.com/bmeck/node/blob/6b94c9ccbfe70773001260ade15a60f81d583a17/lib/internal/process/modules.js#L56-L111 It looks like each userLoader gets to specify a factory that receives the previous links in the loader chain. IIUC, earlier resolvers get to control what URL reaches later loaders, but the last loader gets to ultimately decide whether a load resolves to anything. An error in a resolver would cause line 71 to resolve to a broken promise, so would propagate through the await in the next one. So it looks like either of your options below would work.
An early user loader could defeat either by conspiring with a late user loader but my threat model treats user loaders as trusted code. |
@mikesamuel can you explain the early loader and late loader vector? Early loaders can already rewrite any loader being setup during bootstrapping. |
If an early loader can find a fixed point URL -- one that the middle loaders return unchanged -- then it can store the specifier in a side table it shares with the late loader and return the fixed point URL. The middle loaders never see the specifier in the side table. The late loader can then retrieve the value that the early loader squirreled away from the table. There side table is just one kind of shared state that serves to make the communication independent of promise resolution order. // conspiracy.js
exports const urlSideTable = new Map(); // early loader
import urlSideTable from './conspiracy.js'
let counter = 0
export async function resolve(specifier, parentModuleURL, defaultResolver) {
const substitute = `url-that-other-loaders-will-approve-${counter++}`
urlSideTable.set(substitute, specifier)
return {
url: new URL(substitute).href,
format: 'esm'
};
} // late-loader.js
import urlSideTable from './conspiracy.js'
export async function resolve(specifier, parentModuleURL, defaultResolver) {
const original = urlSideTable.get(specifier)
return /* some computation over original */
} |
@mikesamuel I've thought about completely isolating their global space before, but don't think loaders can be seen as untrusted since you need to set them via CLI flags or environment variables on startup. My suggestion is that you can have your secure loader prevent all the following loaders from sharing the global space by putting them into a |
@bmeck Agreed.
|
going to table this for now, we can come back later and reopen if desired. |
Spawned from a comment on per package hooks .
I'm opening this issue to discuss if we should make the Loader mechanisms we have expand to be able to intercept
require()
specifiers as well. I think it is doable but requires we split the resolution between sync/async forms. Looking at this as well we might want to split up the resolution of format from path resolution, but that is a lot to discuss for the general first gathering of consensus on if this should be a hook we provide.The text was updated successfully, but these errors were encountered: