Cache-by-position or reinit each time #45
Since modules don't capture variables in the current scope, you can always hoist the module outside of the callback if you want a unique one. |
The question came up in the context of the following code sample:

someButton.onclick = () => runInWorker(module { ... });

Currently, each execution creates a new module block, resulting in a new module being added to the module map. If we cached module blocks by source position, evaluating the same module block expression would return the same module block. More specifically, that would mean:

const arr = [];
for(let i = 0; i < 10; i++)
  arr[i] = module { /* ... */ };
await import(arr[0]) == await import(arr[1]);

While this mitigates the potential performance footgun, I would find this behavior strange and unexpected. In contrast, the current module block behavior (no caching) is more in line with how function literals or object literals behave:

runInWorker(() => {})
// vs
let h = () => {}
runInWorker(h)

Most developers know that closures (or object literals) can be expensive and need to be manually hoisted in hot paths. The same logic would apply to module blocks:

const m = module { ... };
onclick = () => runInWorker(m);

Additionally, one of the goals of this proposal is to be a building block for future scheduler-like libraries and platform-level proposals. Let’s pretend such a scheduler exists that implements the Actor Model, where the location of the code execution is abstracted away from the developer. If we wanted to spawn 10 actors executing the same code, caching by source position could have the unintended side effect that multiple actors end up sharing state just because they end up on the same thread.

for(let i = 0; i < 10; i++) {
spawnActor(module { /* ... */ });
I talked to @littledan as well, and we are contemplating weakly holding module blocks in the module map, which would allow them to be cleaned up throughout the lifecycle of a document. |
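The "weakly hold" idea above can be sketched in user land with WeakRef and FinalizationRegistry. This is only an illustration of the shape of such a cache, not the engine's actual module map; the helper names are made up:

```javascript
// Sketch of a weakly-held cache: entries do not keep their values alive,
// and a FinalizationRegistry prunes keys whose values were collected.
const cache = new Map(); // key -> WeakRef(value)
const registry = new FinalizationRegistry((key) => cache.delete(key));

function remember(key, value) {
  cache.set(key, new WeakRef(value));
  registry.register(value, key);
}

function recall(key) {
  const ref = cache.get(key);
  return ref && ref.deref(); // undefined once the value was collected
}

const instance = { module: "stand-in" };
remember("block@1", instance);
console.log(recall("block@1") === instance); // true while strongly held
```

A map like this never prevents collection, which matches the goal of letting unreferenced module blocks be cleaned up over the document's lifetime.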
Would it be possible to make the return value of

const cache = new Map();
function cachedImport(m) {
const result = cache.get(m) || import(m);
cache.set(m, result);
return result;
}
for (let i = 0; i < 100; ++i) {
const m = module {};
assert(await cachedImport(m) === await cachedImport(m));
assert(await import(m) !== await import(m));
}

Maybe that's too weird, but I think it's nice not to have to back-derive the "url" of a module block. |
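Under no-caching semantics, this kind of memoization already works in user land because module block objects have stable identity as Map keys. Here is a runnable sketch with plain objects standing in for module blocks; the stand-ins and the fabricated "namespace" result are illustrative only, a real version would call import(m):

```javascript
// Memoize "instantiation" by the identity of the block object itself.
// Plain objects stand in for module blocks (illustrative only).
const cache = new Map();

function cachedImport(m) {
  if (!cache.has(m)) {
    cache.set(m, { from: m }); // stand-in for the namespace object
  }
  return cache.get(m);
}

const block = {}; // stand-in for `module {}`
console.log(cachedImport(block) === cachedImport(block)); // true
console.log(cachedImport({}) === cachedImport({}));       // false
```

The same object always hits the same cache entry; a freshly evaluated block (a fresh object) gets a fresh entry.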
I would compare it to template literals, which also have some notion of cache-by-position. I'm a bit concerned about accidental memory leaks if anything keeps previously initialized module bodies alive. I'm concerned about innocent-looking refactors like:

// Before:
async function getStuff() {
const stuff = await import('./helper.mjs');
return stuff.default;
}
// After:
async function getStuff() {
const stuff = await import(module {
// This was a one-line helper file, this is so much simpler!
export default 42;
});
return stuff.default;
}

If the return value is retained, this would now keep leaking copies of that module record, even though the underlying assumption of the refactor ("import imports once") is fairly reasonable today. |
@jkrems I don't think I understand which option you are supporting 😅
|
That example to me is a compelling argument for why it must be cached by position; module exports can have identity and that identity can be important. |
Returning a primitive was a poor choice to illustrate the problem. The problem only really appears if the export is, for example, a closure that closes over the module scope (e.g. accesses import.meta). |
Is anyone interested in my solution to this problem? Default cache-by-position, with opt-in unique modules per evaluation. |
const unique = module { /**/ };
const y = [0,1].map(() => unique);

I don't think there is a need for 2 distinct ways to declare a module and keep it unique, when you can store the module under a reusable name. |
@leobalter that wouldn't make it unique - ie, |
The problem here is that the default behavior currently is a unique module per evaluation, which might be a memory footgun (initializing a new module every time). |
I'm actually sleep-deprived right now because of the current meeting time, but let me try to rephrase how I see this:

const x = [0,1].map(() => module {/* unique */}); // new module here
const same = module { /**/ };
const y = [0,1].map(() => same); // reuse the same module

And yet it seems like the user's responsibility to control the code, and something a linter could handle. This happens with other literal/expression values, so the potential memory footgun already exists:

[0,1].map(() => function() { /* a new function for each entry */ });
[0,1].map(() => [ /* a new array for each entry */ ]);
[0,1].map(() => ({ /* a new object for each entry */ }));

Having a module fragment/block seems like any other literal/expression value. |
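The point about existing literals is easy to check today: each evaluation of a function, array, or object literal already yields a fresh identity, and hoisting restores a shared one, no module blocks needed:

```javascript
// Each callback invocation evaluates the literal again, producing a
// distinct value every time - the "memory footgun" already exists.
const fns = [0, 1].map(() => function () {});
const arrs = [0, 1].map(() => []);
const objs = [0, 1].map(() => ({}));

console.log(fns[0] === fns[1]);   // false
console.log(arrs[0] === arrs[1]); // false
console.log(objs[0] === objs[1]); // false

// Hoisting restores a shared identity, just as it would for a module block:
const shared = function () {};
const hoisted = [0, 1].map(() => shared);
console.log(hoisted[0] === hoisted[1]); // true
```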
I agree if behavior is not "cache by position" but is instead "uncached", then "unique" is just "make a variable in a higher scope". However, "cache by position" seems like a reasonable default, per above discussions. |
cache-by-position seems way more confusing, IMO.

function fn() {
return module {
let x = 0;
export default function() {
x++; // side effects
return x;
}
};
}
const a = fn();
const b = fn();
a === b; // ?
// now imagine code here evaluating and using the modules from a and b. |
In this case, with caching, This is somewhat related to the peerDependencies issue in npm module graphs. |
If they were cache-by-position, would they be cached similarly to tagged templates? I.e. a separate cache for each containing module instance, and the module object would be frozen?

// a.js
export const moduleBlock = module {
export const v = Math.random();
};
Object.isFrozen(moduleBlock); // true

// b.js
import * as A1 from "a.js?q=1";
import * as A2 from "a.js?q=2";
A1.moduleBlock !== A2.moduleBlock; // true |
@acutmore module namespace objects are already frozen ( |
I meant the module block object, rather than the value you get when importing the module. EDIT: right now in the draft spec the object is an Ordinary unfrozen object with ModuleBlock.prototype as its [[proto]] |
I suppose as long as the actual module stuff is in a slot, it doesn't matter much, but it'd have to be frozen if it were source-position-cached. |
That module factory function resembles something I'd write under the current reinit-on-execute semantics, after understanding how it works and utilising it. But it doesn't look like something I'd accidentally encounter under cache-by-position semantics while expecting the opposite, as far as that example illustrates at least. @jkrems's example #45 (comment) in favour of cache-by-position seems more compelling. |
I think the biggest problem is that with cache-by-position, there is no way to generate multiple module definitions from the same source (without eval). |
@mhofman making a new module block that re-exports everything from the old one should suffice? |
That new module block needs a new position in source, that wouldn't be very dynamic? |
ah, i see what you mean, fair point. |
I’m assuming that a module block expression produces an object. I’m not deeply familiar with the precedent established by template literals, but I would be surprised if the module object were frozen. However, if evaluating a module block always produces the same object, it really must be frozen. I agree it would be reasonable for every evaluation of a module block to produce a unique object but to retain a cached and immutable representation of the underlying module text in an internal slot. This would allow for transport optimization. Let’s call this option “mutable module objects”. If I were convinced that “mutable module objects” were untenable, I’d fall back to “immutable module objects”, where every evaluation of a module block expression produces an identical but frozen object. For the compartments proposal, this would imply that |
Yeah there was the separate point of which default would be less confusing, cloning is of course attainable either way. Why not a |
I think the opt-in |
@nayeemrmn this is interesting, with some caveats. I see precedent for a module literal expression making new modules, as functions and other literals do, and that is where I hang my argument for insisting on new modules with no side-effect hazards when something like it is used. As @mhofman mentioned, it would be the only way to create multiple modules with the same source text without eval. In parallel, it's hard for me to understand why someone would want the same module in code that evaluates a module expression such as:

const x = [0,1].map(() => module {/* unique */});

Maybe I can understand a function encapsulating some module code without the need to observe anything external, such as:

// some bundled code
async function getData(key) {
const m = module { /* data code in here */ }
const data = await import(m);
return data[key];
}

The code above would benefit greatly from caching-by-position, but there are alternatives to overcome that. I need to think more about this, but I wonder what this would mean for bundling. |
I guess another way to look at it is, |
I don't think it solves @domenic's concern, because you can still easily spawn a ton of workers (even if they share the same module object). |
It would still be possible without direct use of Do we have any use cases for creating multiple identical module blocks? In the situations I’m imagining it would be more useful to control the freshness at the site where I’m doing the dynamic import.
|
But classes and functions can be GC'd when they're not being referenced. The module is different...? Maybe an anonymous module with no reference can be GC'd too? |
A module can be GCed when it's not referenced, but only if it's not cached by position.

function getModule() {
return module {};
}
const ws = new WeakSet();
let x = getModule();
ws.add(x);
x = null;
// after some time...
console.log(ws.has(getModule()));

If it's cached by position, the last call must return true, and thus the module cannot be GCed. |
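The same observation can be made runnable today with function literals, which are not cached by position; this is an analogue of the WeakSet example above:

```javascript
// Function literals behave like uncached module blocks: each call to
// getFn() evaluates the literal again and returns a fresh object, so a
// WeakSet entry for an earlier result is never matched again (and the
// old function becomes collectable once unreferenced).
function getFn() {
  return function () {};
}

const ws = new WeakSet();
let f = getFn();
ws.add(f);
f = null; // drop the only strong reference

console.log(ws.has(getFn())); // false: a new identity every evaluation
```

Cache-by-position semantics would make the last line log true, pinning the value in memory for the lifetime of the cache.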
it could be GC'd when |
Let me summarize my current position:
Overall, I am strongly leaning towards not caching by source position. |
Another point in favor of not caching:
function getModules(len) {
let modules = [];
for (let i = 0; i < len; i++) {
modules.push(module {});
}
return modules;
}
// vs
function getModules(len) {
let modules = [];
let mod = module {};
for (let i = 0; i < len; i++) {
modules.push(mod);
}
return modules;
}
// vs
let mod = module {};
function getModules(len) {
let modules = [];
for (let i = 0; i < len; i++) {
modules.push(mod);
}
return modules;
} |
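The three getModules variants above can be mimicked with function literals (which are not cached by position today) to see which hoisting level actually shares identity; the helper names here are illustrative:

```javascript
// Analogue with function literals: only the un-hoisted variant
// allocates a fresh value per loop iteration.
function getValues(len) {
  const values = [];
  for (let i = 0; i < len; i++) values.push(function () {});
  return values;
}

function getHoisted(len) {
  const values = [];
  const v = function () {}; // hoisted out of the loop
  for (let i = 0; i < len; i++) values.push(v);
  return values;
}

console.log(new Set(getValues(3)).size);  // 3: one identity per iteration
console.log(new Set(getHoisted(3)).size); // 1: identity shared via hoisting

// Hoisting to function scope still yields a fresh value per CALL; only
// hoisting to module scope would share across calls as well.
console.log(getHoisted(3)[0] === getHoisted(3)[0]); // false
```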
There is a fundamental reason that module blocks cannot be cached at all, assuming we eventually have a module loader API and assuming we can communicate with that module loader API in a separate process. For the sake of argument, I assume the Compartment proposal is the module loader API. Being able to dynamically import a module block implies that the module loader API would support This assumption is grounded in the XS Compartment implementation which allows a parent Compartment to communicate static module records to a child Compartment with different module specifiers. new Compartment(null, {
"your.js": "my.js"
}) This is important because it allows XS to precompile all of the modules in the program’s working set and reveal them to child compartments where they may be linked according to their import maps relative to the referrer specifiers they were given. XS Precompilation is analogous to bundling, so I expect similar patterns will be relevant in contexts beyond embedded systems. And, import maps also lend this kind of flexibility. When a static module record transits from one worker to another, and the workers have different import maps, the import specifiers may link to different modules in the receiving module graph than they would have in the sender’s. So, with invocations like There is a problem with this idea though. Since a static module record doesn’t carry the referrer module specifier, there is no vessel to carry And, if “module”, is an envelope with a referrer module specifier and a static module record, it would be enticing to assume that the referrer module specifier is a cache key. To be a suitable cache key, it would have to include a content address hash of the text of the block, or transmission of a module block would have to include the whole text of the surrounding module and the referrer would need to contain the position. Thinking in this direction is problematic because there would be an integrity issue: the sender could override the module that would have been loaded by the recipient for the given referrer module specifier. In short, I’m strongly in favor of not caching module blocks. I’m also in support of module blocks corresponding to static module records. I’m also in support of the receiving compartment governing the referrer module specifier for the block, but that is irreconcilable with my desire for module blocks to correspond to static module records. 
The only course out of this dilemma that I can see is for the module block to produce an envelope with the static module record and the referrer module specifier, for the initialization of the block to be ephemeral and not have a cache key, but use the referrer module specifier only for the purpose of resolving import specifiers of the module block, using the resolver math or importmap of the receiver. |
If there is no way to refer to the module, it actually isn't observable, so the engine can GC it. (This requires module blocks to not have a unique identifier to reference, e.g. via
For anyone who doesn't know that behavior (including me), the following code is an example of cache-by-position. (Though I don't know why it is designed like this.)

function id(x) {
return x;
}
function templateObjectFactory() {
return id`hello world`;
}
let result = templateObjectFactory() === templateObjectFactory();
// true
Yes, it seems like the current argument can convince me that the current behavior is good enough. (It would be better if anonymous module block instances could be GCed.) |
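Tagged templates also show the flip side of cache-by-position: the cache key is the source position, not the content, so two textually identical templates at different positions get different (frozen) objects:

```javascript
// Tagged-template caching is keyed by source position: the same call
// site reuses one strings array, while a textually identical template
// elsewhere gets its own.
const id = (strings) => strings;

const fromA = () => id`hello world`;
const fromB = () => id`hello world`;

console.log(fromA() === fromA());      // true: same position, same array
console.log(fromA() === fromB());      // false: different positions
console.log(Object.isFrozen(fromA())); // true: cached arrays are frozen
```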
I'm not sure why the performance footgun should be the default behavior. These should absolutely be cache-by-location. If you want the slow do-all-module-evaluation-again way, mark it with a clone call. |
Creating anything in a loop is an antipattern; I'm not sure we need to worry about that. |
One of the use cases is to add an event listener that sends off to a worker thread to compute. A |
The way you'd solve that with |
The entire point of the mod-function thread is that the normal syntax is too heavy for inline use. Its core use is to not hoist. |
I’m pointing out a nit, because I wonder if some other folks might get this distinction wrong:
|
When you have a module block, you always want to instantiate it to make it useful, whether via an import() call, worker.addModule, or any other API, so these two things are, I think, identical. |
This is my impression as well. It doesn't matter if To illustrate the way I see this, imagine the following code:

function make() {
return module { … }
}
const a = make();
const b = make(); It doesn't really matter to me if So given |
You're describing the same problem if you create a new object or function inside |
There's a big difference between a module block and other things folks might use in a loop, like a function. Function instances have no state and are very cheap to create; the actual function code is only created once. That's not the same thing as running all the global initialization within a module when you do import(). The other big difference is that you don't transform a class or function into another thing to use it, while a module does get transformed by way of import(). I think the most important thing here is that you can create a cache within libraries by having some kind of stable identifier for the thing returned by the module block. So like:

fn = () => module {}
fn() !== fn() // this is fine
await import(fn()) !== await import(fn()) // also fine

But somehow libraries need a way to cache when passed the exact same module text, so they can make runTask(module { ... }) not resolve the module state repeatedly. Similarly, if I can postMessage a module to another thread for evaluation, the other side needs a way to avoid redoing import() by way of some kind of caching. That can't be fixed by loop hoisting, since the ModuleBlock is cloned when it gets structured-cloned. |
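A library-side cache keyed on source text rather than object identity would survive structured cloning, since clones carry the same text. A sketch of that idea: the `source` property is hypothetical (module blocks expose no such property today), and plain objects stand in for blocks:

```javascript
// Sketch: cache instantiation results by module source text, so two
// distinct block objects carrying identical text (e.g. a postMessage
// clone) share one entry. `source` is a HYPOTHETICAL field; plain
// objects stand in for module blocks.
const instances = new Map();

function runOnce(block, instantiate) {
  if (!instances.has(block.source)) {
    instances.set(block.source, instantiate(block));
  }
  return instances.get(block.source);
}

// Two distinct objects carrying identical source, as after a clone:
const original = { source: "export default 1;" };
const clone = { source: "export default 1;" };

const a = runOnce(original, () => ({}));
const b = runOnce(clone, () => ({}));
console.log(a === b); // true: keyed by text, the clone shares the instance
```

Whether the proposal should expose such a stable identifier (text, hash, or otherwise) is exactly the open question here.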
I'd like to understand this assumption. Declaring closures in loops captures a lot more state (parent contexts) than a module by itself would. And importing a module declaration doesn't have to be particularly more expensive than calling a function, especially if there are no dependencies like there would be for
I believe this is a separate topic discussed in #58. The question is whether a single module declaration results in a single module namespace object when imported, regardless of the identity of the "pointer" to the module declaration, or how it got to the importing realm. This is particularly a problem for realms in different agents. |
How much does hot-module-reloading feed into this? I can imagine it might be nice to be able to transpile a module block in a way that allows it to be replaced without reloading the outer container module, for when the code change only happened inside the block. |
I think it's not the topic of this thread. For HMR, I think the bundler can generate a "forward" module so it can keep the identity stable. |
I like the idea of the |
I am of the opinion that we should go with a cache-by-position semantic here. I have multiple reasons:

Folks are already familiar with the semantics of each bit of source text belonging to only one module, and each module only being executed once. If the same "module" in source text could result in multiple instances of a module existing in a module map, the current understanding that each bit of source text in the global scope of a module is only evaluated once per realm would be violated. This may be confusing to users.

I don't see a compelling use-case for non-cached modules. Because it is quite questionable how well module blocks, and module instances created from module blocks, can and will be garbage collected, we should discourage using modules as stateful actors at all costs. Instead, if folks want stateful actors, they should

In combination with the inability of module blocks to close over scope, I think there are no compelling use-cases where one would want to create multiple instances of a module block with the same source text in a single realm.

We are already getting a proper way of explicitly creating multiple module maps per agent (using |
Possible solution: Add a unique modifier (prior art: symbol vs unique symbol in TypeScript) to tell which semantics should be used.