Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Imported string constant namespace #32

Closed
eqrion opened this issue Jun 5, 2024 · 19 comments
Closed

Imported string constant namespace #32

eqrion opened this issue Jun 5, 2024 · 19 comments

Comments

@eqrion
Copy link
Collaborator

eqrion commented Jun 5, 2024

From the in-person CG meeting today, there was some push back on the choice of "'" as the namespace of string constants. Opening this as a placeholder to not forget about this.

@michaelficarra
Copy link
Contributor

I would prefer to set a precedent of consistently using a wasm: prefix for built-in modules (and similar) with something like wasm:js-string-constant if we can get away with it.

@rossberg
Copy link
Member

rossberg commented Jun 6, 2024

Wouldn't "js:" be more appropriate? This is a JS interop library, not a generic Wasm thing.

@michaelficarra
Copy link
Contributor

@rossberg The prefix is not related to the feature, it's related to the governing authority. In this case, the authority defining the module specifiers prefixed by wasm: is the wasm CG, so the name seems appropriate.

@rossberg
Copy link
Member

rossberg commented Jun 6, 2024

Fair enough, but I would make the argument that these authorities being coupled is merely a historical accident. In principle (and perhaps in practice, in some future), we could totally decouple the Wasm standardisation body from the body that standardises Wasm-JS interop. So perhaps this shouldn't be conflated.

@jakobkummerow
Copy link
Contributor

I don't care much what the name (or encoding in general, for that matter) is, since it is usually both produced and consumed by tools, not by humans.

It is important, though, that we don't add too much module size overhead. Import-based string constants already cause an unfortunate size regression compared to a built-in string.const instruction (see issue #13 for details; summary: expect modules to have tens of thousands of string constants; aside from their own size they also shift other non-imported frequently-referenced globals into multi-byte LEB range). If we specify a way to not have to repeat the "module name" for every single import in the binary format, then choosing a name like wasm:js-string-constants or similar is fine. As long as the module name does need to be repeated for every import, we should go with a one-character name (or even zero-char, but that presumably runs a somewhat higher risk of colliding with existing size-saving conventions). ' or " seem fitting for strings, between these two ' has the advantage that you can wrap it in double-quotes in the text format ("'") without needing backslash escaping, but from a technical perspective any one-byte encodable UTF-8 character (including unprintable ASCII control characters) would work just as well.

@eqrion
Copy link
Collaborator Author

eqrion commented Jun 11, 2024

Fair enough, but I would make the argument that these authorities being coupled is merely a historical accident. In principle (and perhaps in practice, in some future), we could totally decouple the Wasm standardisation body from the body that standardises Wasm-JS interop. So perhaps this shouldn't be conflated.

Yeah, that has happened in practice before. ArrayBuffer's were a WebGL thing before a EcmaScript took it, if I recall correctly. This was an open issue when TC39 was more actively thinking about having their own builtin modules. TBH, I was sort of trying to skate around this with a vague 'wasm' namespace that could refer to the governing body or just to 'wasm-builtins', but some more thought here might be warranted.

Using 'js' or a 'wasm-js' namespace is also a bit tricky, because some interfaces we'd want to adapt aren't strictly speaking JS interfaces. The TextEncoder/TextDecoder interfaces are HTML API's defined in WebIDL, and not available on all JS engines.

A separate issue for this would probably be good.

@eqrion
Copy link
Collaborator Author

eqrion commented Jun 11, 2024

I don't care much what the name (or encoding in general, for that matter) is, since it is usually both produced and consumed by tools, not by humans.

It is important, though, that we don't add too much module size overhead. Import-based string constants already cause an unfortunate size regression compared to a built-in string.const instruction (see issue #13 for details; summary: expect modules to have tens of thousands of string constants; aside from their own size they also shift other non-imported frequently-referenced globals into multi-byte LEB range). If we specify a way to not have to repeat the "module name" for every single import in the binary format, then choosing a name like wasm:js-string-constants or similar is fine. As long as the module name does need to be repeated for every import, we should go with a one-character name (or even zero-char, but that presumably runs a somewhat higher risk of colliding with existing size-saving conventions). ' or " seem fitting for strings, between these two ' has the advantage that you can wrap it in double-quotes in the text format ("'") without needing backslash escaping, but from a technical perspective any one-byte encodable UTF-8 character (including unprintable ASCII control characters) would work just as well.

I'm going to present a compact import section proposal soon that would let us only specify the 'module' field once. Pragmatically speaking though, that won't be available before we want this proposal available.

How would you feel about making the string constant namespace configurable?

Something like: WebAssembly.compile(..., {importedStringConstants: "namespace goes here"});?

This would let users use any short constant they want for now, and then do something longer in the future if we have a compact import section.

@michaelficarra
Copy link
Contributor

@eqrion That is not a bad solution, but there will still need to be a default module specifier used when importing through the esm integration.

@jakobkummerow
Copy link
Contributor

How would you feel about making the string constant namespace configurable?

That idea came up before, and I've already prototyped it.
I can see that when considering just the API surface, it looks pretty.

But it has significant complexity cost in the implementation (go ahead and prototype it yourself, you'll see what I mean: arbitrary-length dynamic strings are much more complicated to handle than a single hard-coded ASCII char code constant; I spent about two full-time days on the patch).
And it has measurable performance cost, so I don't think any real-world deployments would actually make use of this feature, because who wouldn't want to get some performance for free if all it takes is choosing an efficient name for your tool-generated and tool-consumed magic imports section?

For 20K string constants with payloads "string0", "string1" etc in an otherwise empty module, I've measured:

  • 2% slower instantiation when the chosen module name is "'" (24 -> 24.5 ms).
  • 12% slower instantiation when the chosen module name is "string-constants" (24 -> 27 ms).

That overall cost would certainly be acceptable for something important, but for a purely aesthetic change (in an area that is rarely ever exposed to human eyes!), I'm very far from convinced that it's worth it.

@fitzgen
Copy link
Contributor

fitzgen commented Jun 12, 2024

@jakobkummerow did you happen to measure the overhead of a larger static string for the import? I'm curious how much overhead is introduced by the dynamicism vs the increased cache pressure. I guess it seems like maybe ~10% is cache pressure given that the numbers went from 2% to 12% when the length increased?

That is fairly compelling justification for either keeping it small or developing the compact import section format.


(I had previously been thinking that uncompressed code size didn't really matter too much, because we can always expect gzip/brotli/etc over the wire. However, things like excessive code size leading to cache pressure and potentially degrading performance during decoding, increasing peak RSS, etc... mean that uncompressed code size is generally important. Just noting this down for posterity since I think there might be other folks with the same notion I originally had.)

@jakobkummerow
Copy link
Contributor

@fitzgen No, I haven't measured that, because I assumed that for module size reasons it wouldn't be a viable option anyway. I would guess that the performance would be very similar to the longer user-chosen string; implementation complexity would be a lot less though (that's where the dynamicism really hurts).

@eqrion
Copy link
Collaborator Author

eqrion commented Jun 14, 2024

How would you feel about making the string constant namespace configurable?

That idea came up before, and I've already prototyped it. I can see that when considering just the API surface, it looks pretty.

But it has significant complexity cost in the implementation (go ahead and prototype it yourself, you'll see what I mean: arbitrary-length dynamic strings are much more complicated to handle than a single hard-coded ASCII char code constant; I spent about two full-time days on the patch). And it has measurable performance cost, so I don't think any real-world deployments would actually make use of this feature, because who wouldn't want to get some performance for free if all it takes is choosing an efficient name for your tool-generated and tool-consumed magic imports section?

For 20K string constants with payloads "string0", "string1" etc in an otherwise empty module, I've measured:

* 2% slower instantiation when the chosen module name is `"'"` (24 -> 24.5 ms).

* 12% slower instantiation when the chosen module name is `"string-constants"` (24 -> 27 ms).

That overall cost would certainly be acceptable for something important, but for a purely aesthetic change (in an area that is rarely ever exposed to human eyes!), I'm very far from convinced that it's worth it.

I just prototyped this out today. I think this probably depends on what pre-existing infrastructure you have in your code base, but it really wasn't hard to implement in SpiderMonkey (we had utilities that made it easy). I also haven't observed a perf difference outside of noise for the dynamically chosen single character case you described. I haven't tested the larger chosen name case, because I wouldn't be surprised if there's a regression.

However, I'm still not sure it's worth doing if no one will actually use a different name. I'd like for us to have a compact import section in the near future, but even with that I don't think any tools will switch to actually use the larger name.

@tlively
Copy link
Member

tlively commented Jun 14, 2024

I doubt any tool will switch to a longer name, but they very well might switch to using the empty string if they know it is not already being used for something else.

@eqrion
Copy link
Collaborator Author

eqrion commented Jun 17, 2024

I doubt any tool will switch to a longer name, but they very well might switch to using the empty string if they know it is not already being used for something else.

I suppose that could be enough of a reason. @jakobkummerow are you okay with letting the namespace being customized through a flag? If we do this, it probably makes sense to have the default being something verbose and expecting everyone to just use the empty string unless we have some optimized imported section. This seems like an overall improvement by letting the empty string be used widely.

@jakobkummerow
Copy link
Contributor

I don't think the extra complexity is justified, but if y'all agree that you want this, fine, it can be implemented.

What "default" are you talking about? In {importedStringConstants: "your-choice-goes-here"}, where does a default come into play?

@eqrion
Copy link
Collaborator Author

eqrion commented Jun 18, 2024

I don't think the extra complexity is justified, but if y'all agree that you want this, fine, it can be implemented.

What "default" are you talking about? In {importedStringConstants: "your-choice-goes-here"}, where does a default come into play?

I had mentally assumed you could still do a {importedStringConstants: true} and receive the default namespace. That might not be worth it though. It does sounds like ESM integration would need a default, but that might be something that could be deferred to that proposal.

@jakobkummerow
Copy link
Contributor

I had mentally assumed you could still do a {importedStringConstants: true} and receive the default namespace.

JS being what it is, I would assume that a to-string conversion would be performed on the value, so you would get "true" as the namespace in that case.

@michaelficarra
Copy link
Contributor

@jakobkummerow I know you probably weren't being serious, but in case you were: modern JavaScript does not do this kind of coercion. The committee has committed to a number of normative conventions that guide feature development, which includes a commitment not to implicitly convert to any expected type other than Boolean. You can read more here: https://github.com/tc39/how-we-work/blob/main/normative-conventions.md#avoid-coercing-arguments-to-types-other-than-boolean

@eqrion
Copy link
Collaborator Author

eqrion commented Jul 11, 2024

The string constant namespace is now required to always be chosen by the user. See #29. Closing this issue.

@eqrion eqrion closed this as completed Jul 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants