String constants #13
Comments
Sure, that could work. I find it hard to estimate whether it would be more efficient than using globals directly, but it would certainly have a higher limit on the number of strings it can support. Speaking of which: the relevant limit for the status quo is 100K imports, not 1M globals. I think requiring a bunch of JS code to run before the Wasm module can be instantiated is Not Great. That's the bullet that all imports-based approaches to string constants have to bite. I think making …
Ah yes, I felt like it was lower than 1M but couldn't find a reason why.
From some discussion with @rossberg, I realized this would require us to have … A different option to avoid any significant core wasm change would be to: …
I'd really prefer to avoid adding an optimization like this to our baseline compiler (and I assume interpreter, if we had one too). We'd need to add some way of tracking a deferred …
Sure, we don't have it in our baseline compiler either. It's a functionally unobservable and hence strictly optional internal optimization, appropriate mostly for optimizing compilers.
Just so I understand correctly, are you saying that your recent toolchain experience indicates we'll need something more than globals? If so, should I push on the immutable array idea, or do you have another idea?
I'm saying that toolchains want to be able to use strings in globals. It's less about the character sequences themselves, it's more about things that transitively depend on them, such as … Having a constant … In the absence of a …

The other way is to import the strings themselves as immutable globals. That maintains the ability to use global initializers for string-using globals and have them be immutable+nonnullable. The drawbacks are that (1) all strings must be eagerly created in JS before the Wasm module can be instantiated (so, again, no laziness; and to make it even worse, that eager code on the critical path is one-shot JavaScript), (2) the number of imports to process increases from hundreds to tens of thousands (and we may even have to raise the limit on allowed imports when sufficiently large apps start running into it), (3) toolchain support to create/optimize/manipulate these JS+Wasm bundles is yet to be implemented (and affects everything: e.g. merging modules will also have to merge their JS-provided string constants; dead-code elimination has to rewrite both files; etc.).

I suspect that we will have to prototype both of these approaches in toolchains in order to determine which one is more viable (or even: whether either of them is viable at all). I don't see how tables or immutable arrays would help much, as they don't change the fundamental equation; the only clear benefit they bring is reducing the number of individual imports (addressing drawback (2) above); but it sounds to me like the price they pay for that is that they require even more eager work to be done up front (the respective …)
Adding …
So if I read this correctly, this option is to make all (or most) globals mutable, initialize them at …
Agreed, importing each string as an immutable global seems like the per-global import cost would be prohibitive. That's why I was trying to think through an 'Option 3' where a module only imports a single immutable array which contains all the strings in the module, and then the module's other global initializers could refer to that string array using a constant `array.get` operation.

Regarding 'laziness' of creating strings, what are you imagining? At least in SM in JS, we create the actual string objects when parsing the source (as far as I can tell). Does V8 do something differently?
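To make the two import shapes concrete, here's a hedged JS sketch of what the host side would construct for each approach. The module and field names (`str`, `""`, `strings`) are illustrative, and in the real 'Option 3' the import would be a Wasm GC array created from an iterable, not a plain JS array:

```javascript
// Stand-in for the ~100K string constants a real module would carry.
const constants = ["hello", "world", "foo"];

// Per-string immutable globals: the engine performs one import lookup
// (a full JS property lookup) per constant at instantiation time.
const perGlobalImports = { str: {} };
constants.forEach((s, i) => { perGlobalImports.str[i] = s; });

// 'Option 3': a single import carrying all the strings; the module's
// other global initializers would then use constant array.get to pull
// out individual entries. (A real engine would want a GC array here.)
const arrayImports = { "": { strings: constants } };
```

The point of the contrast is that the second shape keeps the import count constant no matter how many strings the module uses.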
It's a tiny addition compared to GC. I don't expect any system that doesn't implement GC to implement GC'ed strings.
Agreed. All options aside from a …
V8 does that too: for string constants in JS source, it eagerly creates interned strings. So there's certainly a chance that we may, at least for now, get away with Wasm being like JS in this regard. I generally think it's desirable to design Wasm such that it can perform better than JS; otherwise what's the point of compiling to Wasm instead of compiling to JS?
Compared to GC, sure; compared to this proposal, it's a big addition to cross into core wasm with a new valtype/section, and it raises the bar of general usefulness quite a bit. I would like to see this proposal move as fast as possible and avoid getting stuck in big core wasm discussions. Again, not ruling it out, but it seems expedient to look at other extensions first.
My goal with this proposal is for wasm to have parity (or very very close) with JS when it comes to JS strings. It'd be nice to have it perform better on JS strings, but that's not a hard requirement.
I finally got around to prototyping and benchmarking this (source here). I tested out four strategies: imported globals, array from iterable, data segments, and a string section.
All of these used a list of 100,000 random (generally short) strings. Testing on my machines I got the following: …

These numbers are for the full script to parse and execute, and include synchronous wasm compilation and instantiation (using our baseline compiler). For uncompressed file sizes, I get: …

The data segments one is an outlier in runtime, and that's mostly due to the compile time for the big 'init' function which does … When it comes to the string section, it doesn't seem to buy anything for runtime. I took the strategy of allocating the strings and storing them on the instance upon instantiation. It does seem like the string section is more compact, but I haven't measured compressed sizes so I don't know. As of right now, it seems like array_from_iterable is the sweet spot of minimal spec complexity and performance.
Thanks for the data @eqrion! I think we may want to add something to the code size and startup time for "array from iterable" to make the comparison fully apples-to-apples. Atm that binary is

```wat
(module
  (type $strings (array externref))
  (import "" "strings" (global $gimport$0 (ref $strings)))
)
```

which means we have an array of strings. To access an individual string we'd need … In comparison, the imported-globals binary is

```wat
(import "str" "0" (global $gimport$0 externref))
(import "str" "1" (global $gimport$1 externref))
(import "str" "2" (global $gimport$2 externref))
...
```

which means the strings are already separated, and accessing an individual string costs only a `global.get`. Apologies if I have misunderstood something in the code or explanation, which is definitely possible!
Yeah, this is where comparing things gets hard. I was expecting that a toolchain using the 'array from iterable' approach would skip creating a global per string and do the array access scheme you mentioned. It's at the cost of a bounds check and indirection per use of the constant, but it avoids the cost of creating/initializing the globals.
The extra code size for 'imported globals' (compared to 'array from iterable') is from the import statements (otherwise they're pretty similar), and the extra startup time comes from imports not scaling well (requires a full JS property lookup per import). I don't think those are things that 'array from iterable' will necessarily need to do (although I may be wrong, and accessing from a global instead of the array may be required).

I guess I could add a separately timed stage to the benchmarks that accesses each string constant once to capture the cost that each scheme implies? e.g. `global.get` vs `array.get` vs `string.const`. Would that be what you're interested in?
Definitely, yeah, there are multiple possible tradeoffs here, and comparisons get hard. Though my concern about "array from iterable" isn't just runtime speed but also size - each use of a string gets larger. Enough appearances of a particular string constant and the binary could be bigger, maybe even big enough to offset any startup benefit to not creating those globals. But fwiw I did a measurement on a large Java testcase I have and most strings appear only once, so at least there "array from iterable" would be smaller.
Yes, I think that's worth doing. Though maybe be careful not to put it all in a single big function - my worry is that this is one of those costs that gets "spread" around an entire codebase, while in a single big function the VM might do things like store the array in a local after the first load, which would hide the true cost.

Also an option is to involve Java/Dart/Kotlin people in this discussion to get a more realistic picture of how they use strings atm. But just to try to present the point of view I've seen from the Java side in my work with them, we've worked very hard together (J2Wasm+Binaryen) to avoid runtime overheads that get spread around the codebase, by focusing on tweaking things like how itable accesses are done and trying to avoid every possible load there. So even an extra …

Btw, another option might be "struct from iterable", just replacing the array with a struct. We do know the number of fields statically per compilation, so that is possible, and while it would add some type section size, it would then avoid an …
That would be a nice solution, but I think we have an implementation limit of 10k struct fields, which from the above discussion seems too small for string constants. Part of the motivation for using arrays instead of globals was the 100k import limit. I'll try to get some data on the cost of accessing each string constant once for all of the different approaches.
I suppose if toolchains were open to splitting their string data into multiple structs as needed, it would scale well. It would be some more complexity for toolchains though. We also probably could increase the struct field size limit to 100k with a concrete use-case. |
Here's some quick data on adding a single use for every string constant to compare overheads. I did put all of the uses into a single function. However, I'm only testing with our baseline compiler, so there will not be any optimizations that re-use array pointers, bounds checks, dead-code elimination, etc. It should be basically equivalent to spreading them out across different functions.
The before/after numbers show the effect of compiling and running a function that loads every string for that strategy. I don't have more data for the string section approach, as I haven't had time to implement a `string.const` instruction as well. I would guess it would be roughly the same cost as a single …
From looking at some profiles, I do believe we could speed up the imported globals approach. Most of the extra time there is spent in parsing the imports in the binary, then doing slow generic JS property lookups on the imports object. I think we could add a special case for using numeric indices on a dense JS array that bypasses a lot of that generic code. However, that approach is still limited to 100_000 imports as said above. I'm warming up to the idea of a …
@kripken For the J2CL use-case, how many strings are you seeing right now, and what would be a reasonable maximum limit? Would supporting up to 100k strings in a struct work for you? If you needed more, would splitting into several structs be a large burden?

@bashor Would you have a concern about having a 100k string constant limit?
@eqrion I see 73K strings in a large Java testcase. But I can't say if that is a reasonable limit, other codebases may well need more...
The first and last lines seem to say that the imported array method has less startup overhead but a little more runtime overhead. The middle one though confuses me: how can …
I was confused by that as well. I spent a good amount of time verifying that I didn't mess that one up originally and couldn't find anything. I'll try to swing back and look at that again.
We'll check it on some projects and come back to you.
@eqrion so far, the biggest number of string literals we've seen in real applications is ~14K.
It's hard to predict if 100K is enough, and for how long. I think the current number could be easily doubled within a year. |
I'm wondering how strict these limits are, and if they could be increased in the future. |
It depends on which approach is taken: (1) importing each string as its own global, or (2) importing a single struct containing all the strings.
With (2) I think it would be reasonable for implementations to increase their max struct size to 100,000 fields to support this use case. Going above that may get tricky, but 200,000 may be possible. I don't think engines want to dramatically increase the limit on number of imports. Even if we had a limit of 100,000 fields for structs, it would still be possible for toolchains to split their string constants into multiple structs, and import multiple of them. It just is less convenient.
Yes, I think toolchains can fairly easily emit multiple structs here to stay under the 10,000 field limit. (This does require a whole-program decision, but even emitting a single struct requires that anyhow, so I don't see it as adding much work.)
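The splitting described above is straightforward to sketch. A hedged JS helper (names mine, field limit taken from the discussion) showing how a toolchain could partition its string constants into struct-sized chunks:

```javascript
// Partition string constants into chunks small enough to fit one struct
// each, given a per-struct field limit (10,000 in the discussion above).
function chunkConstants(names, maxFields = 10000) {
  const chunks = [];
  for (let i = 0; i < names.length; i += maxFields) {
    chunks.push(names.slice(i, i + maxFields));
  }
  return chunks;
}

// 25,000 constants -> three structs: 10,000 + 10,000 + 5,000 fields.
const chunks = chunkConstants(
  Array.from({ length: 25000 }, (_, i) => "s" + i)
);
```

Each chunk would then become one imported struct, so the import count grows only with the number of chunks, not the number of strings.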
For Google Sheets, when we switched to the string-builtins spec, our production binaries got 442KB larger for a total of around 14k string constants - that's an extra ~5%.

re. offloading string constants to be part of the JavaScript load: …

re. number of constants: …
@gkdn, can you share how the compressed size changes as well? |
When compressed, the regression is 38kb / ~3.5%.
For dart2wasm's use case of js-strings, @jakobkummerow's suggestion sounds really good to me. As an example of how much the .js shim grows with strings: currently in this app with a lot of strings, the generated JS shim (with string literals) is 546,872 bytes, of which 518,837 bytes are for the strings (94%). @jakobkummerow's suggestion solves this problem, and I don't see any downsides for dart2wasm as long as the number of imports and the name size limit are large enough.
@osa1 Does Dart disallow invalid UTF-16 (like unmatched surrogate pairs)? |
The Scala compiler allows unpaired surrogate chars in string literals. However, from my experience, actual occurrences of such strings are exceedingly rare in practice (some tools surrounding the compiler, such as the code formatter, even choke on them). As long as there is some workaround available, even if not very performant, I believe it would be fine to restrict string constants to valid Unicode scalar value sequences.
(I deleted my original comment as I was wrong.) Dart also allows unpaired surrogates in string literals. Example:

```dart
void main() {
  final s1 = "\u{D83D}";
  final s2 = String.fromCharCode(0xDE00);
  print(s1 + s2); // prints 😄
}
```

I also think these cases should be extremely rare, and we have the workaround of importing such strings in the .mjs file.
Implementation feedback:

I. Design question: …

Reasons in favor of …

Reasons in favor of …

Personally I have no strong opinion on this, the question just occurred to me when I started implementing these constants. Both options are implementable.

II. Choice of the magic module name: …

III. Implementation effort: …
I think I have a slight preference for the flag being part of the compile phase. The reason would be that it allows us to know during eager compilation that while the global is 'externref' it will only ever be a non-null string and we could use that to eliminate some casts when we see the global used. But also, it's not a strong preference. I do have a strong preference for this being a separate kind of flag though.
I think that if we can't get any agreement on a compact import section format that using a single character might be okay, but I think I'd want to measure it first. I think a compact import section could let us optimize string imports very well by switching to a tight loop that just allocates strings and puts them into their final storage. |
I'm landing my patch in V8, to make it easier for our toolchain partners to experiment with this approach. Of course it's still part of the off-by-default … For now, it uses …
Cool, I hope to have a quick PR up for this soon too. I'll just use the same module name as well for now. |
Per the discussion at WebAssembly/js-string-builtins#13, this patch implements imported externref globals that can be requested to be automatically populated with string constants, with the string's value equaling the respective import's name.

Bug: v8:14179
Change-Id: Ie86dc28b5a337c64ef03cb879e45adf340001a00
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/5438375
Commit-Queue: Jakob Kummerow <[email protected]>
Reviewed-by: Matthias Liedtke <[email protected]>
Cr-Commit-Position: refs/heads/main@{#93301}
I've now adopted the imported string constants approach described above in the overview. I will leave this issue open for a while in case this approach needs refinement or we want to look at another approach. |
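The imported string constants scheme can be approximated from the host side in plain JS: supply, under the magic module name, each requested import's own name as its value (the engine-side feature does this automatically; the `"'"` module name follows the V8 prototype mentioned above, and the helper name is mine):

```javascript
// Polyfill-style sketch: for each requested constant, the import's
// *name* is the string, so the value supplied is just the name itself.
// JS strings coerce to externref for immutable global imports.
const MAGIC_MODULE = "'"; // magic module name used by the V8 prototype

function addStringConstants(importObject, names) {
  const mod =
    importObject[MAGIC_MODULE] ?? (importObject[MAGIC_MODULE] = {});
  for (const name of names) {
    mod[name] = name; // with the real feature, the engine does this itself
  }
  return importObject;
}

const imports = addStringConstants({}, ["Hello", "World"]);
// imports["'"]["Hello"] === "Hello"
```

A module would then declare `(import "'" "Hello" (global externref))` and read the constant with a plain `global.get`.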
Collected some data from Google Sheets: …

The regression over `string.const` is likely coming from: …
I'm going to close this as I believe we have an acceptable solution now. |
I'd like to reopen the discussion a little bit - remember in the in-person meeting, I asked whether a modified scheme like the following would be possible: …
Now I realise that there's a specific advantage of this alternative scheme: the custom section can use UTF16/WTF16 encoding. In the current scheme where strings come from the import names directly, a reencoding step seems needed to go from UTF8 -> WTF16, which would need a copy. Also, since UTF8 can't represent all WTF16 strings, producers will need a fallback. If the custom section in the alternative scheme directly encodes UTF16/WTF16 strings, there's no reencoding step or producer gap (with WTF16), and engines could even in principle directly use the allocated space of the custom section as the backing store for the relevant string. I realise that custom section based designs have been discussed above, but they seem to involve host code parsing the custom section and feeding it back into the module's imports, rather than combining with the "compile time flag" idea that's been newly introduced. |
@conrad-watt nice idea, two comments: (1) Wire bytes size will be larger. The current approach needs (on top of the import's "module name", whatever that'll end up being) the length of the string and the string payload itself (as utf-8). Your idea needs the length of the index string, the index string, the length of the actual string (assuming a vector-of-vectors encoding for the custom section like existing similar sections), and the string payload itself; so depending on how exactly the indices are encoded, that'll probably add around 3-6 bytes per string constant. (2) I once tried to implement string constants in wire bytes to be used as backing stores for the strings created from them, but had to give up on the attempt because sections aren't two-byte aligned, so for the time being I consider this an entirely hypothetical benefit that won't actually materialize in practice. (In many common languages, utf-8 is also much shorter than wtf-16, so module producers may well opt against wtf-16 for that reason if given a choice.) |
IIUC, I think we could cut this down to just … for each string constant, if the custom section is a concatenation of all the strings rather than a vector of strings. This is still one more LEB128 than the current encoding, and therefore would have that additional code size overhead, but maybe that is worth it to avoid the re-encoding issues and the WTF16 compatibility?
Would this allow string global constants to reuse each others' bytes by overlapping their declared ranges in the custom section (e.g. substrings)? I don't know how much this would come up in practice, but it seems an interesting compression "trick". |
Yes, good point. If you had both of the strings …
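The overlap trick can be illustrated with a small JS sketch (the (offset, length) encoding here is invented for illustration): every constant is a slice of one concatenated payload, so a substring can reuse the bytes of a longer string:

```javascript
// One concatenated payload; each constant is an (offset, length) slice.
// Overlapping slices share bytes: "bcd" reuses part of "abcde".
const payload = "abcde";
const ranges = [
  [0, 5], // "abcde"
  [1, 3], // "bcd" - fully contained inside the first entry
];

const strings = ranges.map(([off, len]) => payload.slice(off, off + len));
// strings[0] === "abcde", strings[1] === "bcd"
```

As noted below in the thread, a UTF-8 payload would additionally need validation that each offset lands on a scalar-value boundary.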
On our main test suite, encoding strings as WTF-8 instead of WTF-16 (in the data segment) reduces the overall code size (not just size of the strings) by almost 6% (400 KB out of 6800 KB). That's a test suite that contains a non-trivial amount of non-ASCII code points in constant strings too, since it contains a number of test cases for string algorithms. I'm fairly confident the amount of non-ASCII code points in that codebase is above the average of production builds. I don't think we want an encoding that mandates U/WTF-16. That would be a very significant code size regression by design. |
Following on @sjrd's point, I believe it's uncommon (and not recommended) for JavaScript source to be served over the web in UTF-16 format; instead, UTF-8 is preferred. See, for example, Mozilla engineer & character encoding expert Henri Sivonen's post Always Use UTF-8[...], or this comment in Chromium's code for streaming JS parsing (which doesn't even seem to support UTF-16!). While this doesn't imply that Wasm must do the same, it shows that UTF-8 is a very common way to send JS strings over the wire, and it's reasonable to expect JS engines to be adept at dealing with them in that encoding. |
FWIW, the concatenated/overlapping trick could be used with utf-8 too, although you'd have to validate that indices fell on valid USVs. |
How would you encode that in an import, and how many bytes would that take? |
If not extending core Wasm import encoding, you'd need an encoding scheme that is valid utf-8. This is going to be less dense and/or have a more restricted range than LEB128s. Just brainstorming, haven't thought deeply about this, but: I imagine you could use …
This is basically using the USV as the integer value directly when possible, but because USVs are not contiguous, we subtract the size of the non-contiguous hole when we have a USV larger than the hole. This has the advantage of a pretty dense encoding, at most 4 bytes per integer, that is also pretty easy and fast to encode and decode. The downside is that it cannot represent values larger than …

Regardless of the exact encoding scheme for an integer chosen, we could further compress them by using a delta encoding, where we transform this series of … into …

This delta encoding would perhaps alleviate some of the constraints imposed by the max-encodable-integer limitation, but more importantly would reduce most encoded integers to a single byte, since it would be rare for consecutive integers to have a larger delta than what can be encoded in a single byte, especially if the producer toolchain sorts them. Other ideas for posterity: …
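The delta idea can be sketched in a few lines of JS (encoding details invented; a real producer would emit these deltas in whatever byte encoding was chosen above): keep the first offset, then store only differences, and reconstruct with a running sum:

```javascript
// Delta-encode a sorted list of integers: the first value is kept as-is,
// every subsequent entry stores only the difference to its predecessor.
function deltaEncode(sorted) {
  return sorted.map((v, i) => (i === 0 ? v : v - sorted[i - 1]));
}

// Reconstruct the original values with a running sum.
function deltaDecode(deltas) {
  let acc = 0;
  return deltas.map((d) => (acc += d));
}

const offsets = [100, 103, 110, 111];
const deltas = deltaEncode(offsets); // [100, 3, 7, 1]
const decoded = deltaDecode(deltas); // round-trips to the original offsets
```

With the offsets sorted, most deltas are tiny, which is what lets them fit in one encoded byte each.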
Anyways, I realize this is all getting a bit out there. I think it could work and has some pretty nice properties, but I also understand if it feels like too much complexity. I definitely don't want to side track and bog down this conversation or proposal if people don't think these ideas are worth considering.
Agree with @jakobkummerow, SM is in a similar situation and can't really use the wire bytes as the backing store for the strings. We also likely wouldn't want to, because that would tie the lifetime of the strings to the bytecode and keep it alive longer than it needs to. Also agree with @ajklein and @sjrd that UTF-8 is going to be size-superior to UTF-16 in almost all use-cases.

Regarding the ability to subslice into the custom section for compression, I think that's also possible with the current import field scheme using the `String.substring` builtin. I also am not sure how beneficial it is in practice.

It seems like the only advantage left would be the ability to encode unpaired surrogates directly (using WTF-8/16). This would mean we'd have to take a JS-API spec dependency on the WTF spec, which is not standards track anywhere AFAIK. It also seems like an easy thing to work around if you need it, and I haven't heard too many complaints about it.

I know there was a complaint at the in-person meeting about how the import field names are now 'semantic' with this proposal, but I don't think using indices into a custom section makes this better. So overall, I'm inclined to keep the current string constant feature, but would like to hear more if folks disagree.
I think one other point to bring up would be that the custom section approach might be able to handle longer strings, if a large import name would blow some implementation-specific limit. I've made my peace with the current approach so I won't push hard if the custom section-based approach isn't preferred. |
Would it make sense to have a way to offset the indices of some imports, leaving a hole for regular definitions, to avoid having imports consume all the smallest indices? In the text format, this could correspond to allowing interleaving of imports and regular definitions.
If we run into the implementation-defined limits for import names, I would support increasing them and don't think we'd run into any issue with that. Although, sort of related to the length limits issue you raised, it would be nice if these string payloads were at the end of the binary to allow compilation of the function bodies to start sooner. I don't think though that's a good enough reason to start defining a custom section for this (given the complexity concerns above).
At least for spidermonkey, we take advantage of the fact that imports come first very heavily. I'd prefer to avoid changing that unless we need to. |
In the event that happens, I believe we could easily expand the implementation-specific limit. I'm going to close this again as I think our current approach is preferable to the alternatives that were brought up after the in person CG meeting. If folks think of another idea, let's talk about it in a new issue. |
Here's another idea for doing string constants that uses a table instead of globals. This can avoid the implementation limit of 1,000,000 globals we have, and possibly be more efficient. The module uses `table.get` to get the strings it wants (a minimum table size can avoid bounds checks). `Table.copyFrom()` would be a new JS-API method on tables that takes a JS iterable object and copies its values into itself starting at a specific index.

A variation of this could also be that the Table JS-API constructor takes an iterable and uses that for the initial values of objects. This could possibly allow the table type to be `(ref extern)`.

One problem is that toolchains also want to refer to string constants in global initializers when constructing static objects. Another extension could be to allow `table.get` as a constant expression (the table section comes before the global section). This might be a little tricky, because `table.get` can trap if the index is out of bounds. At least for SM, we already have to handle instantiation failure due to OOMs in initializer expressions, so this may not be that problematic?

@jakobkummerow WDYT? cc @rossberg if they have any opinions about allowing `table.get` as a constant expression in global initializers.
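`Table.copyFrom()` does not exist in the JS-API today; a rough approximation using the existing `WebAssembly.Table` constructor and `set` (assuming externref tables, which current engines support; the helper is a stand-in, not the proposed API) might look like:

```javascript
// Sketch of the proposed Table.copyFrom(iterable, start) as a helper
// built on the existing externref table JS-API.
function copyFrom(table, iterable, start = 0) {
  let i = start;
  for (const value of iterable) {
    // Does not grow the table; it must already be large enough.
    table.set(i++, value);
  }
  return table;
}

// Host fills the string table once; the module then reads entries
// with table.get instead of needing one global per string.
const table = new WebAssembly.Table({ element: "externref", initial: 3 });
copyFrom(table, ["foo", "bar", "baz"]);
// table.get(1) === "bar"
```

A real `copyFrom` in the engine could skip the per-element JS calls and bulk-copy into the table's backing store, which is where the efficiency win over per-import globals would come from.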