This proposal adds the ability to write integers and floating-point values in the data segments of the WebAssembly Text Format (WAT). This document summarizes this issue in the WebAssembly design discussion repo.
Try live demo : https://wasmprop-numerical-data.netlify.app/wat2wasm
Updates:
2020-06-09: This proposal was presented at the 2020-06-09 WebAssembly CG Meeting.
2020-06-12: Added subsections on data alignment, out-of-range values, and wasm2wat translation. These topics were raised during the meeting.
2020-06-23: Advanced to phase 1 at the 2020-06-23 WebAssembly CG Meeting.
2020-06-26: The official spec fork repo was created.
-
Currently, data values in WAT data segments can only be written as strings. For example:
(data (offset (i32.const 0)) "1234") ;; 3132 3334
(data (offset (i32.const 0)) "\09\ab\cd\ef") ;; 09ab cdef
-
This proposal adds another notation that allows integers and floating-point values to be written directly.
(data (offset (i32.const 0))
  (f32 0.2 0.3 0.4) ;; cdcc 4c3e 9a99 993e cdcc cc3e
)
(memory $1
  (data
    (i8 1 2)  ;; 0102
    (i16 3 4) ;; 0300 0400
  )
)
-
Live Prototype: https://wasmprop-numerical-data.netlify.app/wat2wasm/
-
Writing arbitrary numeric values (integers and floats) to data segments is not simple. We need to encode the data, add escape characters (\), and write the result as strings. For example, the following snippet is meant to write some float values to a data segment.
(data (i32.const 0)
  "\00\00\c8\42"
  "\00\00\80\3f"
  "\00\00\00\3f"
)
-
If we ever need to review the numbers above, we cannot easily see the values without decoding them:
"\00\00\c8\42" => 0x42c80000 => 100.0
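The manual decoding step above can be reproduced with a short Python sketch. It assumes the bytes are stored little-endian, which is what the `0x42c80000` intermediate value implies; the variable names are illustrative only.

```python
import struct

# The four escaped bytes from the string "\00\00\c8\42", in memory order.
raw = bytes([0x00, 0x00, 0xC8, 0x42])

# Read them as a little-endian 32-bit pattern, then as an IEEE 754 float.
as_u32 = int.from_bytes(raw, "little")
(as_f32,) = struct.unpack("<f", raw)

print(hex(as_u32))  # 0x42c80000
print(as_f32)       # 100.0
```

Having to run something like this just to read one constant is the inconvenience the proposal aims to remove.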
This proposal suggests a slight modification in the text format specification to accommodate writing numeric values in data segments.
The data value in data segments should accept both strings and lists of numbers (numvecs).
data_I ::= ‘(’ ‘data’ x:memidx_I ‘(’ ‘offset’ e:expr_I ‘)’ b*:dataval ‘)’
             => { data x', offset e, init b* }
dataval ::= (b*:datavalelem)* => concat((b*)*)
datavalelem ::= b*:string => b*
              | b*:numvec => b*
Numvecs denote sequences of bytes. They are enclosed in parentheses, start with a keyword that identifies the type of the numbers, and are followed by a list of numbers.
The numbers inside numvecs represent their byte sequence using the respective encoding. They are encoded using two's complement encoding for integers and IEEE 754 encoding for float values. Each numvec symbol represents the concatenation of the bytes of those numbers.
numvec ::= ‘(’ ‘i8’ (n:i8)* ‘)’   => concat((bytes_i8(n))*)  (if |concat((bytes_i8(n))*)| < 2^32)
         | ‘(’ ‘i16’ (n:i16)* ‘)’ => concat((bytes_i16(n))*) (if |concat((bytes_i16(n))*)| < 2^32)
         | ‘(’ ‘i32’ (n:i32)* ‘)’ => concat((bytes_i32(n))*) (if |concat((bytes_i32(n))*)| < 2^32)
         | ‘(’ ‘i64’ (n:i64)* ‘)’ => concat((bytes_i64(n))*) (if |concat((bytes_i64(n))*)| < 2^32)
         | ‘(’ ‘f32’ (n:f32)* ‘)’ => concat((bytes_f32(n))*) (if |concat((bytes_f32(n))*)| < 2^32)
         | ‘(’ ‘f64’ (n:f64)* ‘)’ => concat((bytes_f64(n))*) (if |concat((bytes_f64(n))*)| < 2^32)
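As an illustration of what the numvec rules denote, here is a minimal Python sketch that turns a keyword and a list of numbers into bytes. The function name `encode_numvec` is hypothetical, not part of any tool. Little-endian byte order is assumed because it matches the byte dumps shown in this document, and range checking is deliberately left out.

```python
import struct

# struct format codes for the float keywords; "<" selects little-endian.
FLOAT_FORMATS = {"f32": "<f", "f64": "<d"}
# Byte widths for the integer keywords.
INT_WIDTHS = {"i8": 1, "i16": 2, "i32": 4, "i64": 8}

def encode_numvec(keyword, values):
    """Return the bytes denoted by a numvec such as (i16 3 4)."""
    out = bytearray()
    for v in values:
        if keyword in FLOAT_FORMATS:
            out += struct.pack(FLOAT_FORMATS[keyword], v)
        else:
            width = INT_WIDTHS[keyword]
            # Masking yields the two's-complement bit pattern for negatives.
            # No range check here; this sketch assumes v is already in range.
            mask = (1 << (8 * width)) - 1
            out += (v & mask).to_bytes(width, "little")
    return bytes(out)

print(encode_numvec("f32", [0.2, 0.3, 0.4]).hex())  # cdcc4c3e9a99993ecdcccc3e
print(encode_numvec("i16", [3, 4]).hex())           # 03000400
```

The two printed values correspond to the byte comments in the `(f32 0.2 0.3 0.4)` and `(i16 3 4)` examples earlier in this document.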
This new data value form should also be available in the inline data segment in the memory module.
‘(’ ‘memory’ id? ‘(’ ‘data’ b^n:dataval ‘)’ ‘)’ ≡
    ‘(’ ‘memory’ id' m m ‘)’ ‘(’ ‘data’ id' ‘(’ ‘i32.const’ ‘0’ ‘)’ dataval ‘)’
    (if id' = id? ≠ 𝜖 ∨ id' fresh, m = ceil(n / 64Ki))
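The side condition m = ceil(n / 64Ki) sizes the memory in 64 KiB pages from the byte length n of the inline data. A small Python sketch of that arithmetic follows; the helper name `pages_needed` is made up for illustration.

```python
PAGE_SIZE = 64 * 1024  # 64 Ki bytes per WebAssembly memory page

def pages_needed(n_bytes):
    """Compute m = ceil(n / 64Ki) using integer arithmetic only."""
    return (n_bytes + PAGE_SIZE - 1) // PAGE_SIZE

print(pages_needed(4))      # 1, e.g. for (memory (data (i8 1 2 3 4)))
print(pages_needed(65536))  # 1
print(pages_needed(65537))  # 2
```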
;; XYZ coordinate points
(data (offset (i32.const 0))
(f32 0.2 0.3 0.4)
(f32 0.4 0.5 0.6)
(f32 0.4 0.5 0.6)
)
;; Writing 1001st ~ 1010th prime number
(data (offset (i32.const 0x100))
(i16 7927 7933 7937 7949 7951 7963 7993 8009 8011 8017)
)
;; PI
(data (offset (i32.const 0x200))
(f64 3.14159265358979323846264338327950288)
)
;; Inline in memory module
(memory (data (i8 1 2 3 4)))
The conversion of numvecs to data in data segments happens during the text format to binary format compilation.
So, the following two snippets:
...
(memory 1)
(data (offset (i32.const 0))
"abcd"
(i16 -1)
(f32 62.5)
)
...
...
(memory 1)
(data (offset (i32.const 0))
"abcd"
"\FF\FF"
"\00\00\7a\42"
)
...
will output the same binary code:
...
; data segment header 0
0000010: 00 ; segment flags
0000011: 41 ; i32.const
0000012: 00 ; i32 literal
0000013: 0b ; end
0000014: 0a ; data segment size
; data segment data 0
0000015: 6162 6364 ffff 0000 7a42 ; data segment data
000000e: 10 ; FIXUP section size
...
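The equivalence of the two snippets can be checked outside a WAT toolchain. The Python sketch below builds the data bytes both ways and compares them. It is only an illustration of the byte layout, assuming little-endian encoding, and does not model the rest of the binary section.

```python
import struct

# First form: a string followed by two numvecs.
with_numvecs = (
    b"abcd"
    + (-1 & 0xFFFF).to_bytes(2, "little")  # (i16 -1)
    + struct.pack("<f", 62.5)              # (f32 62.5)
)

# Second form: the same bytes spelled out as escaped strings.
with_strings = b"abcd" + bytes.fromhex("ffff") + bytes.fromhex("00007a42")

print(with_numvecs == with_strings)  # True
print(len(with_numvecs))             # 10, the 0x0a data segment size
print(with_numvecs.hex())            # 61626364ffff00007a42
```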
As previously described, numbers inside numvecs are encoded as two's complement for integers and IEEE 754 for floats, which matches the t.store memory instructions. This encoding ensures that when we load a value from memory using the load memory instructions, the value is consistent whether the data was stored by a (data ... ) initialization or by a t.store instruction.
Unaligned placements are allowed. For example:
...
(memory 1)
(data (offset (i32.const 0))
(i8 1) ;; will go to address 0
(i16 2) ;; will go to address 1
)
...
compiles to: 0102 00
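To make the placement rule concrete, the following Python sketch computes the address of each number when numvecs are packed back to back with no padding. The `layout` helper and its tuple format are illustrative assumptions, not part of the proposal.

```python
def layout(base, numvecs):
    """Return (address, byte_width) for each number, packed without padding."""
    widths = {"i8": 1, "i16": 2, "i32": 4, "i64": 8, "f32": 4, "f64": 8}
    addr = base
    placements = []
    for keyword, values in numvecs:
        for _ in values:
            placements.append((addr, widths[keyword]))
            addr += widths[keyword]
    return placements

# (i8 1) lands at address 0; (i16 2) follows at the odd address 1.
print(layout(0, [("i8", [1]), ("i16", [2])]))  # [(0, 1), (1, 2)]
```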
Out-of-range values should raise an error during text format to binary format compilation.
(memory 1)
(data (offset (i32.const 0))
(i8 256) ;; Error
(i8 -129) ;; Error
)
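A range check consistent with the example above could look like the Python sketch below. It assumes that an N-bit integer literal may be written in either its signed or unsigned interpretation, so `i8` accepts -128 through 255. That reading is inferred from the two error cases shown and from how WAT treats integer literals elsewhere. The function name is hypothetical.

```python
def check_int_range(keyword, value):
    """Raise ValueError if value does not fit the integer type named by keyword."""
    bits = int(keyword[1:])       # "i8" -> 8, "i16" -> 16, ...
    lowest = -(1 << (bits - 1))   # most negative signed value
    highest = (1 << bits) - 1     # largest unsigned value
    if not lowest <= value <= highest:
        raise ValueError(f"{value} is out of range for {keyword}")
    return value

check_int_range("i8", 255)   # accepted
check_int_range("i8", -128)  # accepted
for bad in (256, -129):
    try:
        check_int_range("i8", bad)
    except ValueError as err:
        print(err)
```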
The data segments in the compiled binary do not contain any information about how they were originally written in WAT. Therefore, the translation from the binary format back to the text format will use the default string form.
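The sketch below shows one way a binary-to-text tool might render data bytes back as a WAT string. The escaping policy used here (printable ASCII kept as-is, everything else as `\hh`) is an assumption for illustration; actual tools may choose differently.

```python
def to_wat_string(data):
    """Render bytes as a WAT string literal, escaping non-printable bytes."""
    parts = []
    for b in data:
        ch = chr(b)
        if 0x20 <= b < 0x7F and ch not in '"\\':
            parts.append(ch)            # printable ASCII stays readable
        else:
            parts.append("\\%02x" % b)  # everything else becomes \hh
    return '"' + "".join(parts) + '"'

# Bytes of: "abcd" (i16 -1) (f32 62.5)
print(to_wat_string(bytes.fromhex("61626364ffff00007a42")))
```

With this policy the last two bytes of the float come out as the letters `zB`, which illustrates that the numeric intent is not recoverable from the binary alone.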
As the proposed grammar still accepts the string form, all existing WAT code will continue to work.
- Ezzat Chamudi - echamudi
This proposal also incorporates suggestions from Ben Smith (binji), Jacob Gravelle (jgravelle-google), and Andreas Rossberg (rossberg).