WebAssembly Proposal: Numeric Values in WAT Data Segments

This proposal adds the ability to write integer and floating-point values in the data segments of the WebAssembly Text Format (WAT). This document summarizes this issue in the WebAssembly design discussion repo.

Try the live demo: https://wasmprop-numerical-data.netlify.app/wat2wasm

Updates:

2020-06-09: This proposal was presented at the 2020-06-09 WebAssembly CG meeting.

2020-06-12: Added subsections on data alignment, out-of-range values, and wasm2wat translation. These topics were raised during the meeting.

2020-06-23: Advanced to phase 1 at the 2020-06-23 WebAssembly CG meeting.

2020-06-26: The official spec fork repo was created.

Summary

  • Currently, data values in WAT data segments can only be written as strings. For example:

    (data (offset (i32.const 0)) "1234")               ;; 3132 3334
    (data (offset (i32.const 0)) "\09\ab\cd\ef")       ;; 09ab cdef
  • This proposal adds a second notation that lets us write integer and floating-point values directly.

    (data (offset (i32.const 0))
        (f32 0.2 0.3 0.4)          ;; cdcc 4c3e 9a99 993e cdcc cc3e
    )
    (memory $1
        (data 
            (i8 1 2)               ;; 0102
            (i16 3 4)              ;; 0300 0400
        )
    )
  • Live Prototype: https://wasmprop-numerical-data.netlify.app/wat2wasm/

    (prototype repo)

Motivation

  • Writing arbitrary numeric values (integers and floats) to data segments is not simple. We need to encode each value into bytes, escape every byte with \, and write the result as a string.

    For example, the following snippet is meant to write some float values to a data segment.

    (data (i32.const 0)
        "\00\00\c8\42"
        "\00\00\80\3f"
        "\00\00\00\3f"
    )
  • If we ever need to review the numbers above, we cannot easily read the values without decoding them.

    "\00\00\c8\42" => 0x42c80000 => 100.0

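To make the decoding step concrete, here is a minimal Python sketch that recovers the float values from the escaped strings in the snippet above. It assumes little-endian IEEE 754 single-precision encoding, as the example implies; the helper name `decode_f32_chunks` is hypothetical and not part of the proposal or any WebAssembly tool.

```python
import struct

def decode_f32_chunks(chunks):
    """Decode 4-byte little-endian chunks into Python floats."""
    return [struct.unpack("<f", chunk)[0] for chunk in chunks]

# The three escaped strings from the snippet above, written as raw bytes.
chunks = [
    bytes.fromhex("0000c842"),  # "\00\00\c8\42"
    bytes.fromhex("0000803f"),  # "\00\00\80\3f"
    bytes.fromhex("0000003f"),  # "\00\00\00\3f"
]
values = decode_f32_chunks(chunks)
print(values)  # [100.0, 1.0, 0.5]
```

Needing a helper like this just to read three numbers is the friction the proposal aims to remove.
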
Overview

This proposal suggests a slight modification in the text format specification to accommodate writing numeric values in data segments.

Text Format Spec Changes

The data value in data segments should accept both strings and lists of numbers (numvecs).

data_I ::= ‘(’ ‘data’ x:memidx_I ‘(’ ‘offset’ e:expr_I ‘)’ b*:dataval ‘)’
                => { data x', offset e, init b* }

dataval ::= (b*:datavalelem)*       => concat((b*)*)

datavalelem ::= b*:string           => b*
             |  b*:numvec           => b*

Numvecs denote sequences of bytes. They are enclosed in parentheses and start with a keyword that identifies the type of the numbers, followed by a list of numbers.

Each number inside a numvec stands for its byte sequence in the encoding of its type: two's complement for integers and IEEE 754 for floats, both in little-endian byte order. A numvec as a whole denotes the concatenation of the bytes of its numbers.

numvec ::= ‘(’ ‘i8’  (n:i8)*  ‘)’      => concat((bytes_i8(n))*)    (if |concat((bytes_i8(n))*) | < 2^32)
        |  ‘(’ ‘i16’ (n:i16)* ‘)’      => concat((bytes_i16(n))*)   (if |concat((bytes_i16(n))*)| < 2^32)
        |  ‘(’ ‘i32’ (n:i32)* ‘)’      => concat((bytes_i32(n))*)   (if |concat((bytes_i32(n))*)| < 2^32)
        |  ‘(’ ‘i64’ (n:i64)* ‘)’      => concat((bytes_i64(n))*)   (if |concat((bytes_i64(n))*)| < 2^32)
        |  ‘(’ ‘f32’ (n:f32)* ‘)’      => concat((bytes_f32(n))*)   (if |concat((bytes_f32(n))*)| < 2^32)
        |  ‘(’ ‘f64’ (n:f64)* ‘)’      => concat((bytes_f64(n))*)   (if |concat((bytes_f64(n))*)| < 2^32)
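The numvec rules can be sketched in Python as follows. This is an illustrative model of the grammar, not the prototype's implementation: the function `encode_numvec` and its lookup tables are hypothetical names, and it relies on Python's `struct` module for little-endian two's complement and IEEE 754 packing.

```python
import struct

# struct format codes per numvec keyword: (signed, unsigned) for integers.
INT_FORMATS = {"i8": ("b", "B"), "i16": ("h", "H"),
               "i32": ("i", "I"), "i64": ("q", "Q")}
FLOAT_FORMATS = {"f32": "f", "f64": "d"}

def encode_numvec(kind, numbers):
    """Return the concatenated little-endian bytes of `numbers`."""
    out = bytearray()
    for n in numbers:
        if kind in FLOAT_FORMATS:
            out += struct.pack("<" + FLOAT_FORMATS[kind], n)
        else:
            signed, unsigned = INT_FORMATS[kind]
            # Negative literals use the signed view, others the unsigned
            # view, so (i8 -1) and (i8 255) both produce the byte ff.
            out += struct.pack("<" + (signed if n < 0 else unsigned), n)
    return bytes(out)

print(encode_numvec("f32", [0.2, 0.3, 0.4]).hex())
# cdcc4c3e9a99993ecdcccc3e
print((encode_numvec("i8", [1, 2]) + encode_numvec("i16", [3, 4])).hex())
# 010203000400
```

Both printed results match the byte comments in the Summary examples. Values that do not fit the chosen width make `struct.pack` raise `struct.error`.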

This new data value form should also be available in inline data segments within memory definitions.

‘(’ ‘memory’ id? ‘(’ ‘data’ b^n:dataval ‘)’ ‘)’ ≡
    ‘(’ ‘memory’ id' m m ‘)’ ‘(’ ‘data’ id' ‘(’ ‘i32.const’ ‘0’ ‘)’ dataval ‘)’
        (if id'=id? ≠ 𝜖 ∨ id' fresh, m=ceil(n/64Ki))
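The side condition m=ceil(n/64Ki) sizes the implicit memory so that it is just large enough to hold the n data bytes. A small Python sketch of that arithmetic (the name `inline_memory_pages` is hypothetical):

```python
PAGE_SIZE = 64 * 1024  # 64 Ki bytes per WebAssembly page

def inline_memory_pages(data_length):
    """Number of pages m = ceil(n / 64Ki) for n bytes of inline data."""
    return -(-data_length // PAGE_SIZE)  # ceiling division on integers

print(inline_memory_pages(4))          # 1, e.g. (memory (data (i8 1 2 3 4)))
print(inline_memory_pages(PAGE_SIZE))  # 1
print(inline_memory_pages(PAGE_SIZE + 1))  # 2
```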

Usage Example

;; XYZ coordinate points 
(data (offset (i32.const 0))
    (f32 0.2 0.3 0.4)
    (f32 0.4 0.5 0.6)
    (f32 0.4 0.5 0.6)
)

;; The 1001st through 1010th prime numbers
(data (offset (i32.const 0x100))
    (i16 7927 7933 7937 7949 7951 7963 7993 8009 8011 8017)
)

;; PI
(data (offset (i32.const 0x200))
    (f64 3.14159265358979323846264338327950288)
)

;; Inline in memory module
(memory (data (i8 1 2 3 4)))

Execution Result Example

The conversion of numvecs to data in data segments happens during the text format to binary format compilation.

So, the following two snippets:

...
(memory 1)
(data (offset (i32.const 0))
  "abcd"
  (i16 -1)
  (f32 62.5)
)
...

...
(memory 1)
(data (offset (i32.const 0))
  "abcd"
  "\FF\FF"
  "\00\00\7a\42"
)
...

will output the same binary code:

...
; data segment header 0
0000010: 00                                        ; segment flags
0000011: 41                                        ; i32.const
0000012: 00                                        ; i32 literal
0000013: 0b                                        ; end
0000014: 0a                                        ; data segment size
; data segment data 0
0000015: 6162 6364 ffff 0000 7a42                  ; data segment data
000000e: 10                                        ; FIXUP section size
...
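The equivalence of the two snippets can be checked without a WebAssembly toolchain. The Python sketch below builds the payload once from the numeric form and once from the string form, then compares the results. The helper `wat_string_bytes` is hypothetical and deliberately simplified: it handles only plain ASCII characters and two-digit \hh escapes, not the other escape forms WAT strings allow.

```python
import struct

def wat_string_bytes(body):
    """Decode a WAT string body made of ASCII characters and \\hh escapes."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            out.append(int(body[i + 1:i + 3], 16))  # two hex digits
            i += 3
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)

# First snippet: "abcd" (i16 -1) (f32 62.5)
numeric_form = b"abcd" + struct.pack("<h", -1) + struct.pack("<f", 62.5)

# Second snippet: "abcd" "\FF\FF" "\00\00\7a\42"
string_form = b"".join(
    wat_string_bytes(s) for s in ["abcd", r"\FF\FF", r"\00\00\7a\42"]
)

print(numeric_form == string_form)  # True
print(numeric_form.hex())           # 61626364ffff00007a42
print(hex(len(numeric_form)))       # 0xa, the data segment size above
```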

Additional Information

Encoding

As described above, numbers inside numvecs are encoded as two's complement for integers and IEEE 754 for floats, the same encoding that the t.store memory instructions use. This ensures that a value loaded from memory with the load instructions is the same whether it was placed there by a (data ... ) initialization or by a t.store instruction.

Data Alignment

Unaligned placements are allowed. For example:

...
(memory 1)
(data (offset (i32.const 0))
  (i8 1)    ;; will go to address 0
  (i16 2)   ;; will go to address 1
)
...

compiles to: 0102 00
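The placement rule amounts to laying out values back to back with no padding. A Python sketch of that layout, where the helper `place` is a hypothetical name and the `struct` format codes "B" and "H" stand for the i8 and i16 values of the example:

```python
import struct

def place(start, items):
    """Lay out (struct_format, value) pairs consecutively, without padding.

    Returns the address of each item and the resulting bytes.
    """
    addresses = []
    data = bytearray()
    for fmt, value in items:
        addresses.append(start + len(data))
        data += struct.pack("<" + fmt, value)
    return addresses, bytes(data)

# (i8 1) followed by (i16 2), starting at offset 0.
addresses, data = place(0, [("B", 1), ("H", 2)])
print(addresses, data.hex())  # [0, 1] 010200
```

The i16 value lands at the odd address 1, which is permitted because WebAssembly memory accesses do not require natural alignment.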

Out of Range Values

Out-of-range values should raise an error during text format to binary format compilation.

(memory 1)
(data (offset (i32.const 0))
  (i8 256)        ;; Error
  (i8 -129)       ;; Error
)
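The check can be sketched in Python as below. It assumes that numvec integers follow the same convention as existing WAT integer literals, where an iN literal may be read as either signed or unsigned, so the accepted range is -2^(N-1) through 2^N - 1. The function name `check_int_literal` is hypothetical.

```python
def check_int_literal(bits, value):
    """Return `value` if it fits an iN literal, else raise ValueError."""
    lowest = -(1 << (bits - 1))   # most negative signed value
    highest = (1 << bits) - 1     # largest unsigned value
    if not lowest <= value <= highest:
        raise ValueError(
            f"i{bits} literal {value} is out of range [{lowest}, {highest}]"
        )
    return value

check_int_literal(8, 255)    # accepted: largest unsigned i8
check_int_literal(8, -128)   # accepted: smallest signed i8
for bad in (256, -129):      # the two error cases above
    try:
        check_int_literal(8, bad)
    except ValueError as err:
        print(err)
```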

Binary Format to Text Format Translation

The data segments in the compiled binary do not contain any information about their original form in the WAT source. Therefore, translation from the binary format back to the text format will use the default string form.
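To illustrate why the original form cannot be recovered, the Python sketch below renders raw segment bytes as a WAT string. The function `to_wat_string` is hypothetical, and its escaping policy (printable ASCII emitted literally, everything else as \hh) is an assumption made for illustration; real tools may choose differently.

```python
def to_wat_string(data):
    """Render bytes as a WAT string literal."""
    parts = []
    for b in data:
        # Printable ASCII, except the double quote and the backslash.
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            parts.append(chr(b))
        else:
            parts.append("\\%02x" % b)
    return '"' + "".join(parts) + '"'

# Bytes produced by: "abcd" (i16 -1) (f32 62.5)
segment = bytes.fromhex("61626364ffff00007a42")
print(to_wat_string(segment))  # "abcd\ff\ff\00\00zB"
```

The bytes of 62.5 come back as `\00\00zB`, with nothing to indicate that they were ever written as an f32.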

Backward Compatibility

As the proposed grammar still accepts the string form, all existing WAT code should continue to work.

Author

This proposal also incorporates suggestions from Ben Smith (binji), Jacob Gravelle (jgravelle-google), and Andreas Rossberg (rossberg).