Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Text Format] Allow the data in data segments to be written in a vector of u8 and s8 numbers. #1348

Open
echamudi opened this issue May 26, 2020 · 15 comments

Comments

@echamudi
Copy link

Hello,

In the current specification, when we initialize data segments, we are only allowed to use a vector of strings.

Screen Shot 2020-05-26 at 8 00 30 PM

For example:

(data (offset (i32.const 0)) "a" "b" "c")

or

(data (offset (i32.const 0)) "abc")

To save some arbitrary numbers, we can use backslashes and character hex numbers.

(data (offset (i32.const 0)) "\01\02\03\04\05" )

But this approach feels a bit hacky. Wouldn't it better if we are also allowed to put a vector of numbers that can be written in numbers directly?

For example:

;; in decimal
(data (offset (i32.const 0)) 1 2 3 4 5 254 255)
;; in hex
(data (offset (i32.const 0)) 0x01 0x02 0x03 0x04 0x05 0xFE 0xFF)

And I think this addition still conforms with the core syntax definition, which defines the data part as a vector of bytes:

Screen Shot 2020-05-26 at 8 01 06 PM

@binji
Copy link
Member

binji commented May 26, 2020

I agree that the current format is pretty limited. That said, I think if we change this we'll want to have something more comprehensive. For example, see this simple raytracer I made. These values are f32 constants, but it's really hard to tell.

We'll probably want syntax for this (maybe (f32.const x)? Though this raises questions how to handle u8/u16, since there's no u16.const instruction...)

@echamudi
Copy link
Author

echamudi commented May 26, 2020

Interesting demo :)

If we want something more advanced than an array of bytes, maybe we need to introduce special keywords for them? For example:

data ::= '(' 'data' memidx '(' 'offset' expr ')' datacontent* ')'

datacontent ::= string
             | '(' 'data_u8' u8 ')'
             | '(' 'data_u16' u16 ')'
             | '(' 'data_u32' u32 ')'
             ...
             | '(' 'data_f32' f32 ')'
             | '(' 'data_f64' f64 ')'

Here's the usage example:

(data (i32.const 0)
  "hello"
  (data_f32 3.14159265359)
  (data_u32 0xABCD)
)

Because they are converted to data in the segment data during the wat2wasm compilation, we don't need to add any additional valtype.const instructions (e.g. u16.const, etc).

And the additional grammar above should be backward compatible, as it still accepts strings.

@binji
Copy link
Member

binji commented May 26, 2020

Because they are converted to data in the segment data during the wat2wasm compilation, we don't need to add any additional valtype.const instructions (e.g. u16.const, etc).

Good point. In that case it could use the same format anyway (which is also used in other places too, e.g. assertions), as if we had an u8.const instruction.

This is a pretty simple change, but still requires going through the proposal process for standardization. Would you want to champion it?

@echamudi
Copy link
Author

Sure, I would be glad to do that. What are the steps that I should take?

@binji
Copy link
Member

binji commented May 27, 2020

Next steps would be to bring this to a future wasm community group meeting to see whether the group is interested in pursuing it. Are you a community group member? If not, you can join here: https://www.w3.org/community/webassembly/

@jgravelle-google
Copy link

Suggestion:

datacontent ::= string
             | '(' 'data_u8' u8* ')'
             | '(' 'data_u16' u16* ')'
             | '(' 'data_u32' u32* ')'
             ...
             | '(' 'data_f32' f32* ')'
             | '(' 'data_f64' f64* ')'

So for example if we wanted to have a vector of 3 floats, instead of having to say (data_f32 1.0) (data_f32 2.0) (data_f32 3.0), we could say (data_f32 1.0 2.0 3.0)

Could take that idea even farther and allow for a mixable data stream. For example, variable width vectors that are prefixed by their length: (data_stream i32 2 f32 1.0 2.0 i32 3 f32 3.0 4.0 5.0) Not sure how useful that would actually be, just spitballing at this point.

@echamudi
Copy link
Author

@jgravelle-google's suggestion seems to be useful. For example, this copied code from @binji's ray tracer:

(data (i32.const 0)
  "\00\00\00\00" ;; ground.x
  "\00\00\c8\42" ;; ground.y
  "\00\00\00\00" ;; ground.z
  "\00\00\c2\42" ;; ground.r
  "\3b\df\6f\3f" ;; ground.R
  "\04\56\8e\3e" ;; ground.G
  "\91\ed\bc\3e" ;; ground.B
)

can be re-written as

(data (offset (i32.const 0))
  (data_f32 0 100.0 0 97.0)     ;; ground.{x,y,z,r}
  (data_f32 
    0.936999976635              ;; ground.R
    0.277999997139              ;; ground.G
    0.368999987841              ;; ground.B
  )
)

Regarding @binji's previous comment:

Good point. In that case it could use the same format anyway (which is also used in other places too, e.g. assertions), as if we had an u8.const instruction.

Yeah, maybe it's better to use those formats instead of introducing the new ones. So I'm thinking the possible updated grammar would be this:

data ::= '(' 'data' memidx '(' 'offset' e:expr ')' datacontent* ')'

datacontent ::= string
            | '(' 'u8.const'  u8+ ')'
            | '(' 'u16.const' u16+ ')'
            | '(' 'u32.const' u32+ ')'
            | '(' 'u64.const' u64+ ')'

            | '(' 's8.const'  s8+ ')'
            | '(' 's16.const' s16+ ')'
            | '(' 's32.const' s32+ ')'
            | '(' 's64.const' s64+ ')'

            | '(' 'f32.const' f32+ ')'
            | '(' 'f64.const' f64+ ')'

@binji
Copy link
Member

binji commented May 27, 2020

Nice, I like this! Although we can simplify it further, since we already require that i32.const etc. handle both signed and unsigned values. So we could have just:

datacontent ::= string
            | '(' 'i8.const'  i8+ ')'
            | '(' 'i16.const' i16+ ')'
            | '(' 'i32.const' i32+ ')'
            | '(' 'i64.const' i64+ ')'

            | '(' 'f32.const' f32+ ')'
            | '(' 'f64.const' f64+ ')'

@echamudi
Copy link
Author

@binji Great, I hope it can make its way to the text format spec. I've just joined the wasm community group with id: echamudi.

@binji
Copy link
Member

binji commented May 27, 2020

@echamudi Thanks! If you can attend one of the next CG meetings (the next is June 9th) to present, that would be best. You can open a PR to add this as an agenda item: https://github.com/WebAssembly/meetings/tree/master/2020

@rossberg
Copy link
Member

rossberg commented May 28, 2020

Can we simplify that further to

dataval ::= string
            | '(' 'i8'  i8+ ')'
            | '(' 'i16' i16+ ')'
            | '(' 'i32' i32+ ')'
            | '(' 'i64' i64+ ')'
            | '(' 'f32' f32+ ')'
            | '(' 'f64' f64+ ')'

This mirrors what we did for SIMD values, and it avoids mistaking these value encodings for instructions. Also, I would tend to replace + with *, to be consistent with other n-ary constructs in the text format.

@echamudi
Copy link
Author

Looks fine with me. Also, I found out that the previous syntax (especially i32.const) can be confusing when we use the offset abbreviation.

(data (i32.const 10) (i32.const 20)) ;; same keyword, but 1st one is offset, 2nd one is data.

So, using @rossberg's suggested format might remove the problem.

(data (i32.const 10) (i32 20))

@rossberg
Copy link
Member

Yeah, in fact, t.const might even create an ambiguity for passive segments.

@binji
Copy link
Member

binji commented May 28, 2020

Ah, good points. This is looking very nice now!

@echamudi
Copy link
Author

echamudi commented Jun 9, 2020

Hey, I decided to fork and play with the wat2wasm code last weekend 😃. So, here's the sneak peek demo of this proposal: https://wasmprop-numerical-data.netlify.app/wat2wasm/ .

And here are the code changes: code diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants