59 changes: 45 additions & 14 deletions design/TmpWireFormat.md
@@ -13,8 +13,9 @@ It also does not support all features that we want to support eventually. In
particular, it does not support subtyping.

Some types have a *specialized argument format* when used directly as
function arguments, rather than nested inside a data structure. All others use
the *general argument format*.
function arguments, rather than nested inside a data structure. Other types use
the _general argument format (without references)_ or the _general argument
format (with references)_.

Each argument of a function is serialized separately. If the function is
defined with a list of arguments, these all become arguments of the WebAssembly
@@ -36,16 +37,13 @@ Note that there is no terminating `\0`, and the length is implicit as the
length of the `databuf`.


General argument format
-----------------------
General argument format (without references)
--------------------------------------------

All other arguments are represented as a non-empty `elembuf` where

* the first entry is a `databuf` containing the actual data (see below)
* all further entries are the references contained in the data.

The `databuf` is generated by an in-order traversal of the data type.
All numbers are fixed-width and in little endian format.
Arguments whose type does not mention any reference types (no actors, no
shared functions) are represented as a `databuf`. This `databuf` is generated
by an in-order traversal of the data type. All numbers are fixed-width and in
little-endian format.

* A `Nat`, `Int` or `Word64` is represented by 8 bytes.
* A `Word32` is represented by 4 bytes.
@@ -63,9 +61,42 @@ All numbers are fixed-width and in little endian format.
are statically known.)
* An `Option` is represented by a single byte `0` if it is `null`, or
otherwise by a single byte `1` followed by the representation of the value.
* A reference (`actor`, `shared func`) is represented as a 32 bit number (4
bytes) that is an index into the surrounding `elembuf`. This is never `0`, as
the first entry in the `elembuf` is the `databuf` with the actual data.
* An empty tuple, the type `Null` and the type `Shared` are represented by
zero bytes.


*Example:* The ActorScript value
```
(null, ?4, "!") : (?Text, ?Int, Text)
```
is represented as
```
00 01 04 00 00 00 00 00 00 00 01 21
```
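
The rules above can be sketched as a small standalone encoder. This is a minimal sketch, not the compiler's implementation: the `value` type and the `encode`, `to_databuf` and `hex` helpers are hypothetical names, only `Int`, `Word32`, `Option` and tuples are covered, and tuples are assumed to be the plain concatenation of their components (so the empty tuple yields zero bytes). `Buffer.add_int64_le` requires OCaml 4.08 or later.

```ocaml
(* Hypothetical value representation, for illustration only. *)
type value =
  | Int of int64          (* Nat / Int / Word64: 8 bytes, little endian *)
  | Word32 of int32       (* 4 bytes, little endian *)
  | Opt of value option   (* tag byte 0 for null, or 1 followed by the payload *)
  | Tup of value list     (* assumption: concatenation of the components *)

(* In-order traversal that appends the encoding of [v] to [buf]. *)
let rec encode (buf : Buffer.t) (v : value) : unit =
  match v with
  | Int n -> Buffer.add_int64_le buf n
  | Word32 w -> Buffer.add_int32_le buf w
  | Opt None -> Buffer.add_char buf '\000'
  | Opt (Some v') -> Buffer.add_char buf '\001'; encode buf v'
  | Tup vs -> List.iter (encode buf) vs

let to_databuf (v : value) : string =
  let buf = Buffer.create 16 in
  encode buf v;
  Buffer.contents buf

(* Render bytes as space-separated hex, matching the notation used above. *)
let hex (s : string) : string =
  String.concat " "
    (List.map (fun c -> Printf.sprintf "%02x" (Char.code c))
       (List.of_seq (String.to_seq s)))

let () =
  (* (null, ?4) : (?Int, ?Int) *)
  print_endline (hex (to_databuf (Tup [Opt None; Opt (Some (Int 4L))])))
  (* prints "00 01 04 00 00 00 00 00 00 00" *)
```

The printed bytes agree with the first ten bytes of the example above; `Text` is left out because its length encoding is not restated in this section.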

General argument format (with references)
-----------------------------------------

Arguments with a type that mentions reference types (actors or shared functions)
are represented as an `elembuf`:

* the first entry is a `databuf` that contains the data according to the format
above.
* all further entries are the references contained in the data.

The above format is thus extended with the following case:

* A reference (`actor`, `shared func`) is represented as a 32-bit number (4
bytes). This number is an index into the surrounding `elembuf`.

NB: The index is never `0`, as the first entry in the `elembuf` is the
`databuf` with the actual data.

*Example:* The ActorScript value
```
(null, ?console) : (?actor {}, ?actor {log : Text -> () })
```
is represented as
```
elembuf [databuf [00 01 01 00 00 00], console]
```
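
The reference case can be sketched in the same style. This is a hedged illustration under stated assumptions: the `value` type, `encode_with_refs` and `hex` are hypothetical, references are modelled as opaque strings, and indices are assumed to be handed out in traversal order starting at 1 (index 0 being the `databuf`).

```ocaml
(* Hypothetical value representation; [Ref] carries an opaque reference name. *)
type value =
  | Opt of value option
  | Tup of value list
  | Ref of string

(* Returns the databuf bytes and the references in elembuf order
   (the databuf itself would occupy elembuf index 0). *)
let encode_with_refs (v : value) : string * string list =
  let buf = Buffer.create 16 in
  let refs = ref [] in  (* collected in reverse order *)
  let rec go = function
    | Opt None -> Buffer.add_char buf '\000'
    | Opt (Some v') -> Buffer.add_char buf '\001'; go v'
    | Tup vs -> List.iter go vs
    | Ref r ->
      refs := r :: !refs;
      (* The first reference gets index 1, the second index 2, and so on. *)
      Buffer.add_int32_le buf (Int32.of_int (List.length !refs))
  in
  go v;
  (Buffer.contents buf, List.rev !refs)

let hex (s : string) : string =
  String.concat " "
    (List.map (fun c -> Printf.sprintf "%02x" (Char.code c))
       (List.of_seq (String.to_seq s)))

let () =
  (* (null, ?console) *)
  let data, refs = encode_with_refs (Tup [Opt None; Opt (Some (Ref "console"))]) in
  Printf.printf "elembuf [databuf [%s], %s]\n" (hex data) (String.concat ", " refs)
  (* prints "elembuf [databuf [00 01 01 00 00 00], console]" *)
```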
105 changes: 79 additions & 26 deletions src/compile.ml
@@ -2458,6 +2458,40 @@ module Serialization = struct
let typ_id : Type.typ -> string = Type.string_of_typ



(* Checks whether the serialization of a given type could contain references *)
module TS = Set.Make (struct type t = Type.typ let compare = compare end)
Contributor

This is gonna yield quadratic worst case, since every lookup potentially does a complete walk of the type subgraph. (In fact, with Def kinds, can't that even be cyclic and compare thus diverge?) I suggest only using a map of con's that you have already visited.

Contributor Author

I cargo-culted this from rel_typ. What am I missing that is different there?

Contributor

Aw, right. Yeah, that originally used physical equality, but that wasn't good enough due to opening binders. A con map actually also isn't enough, since you have to reduce.

Okay, eventually we might need more efficient representations/algorithms, but I guess you can keep it that way for now.

let has_no_references : Type.typ -> bool = fun t ->
let open Type in
let seen = ref TS.empty in (* break the cycles *)
let rec go t =
TS.mem t !seen ||
begin
seen := TS.add t !seen;
match t with
| Var _ -> assert false
| (Prim _ | Any | Non | Shared | Pre) -> true
| Con (c, ts) ->
begin match Con.kind c with
| Abs _ -> assert false
| Def (tbs,t) -> go (open_ ts t) (* TBR this may fail to terminate *)
end
| Array t -> go t
| Tup ts -> List.for_all go ts
| Func (Sharable, c, tbs, ts1, ts2) -> false
| Func (s, c, tbs, ts1, ts2) ->
let ts = open_binds tbs in
List.for_all go (List.map (open_ ts) ts1) &&
List.for_all go (List.map (open_ ts) ts2)
| Opt t -> go t
| Async t -> go t
| Obj (Actor, fs) -> false
| Obj (s, fs) -> List.for_all (fun f -> go f.typ) fs
| Mut t -> go t
| Serialized t -> go t
end
in go t
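
(* The cycle-breaking idea used by has_no_references can be shown in isolation.
   This is a simplified sketch, not the code of this PR: the miniature `typ`
   language and the `defs` association list are hypothetical, and the `seen` set
   tracks definition names rather than whole types. That shortcut is only sound
   here because the toy definitions take no type parameters; as noted in the
   review thread, real constructors need reduction. *)

```ocaml
module SS = Set.Make (String)

(* Hypothetical miniature type language. *)
type typ =
  | Prim               (* any primitive type: contains no references *)
  | Actor              (* reference type *)
  | SharedFunc         (* reference type *)
  | Opt of typ
  | Tup of typ list
  | Named of string    (* use of a possibly recursive definition *)

(* [defs] maps definition names to their bodies.
   Raises [Not_found] if a name has no definition. *)
let has_no_references (defs : (string * typ) list) (t : typ) : bool =
  let seen = ref SS.empty in  (* break the cycles *)
  let rec go = function
    | Prim -> true
    | Actor | SharedFunc -> false
    | Opt t -> go t
    | Tup ts -> List.for_all go ts
    | Named n ->
      (* A definition that was already entered cannot add new information. *)
      SS.mem n !seen ||
      begin
        seen := SS.add n !seen;
        go (List.assoc n defs)
      end
  in
  go t

let defs =
  [ ("IntList", Opt (Tup [Prim; Named "IntList"]));
    ("ActorList", Opt (Tup [Actor; Named "ActorList"])) ]

let () =
  Printf.printf "%b %b\n"
    (has_no_references defs (Named "IntList"))
    (has_no_references defs (Named "ActorList"))
  (* prints "true false" *)
```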

(* Returns data (in bytes) and reference buffer size (in entries) needed *)
let rec buffer_size env t =
let open Type in
@@ -2830,10 +2864,20 @@ module Serialization = struct
G.i (Call (nr (Dfinity.data_externalize_i env))) ^^
store_unskewed_ptr ^^

(* Finally, create elembuf *)
get_refs_start ^^
get_refs_size ^^ compile_add_const 1l ^^
G.i (Call (nr (Dfinity.elem_externalize_i env)))
if has_no_references t
then
(* Sanity check: Really no references *)
get_refs_size ^^
G.i (Test (Wasm.Values.I32 I32Op.Eqz)) ^^
G.if_ (ValBlockType None) G.nop (G.i Unreachable) ^^
(* If there are no references, just return the databuf *)
get_refs_start ^^
load_unskewed_ptr
else
(* Finally, create elembuf *)
get_refs_start ^^
get_refs_size ^^ compile_add_const 1l ^^
G.i (Call (nr (Dfinity.elem_externalize_i env)))
)

let deserialize_text env get_databuf =
@@ -2871,28 +2915,36 @@ module Serialization = struct
let (set_refs_start, get_refs_start) = new_local env "refs_start" in
let (set_databuf, get_databuf) = new_local env "databuf" in

(* Allocate space for the elem buffer *)
get_elembuf ^^
G.i (Call (nr (Dfinity.elem_length_i env))) ^^
set_refs_size ^^

get_refs_size ^^
Array.alloc env ^^
compile_add_const Array.header_size ^^
compile_add_const ptr_unskew ^^
set_refs_start ^^

(* Copy elembuf *)
get_refs_start ^^
get_refs_size ^^
get_elembuf ^^
compile_unboxed_const 0l ^^
G.i (Call (nr (Dfinity.elem_internalize_i env))) ^^

(* Get databuf *)
get_refs_start ^^
load_unskewed_ptr ^^
set_databuf ^^
begin
if has_no_references t
then
(* We have no elembuf wrapper, so the argument is the databuf *)
compile_unboxed_const 0l ^^ set_refs_start ^^
get_elembuf ^^ set_databuf
else
(* Allocate space for the elem buffer *)
get_elembuf ^^
G.i (Call (nr (Dfinity.elem_length_i env))) ^^
set_refs_size ^^

get_refs_size ^^
Array.alloc env ^^
compile_add_const Array.header_size ^^
compile_add_const ptr_unskew ^^
set_refs_start ^^

(* Copy elembuf *)
get_refs_start ^^
get_refs_size ^^
get_elembuf ^^
compile_unboxed_const 0l ^^
G.i (Call (nr (Dfinity.elem_internalize_i env))) ^^

(* Get databuf *)
get_refs_start ^^
load_unskewed_ptr ^^
set_databuf
end ^^

(* Allocate space for the data buffer *)
get_databuf ^^
@@ -2925,6 +2977,7 @@ module Serialization = struct
| Type.Prim Type.Text -> CustomSections.DataBuf
| Type.Prim Type.Word32 -> CustomSections.I32
| Type.Obj (Type.Actor, _) -> CustomSections.ActorRef
| t' when has_no_references t' -> CustomSections.DataBuf
| _ -> CustomSections.ElemBuf

end (* Serialization *)
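
(* The choice between the two wire shapes made at the end of serialization and
   at the start of deserialization can be modelled outside the code generator.
   This is a hedged sketch: `wire`, `wrap`, `unwrap` and `lookup` are
   hypothetical names, references are opaque strings, and the `assert` only
   stands in for the `Unreachable` trap emitted in the generated code. *)

```ocaml
(* Hypothetical model of the two wire shapes. *)
type wire =
  | DataBuf of string                (* bare data, no references *)
  | ElemBuf of string * string list  (* databuf at index 0, then the references *)

(* Last step of serialization: the shape follows from the static type. *)
let wrap ~(no_references : bool) (data : string) (refs : string list) : wire =
  if no_references then begin
    (* Sanity check: a reference-free type must not have produced references. *)
    assert (refs = []);
    DataBuf data
  end else
    ElemBuf (data, refs)

(* First step of deserialization: recover the databuf and the reference table. *)
let unwrap (w : wire) : string * string list =
  match w with
  | DataBuf data -> (data, [])
  | ElemBuf (data, refs) -> (data, refs)

(* Resolve an index read from the databuf; index 0 is the databuf itself. *)
let lookup (refs : string list) (idx : int) : string =
  if idx < 1 then invalid_arg "reference index must be >= 1"
  else List.nth refs (idx - 1)

let () =
  let w = wrap ~no_references:false "\001\001\000\000\000" ["console"] in
  let _, refs = unwrap w in
  print_endline (lookup refs 1)
  (* prints "console" *)
```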
54 changes: 52 additions & 2 deletions test/run-dfinity/ok/nary-async.wasm.stderr.ok
@@ -1,3 +1,53 @@
deserialize: T/77
serialize: T/77
buffer_size: T/77
prelude:103.1-128.2: internal error, File "compile.ml", line 2476, characters 21-27: Assertion failed

Last environment:
@new_async = func
Array_init = func
Array_tabulate = func
abs = func
btstWord16 = func
btstWord32 = func
btstWord64 = func
btstWord8 = func
charToWord32 = func
clzWord16 = func
clzWord32 = func
clzWord64 = func
clzWord8 = func
ctzWord16 = func
ctzWord32 = func
ctzWord64 = func
ctzWord8 = func
hashInt = func
ignore = func
intToWord16 = func
intToWord32 = func
intToWord64 = func
intToWord8 = func
natToWord16 = func
natToWord32 = func
natToWord64 = func
natToWord8 = func
popcntWord16 = func
popcntWord32 = func
popcntWord64 = func
popcntWord8 = func
print = func
printInt = func
range = func
revrange = func
shrsWord16 = func
shrsWord32 = func
shrsWord64 = func
shrsWord8 = func
word16ToInt = func
word16ToNat = func
word32ToChar = func
word32ToInt = func
word32ToNat = func
word64ToInt = func
word64ToNat = func
word8ToInt = func
word8ToNat = func