Skip to content

Latest commit

 

History

History
158 lines (104 loc) · 7.7 KB

CG-11-23.md

File metadata and controls

158 lines (104 loc) · 7.7 KB

WebAssembly logo

Agenda for the November 23rd video call of WebAssembly's Community Group

  • Where: zoom.us
  • When: November 23rd, 5pm-6pm UTC (November 23rd, 9am-10am Pacific Time)
  • Location: link on calendar invite

Registration

None required if you've attended before. Send an email to the acting WebAssembly CG chair to sign up if it's your first time. The meeting is open to CG members only.

Logistics

The meeting will be on a zoom.us video conference. Installation is required, see the calendar invite.

Agenda items

  1. Opening, welcome and roll call
    1. Opening of the meeting
    2. Introduction of attendees
  2. Find volunteers for note taking (acting chair to volunteer)
  3. Adoption of the agenda
  4. Proposals and discussions
    1. Interpreters (30 min) - Ben Titzer (slides)
    2. Discussion: moving tail calls forward (issue) (15 min) - Thomas Lively
  5. Closure

Meeting Notes

Attendees

  • Ben Titzer
  • Francis McCabe
  • Paolo Severini
  • Sean Jensen-Grey
  • Derek Schuff
  • Arun Purushan
  • Slava Kuzmich
  • Gordon A
  • Yuri Iozzelli
  • Daniel Hillerström
  • Zalim Bashorov
  • Sergey Rubanov
  • Sam Lindley
  • Luke Wagner
  • Heejin Ahn
  • Andrew Brown
  • Ryan Hunt
  • Radu Matei
  • Nabeel Al-Shamma
  • Keith Winstein
  • Adam Klein
  • Hannes Pyer
  • Jakob Kummerow
  • Johnnie Birch
  • Asumu Takikawa
  • Alon Zakai
  • Ethan Lee
  • Petr Panzin
  • Intel Seven
  • Pat Hickey
  • Ioanna Dimitriou
  • OCamlPro
  • Andreas Rossberg
  • Ross Tate
  • Bailey Hayes
  • Steven Prine
  • Sam Clegg
  • Zhi An Ng

Interpreters - Ben Titzer

(slides)

AZ: you said that translating to AST/IR takes more space. Can you just rewrite and then discard the original wasm?

BT: to debug, you need to refer to the original bytecode.

AZ: debugging isn’t something you do all the time though?

BT: it’s not really a niche. You do it rarely but it matters. One other thing is instrumentation. For that you often refer back to the bytecode. Wizard is designed in particular for debugging and instrumentation

FM: One issue with guard page is that it’s not consistent with stack switching

BT: there are definitely issues there, where do you put the branch, where to put the pages, etc. haven’t solved that yet.

VM: V8 uses guard page but also has a stack chekc on function entry.

BT: I don’t think for the C execution stack. The check is explicit in the code. It’s actually fairly unreliable. For Wizard I implemented that for Virgil to cover virgil code too

PS : What is the performance of the best interpreters compared to jitted code? Is it sufficient to run "normal" Wasm workloads like Facebook games? Google Earth? What is the threshold for the kind of apps that work with an interpreter?

BT: somewhere between 10-50x (mostly more toward 10x). Every interpreter will be 10x compared to JIT. JSC is 10-15x slower than baseline

AR: without the tagging is the value stack only 32 bits wide? How did yo utag 64 bit vals?

BT: 2 64 bit worlds. The first word is only a byte, but full for alignment. On x86 unaligned accesses are pretty fast, not sure it matters, but it makes the math easier

AR: I assume no SIMD yet?

BT: not yet. For 128 bit values there are a couple options (next slide) You can box them (5-10x slowdown). You can try to reuse boxes, but that gets a bit crazy. Since I”m using 128 bit slots, maybe they are untagged for SIMD. Then you need conservative GC. Or you go to 256 bit value stack slots (that’s probably not to bad, 5-10% cost). I’ll probably do that.

AR: could you split the stack per type somehow?

BT: I’ve thought about that, but it doesn't really work. There are polymorphic bytecodes, local, drop etc. There are things you can do but they aren’t really better than tagging

FM: Do you still think it’s worth having an interpreter?

BT: yes. The way I see things going is that production engines will have 3 tiers unless they are for offline compilation

FM : what do you think about V8s approach wrt debugging in liftoff?

BT: V8 started by doing JS debugging in its baseline tier. THere are a lot of issues with debugging JITed code. Patching code, etc. It’s a hassle and a security issue. It’s just so much simpler.

FM: what liftoff does is if you turn on debugging it throws away the JITted code and recompiles

BT: yeah, same as FullCodeGen in JS. it made things more complicated with an extra mode. It caused a lot of bugs there. We wanted to avoid putting debug info where we could get away with it. It just caused a lot of bugs. An interpreter is dead simple to get right.

FM: JS now has an additional tier (sparkplug). Why have one compiler when 2 will do the job? :)

BT: there are performance issues with just an interp + top tier, and debugging issues with just a baseline and opt. I think that is part of what leads engines to do that.

PS: You said this is useful as first tier, but good for security to avoid JIT? Is that perf good enough for some usages?

BT: I guess it depends on your application. If we had launched wasm from the beginning at 10x slower than native we wouldn't be here today. Sometimes you really need that. Other things aren’t super important. E.g. if you embed wasm in your program to use for lightweight scripting, e.g. they aren’t performance critical, interpreter is probably fine.

FM: the other approach to debugging is DWARF-style, codemap approach, so you don’t try to keep the original, you just keep pointers to the original source. For apps in C or OCaml, you don’t really care about the wasm bytecdode, just the application source. So the use case for debugging wasm seems limited

BT: if your static compiler generates source mapping, the target of those is going to be wasm code offsets. For any request to the engine, it will be wasm code offsets. So the wasm engine needs to tell the source level debugger where it is in terms of wasm bytes. So it needs to reverse the mapping.

FM: thats not what you want though, you want it direct. Could you support it more directly? You could have an opcode whose job is to record the offset?

Tail calls

TL: discussions on emscripten repo got us thinking about this again (issue 15). The C++ toolchain is ready to use, there’s an implementation in chrome. All that’s missing is a 2nd engine implementation (and this issue is only the 15th ever). It doesn’t look like consensus on the design will be an issue. There’s good discussion about it on the issue; a lot of people chimed in with different use cases: 6 just on that one thread (lumen/erlang, asterias/haskell, CakeML, CheerpX, OCaml, C++ coroutines and general TCO). Lars said it’s basically a matter of prioritization to see that it’s worth spending time on. So I wanted to put out a public call if you have other use cases for tail calls, please let us know. And if you’re in a position to run some benchmarks to prove the performance benefits, that would be super helpful as well (details on how to turn it on are in the issue too).

AR: I’d point out that it’s not just about benchmarks. There are things you just can’t do without tail calls. YOu can’t implement e.g. trampolines in JS.

TL: yes definitely cases where there’s no good workaround are good to hear about too.

BT: Wizard does implement it. Does that count?

TL: we still have the requirement that the engines be web engines

AR: what about wasmtime, does that have support yet?

PH: wasmtime doesn’t have it yet, we haven’t really discussed it. We can do that.