preparation: DWARF Line sections by ggreif · Pull Request #1872 · caffeinelabs/motoko

ggreif · 2020-08-25T07:30:50Z

Split out the line information related tables into a separate patch.

This won't yet produce statement boundaries, since the codegen does not yet insert them into the AST.

Observed anomalies (fixing these only makes sense with the entire package):

~~The main compilation unit's filename should be at position 0, currently it is at pos. 1. But no drawbacks are seen~~
wasmtime bug, phew! Sometimes when stepping out of a function, one finds her/himself in assembly land. This might be due to incorrect ~~pro~~epilog markers. (Adding a very early prolog-ending marker improved the situation, but it still occurs in other functions. This gives me the hope that placing the marker after the filling of certain locals might eliminate the issue. -- Turns out, the emission of functions' end instruction was messy.)
~~Similarly, when stopping on a breakpoint the argument appears as <unavailable>, but after stepping it appears. This is probably a prolog marker problem.~~ I haven't seen this after placing prolog-ending marker after the locals.
~~0x000000000000047e below appears twice, this could be the reason~~ (fixed in 8b84b44)

0x000000000000047e      0      1      1   0             0 
0x000000000000047e      0      1      1   0             0  is_stmt epilogue_begin
0x000000000000047f      0      1      1   0             0  end_sequence

Progress:

as of 18d167e, the prolog/epilog marking should be reliable
but still not tracking the arg->local assignments and multi-value return glue
line machine state is a record now
eliminated quadratic blowup
QC tests are not slow anymore (was the quadratic runtime issue)

Needs to be done:

check all FIXMEs
build bigger programs

dfinity-ci · 2020-08-25T07:33:45Z

In terms of gas, no changes are observed in 2 tests.
In terms of size, no changes are observed in 2 tests.

osa1

I didn't review the section code line-by-line (that'd require a few weeks of studying DWARF), but I added some inline comments based on the question "what would I ask if I had to work on this code?".

Other questions:

I think the sections generated in this patch (debug_line and debug_line_str) are as explained in DWARF section 6.2, right? Would be good to mention this somewhere so that a reader will know where to look for the specification of the format.
In the PR description:

Sometimes when stepping out of a function, one finds her/himself in assembly land

I thought this was fixed? Is this still a problem with this PR?
I wonder if there's an existing tool that can generate this information from the source maps?

src/wasm-exts/customModuleEncode.ml

osa1 · 2020-08-26T09:39:30Z

src/wasm-exts/dwarf5.ml

+let line_range = 7
+let opcode_base = dw_LNS_set_isa
+
+type state = int * (int * int * int) * int * (bool * bool * bool * bool)


Would it make sense to use a record type here, so that it's clear what is what? This has 5 ints and 4 bools with no names and no documentation.

Yeah, I agree.

I tried to add an extended commentary, but seeing the above sentiment I may give it a try.

It might actually be clearer, since you can use record punning and field omission - maybe
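
To make the record suggestion concrete, here is a minimal sketch of the tuple rewritten as a record. Only ip and loc are field names that show up later in this thread; disc, bb, prologue_end and epilogue_begin are guesses at what the unnamed positions mean, and of_tuple, line_if_stmt and advance are hypothetical helpers, not code from the patch.

```ocaml
(* The unnamed tuple from the diff above. *)
type state_tuple = int * (int * int * int) * int * (bool * bool * bool * bool)

(* Same data as a record. Field names other than ip and loc are guesses. *)
type state =
  { ip : int                 (* address within the sequence *)
  ; loc : int * int * int    (* file, line, column *)
  ; disc : int               (* hypothetical: discriminator *)
  ; stmt : bool              (* is_stmt flag *)
  ; bb : bool                (* hypothetical: basic_block flag *)
  ; prologue_end : bool      (* hypothetical *)
  ; epilogue_begin : bool    (* hypothetical *)
  }

(* Punning: a variable named like its field needs no "field = value". *)
let of_tuple ((ip, loc, disc, (stmt, bb, prologue_end, epilogue_begin)) : state_tuple) : state =
  { ip; loc; disc; stmt; bb; prologue_end; epilogue_begin }

(* Field omission: match only the fields of interest, ignore the rest with _. *)
let line_if_stmt { stmt; loc = (_, line, _); _ } =
  if stmt then Some line else None

(* Functional update changes one field and keeps the others. *)
let advance st delta = { st with ip = st.ip + delta }
```

With names in place, a call site such as advance st 2 no longer has to rebuild a nine-component tuple to change one value.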

crusso · 2020-08-27T13:32:19Z

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc.? I'm actually wondering if it would be better to put the DWARF information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code optimization so easily.

ggreif · 2020-08-27T21:22:49Z

src/wasm-exts/customModuleEncode.ml

+              rel addr, (file', line, column + 1), 0, (stmt, false, false, false) in
+
+            let joining (prg, state) state' : int list * Dwarf5.Machine.state =
+              (* FIXME: quadratic *)


It should be possible to use difference-lists here which would give constant-time concat, thus linear complexity for the fold.

For now we can live with this, I hope.

If not too hard (can you just use an accumulator and reverse at the end?) it might be worth fixing this now - Looks like joining is done in a fold below - this could easily bite us later and might be hard to track down.

Reflecting on this, I think a right-fold with prepending and seed [dw_LNS_advance_pc; 1; - dw_LNE_end_sequence] would do exactly the desired thing. Alas, there is no Seq.fold_right. I'll figure out something.

See 0e8b05a. I am happier this way.
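
For reference, the difference-list idea mentioned at the start of this thread can be sketched as follows. This is not what 0e8b05a does; dlist, empty, of_list, append, to_list and join are all hypothetical names, and the chunks stand in for the opcode lists produced per state.

```ocaml
(* A difference list is a function that prepends its contents to a tail. *)
type 'a dlist = 'a list -> 'a list

let empty : 'a dlist = fun tail -> tail
let of_list (xs : 'a list) : 'a dlist = fun tail -> xs @ tail

(* Appending is function composition: constant time, no copying yet. *)
let append (a : 'a dlist) (b : 'a dlist) : 'a dlist = fun tail -> a (b tail)

(* Supplying the final tail (the seed) materialises the list; each chunk
   is copied once at this point. *)
let to_list (d : 'a dlist) (seed : 'a list) : 'a list = d seed

(* Left fold over the chunks, then close the program with the seed. *)
let join (chunks : int list list) (seed : int list) : int list =
  to_list
    (List.fold_left (fun acc chunk -> append acc (of_list chunk)) empty chunks)
    seed
```

Each chunk is copied exactly once when to_list runs, so the total work is proportional to the length of the result. The closures nest one level per chunk, which may matter for very long sequences.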

ggreif · 2020-08-31T14:58:03Z

@nomeata Your input is always welcome, but optional.

ggreif · 2020-08-31T15:15:24Z

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

@crusso As seen in #1546, eliminating the old-style names section has negative effects on certain tools that also run from the CI. So I won't do that.

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc.? I'm actually wondering if it would be better to put the DWARF information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code optimization so easily.

Your doubts about Meta instructions are not without merit, but the dead-code elimination problem is pretty well understood and mitigated (see is_dwarf_like) by now. Since we now control the Wasm AST, we can surely come back to your suggestion in the future, but it would be an overkill right now, IMHO.

crusso · 2020-09-04T10:07:20Z

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

@crusso As seen in #1546, eliminating the old-style names section has negative effects on certain tools that also run from the CI. So I won't do that.

I didn't mean the names section, which is part of the wasm spec, but the sourcemap itself. I actually don't want to disable the latter since it may be useful for other tools that don't understand DWARF (e.g. my old debugger but also V8/Firefox)

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc.? I'm actually wondering if it would be better to put the DWARF information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code optimization so easily.

Your doubts about Meta instructions are not without merit, but the dead-code elimination problem is pretty well understood and mitigated (see is_dwarf_like) by now. Since we now control the Wasm AST, we can surely come back to your suggestion in the future, but it would be an overkill right now, IMHO.

Ok, I'll defer to @nomeata's judgment on how this approach impacts the backend. I'm just talking from my experience with SML.NET where we also encoded the debug info as special instructions which just got in the way of the rest of the codegen. But if you've got something that works, we can revisit later, sure.

nomeata · 2020-09-04T12:38:25Z

Revisiting later sounds reasonable

ggreif · 2020-09-08T09:57:29Z

I didn't mean the names section, which is part of the wasm spec, but the sourcemap itself. I actually don't want to disable the latter since it may be useful for other tools that don't understand DWARF (e.g. my old debugger but also V8/Firefox)

@crusso My fault, I confused stuff. The sourcemap functionality (add_to_map and friends) is still there and we can eliminate it when we are confident that debuggers can cope with DWARF well. I am still in the dark about where (and how) exactly the sourcemap ends up in the .wasm file, but that is secondary.

ggreif · 2020-09-08T12:11:39Z

I wonder if there's an existing tool that can generate this information from the source maps?

@osa1 I am not aware of any. Please note also that, unlike sourcemaps, the DWARF line table contains pro/epilogue information as well as statement and function boundaries. It also tracks redundancies due to inlining (which we don't have at present) and basic blocks (which I haven't tackled yet). So converting from sourcemaps to DWARF would be impoverished at best.

nomeata · 2020-09-08T12:27:02Z

I am still in the dark about where (and how) exactly the sourcemap ends up in the .wasm file, but that is secondary.

Nowhere, it gets written out as a separate file

crusso · 2020-09-09T13:12:58Z

src/wasm-exts/customModuleEncode.ml

+              rel addr, (file', line, column + 1), 0, (stmt, false, false, false) in
+
+            let joining (prg, state) state' : int list * Dwarf5.Machine.state =
+              (* FIXME: quadratic *)


If not too hard (can you just use an accumulator and reverse at the end?) it might be worth fixing this now - Looks like joining is done in a fold below - this could easily bite us later and might be hard to track down.

crusso · 2020-09-09T13:13:46Z

src/wasm-exts/customModuleEncode.ml

+              (write_opcodes u8 uleb128 sleb128 write32
+                 Dwarf5.(prg
+                         @ [dw_LNS_advance_pc; 1]
+                         @ (if stmt then [dw_LNS_negate_stmt] else []) (* FIXME: actually irrelevant *)


When the end_sequence flag is present, all other flags are ignored. After all, it marks the IP after the last instruction of the sequence.

Done by f007024.

crusso · 2020-09-09T13:14:23Z

src/wasm-exts/customModuleEncode.ml

      data_section m.data;
      (* other optional sections *)
      name_section em.name;
+      if !Mo_config.Flags.debug_info then


crusso · 2020-09-09T13:16:42Z

src/wasm-exts/dwarf5.ml

+type instr_mode = Regular | Prologue | Epilogue
+
+type state = { ip : int
+             ; loc : int * int * int


Guess you could use a location record too ({file; line; col}), but perhaps overkill. You decide.

Yah. I thought about it, but it was late and I probably forgot. I'll see if that looks better.

Piece of cake: 46861f8.
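
A minimal sketch of what a location record could look like, under assumptions: the field names file, line and col come from the suggestion above, instr_mode and ip from the quoted diff, while the mode field, the start value and move_to are hypothetical and the real state has more fields than shown here.

```ocaml
type loc = { file : int; line : int; col : int }

type instr_mode = Regular | Prologue | Epilogue  (* as in the diff above *)

(* Only ip and loc appear in the quoted diff; mode is an assumed field. *)
type state = { ip : int; loc : loc; mode : instr_mode }

(* Hypothetical starting point for illustration. *)
let start = { ip = 0; loc = { file = 1; line = 1; col = 0 }; mode = Regular }

(* Nested functional update: only line and col change, file is kept. *)
let move_to st ~line ~col = { st with loc = { st.loc with line; col } }
```

Compared with the int * int * int triple, a caller can no longer swap line and column by accident, since both are passed by name.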

crusso · 2020-09-09T13:53:53Z

src/wasm-exts/dwarf5.ml

+  | op :: tail when dw_LNS_negate_stmt = op -> if noisy then Printf.printf "~STMT\n"; standard op; chase tail
+  | op :: tail when dw_LNS_set_prologue_end = op -> if noisy then Printf.printf "<PRO\n"; standard op; chase tail
+  | op :: tail when dw_LNS_set_epilogue_begin = op -> if noisy then Printf.printf ">EPI\n"; standard op; chase tail
+  | op :: tail when - dw_LNE_end_sequence = op -> if noisy then Printf.printf "FIN\n"; extended1 op; chase tail


What does the negation in - dw_LNE_end_sequence = op do? Is it negation or some weird OCaml pattern-match extension?

It is negation. It is an extra bit of information signifying extended opcode (these need to be written as several bytes). I'll add a comment explaining the scheme.

Actually it prevents an ambiguity between standard opcodes (DW_LNS_*) and extended ones (DW_LNE_*), which otherwise would share the same overlapping ranges. I'll try to hide the minus in a somewhat more subtle way, to reduce the WTF! effect.

Done in b9e60cf.
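
The ambiguity can be illustrated with a small sketch. The three opcode numbers are the ones given in the DWARF 5 specification; extended, is_extended and encode are hypothetical helpers written for this illustration and are not the code from b9e60cf. The sketch assumes the only extended opcodes in play take no operands and that the advance_pc operand is below 128.

```ocaml
(* Opcode numbers as given in the DWARF 5 specification. *)
let dw_LNS_copy = 0x01
let dw_LNS_advance_pc = 0x02
let dw_LNE_end_sequence = 0x01  (* same number as dw_LNS_copy *)

(* Hypothetical helpers: an extended opcode is carried as a negative int. *)
let extended op = - op
let is_extended op = op < 0

(* Turn a program into bytes. Only operand-free extended opcodes and
   advance_pc with a small operand are handled in this sketch. *)
let rec encode = function
  | [] -> []
  | op :: tail when is_extended op ->
    (* 0x00 introduces an extended opcode, then its length (1), then the opcode *)
    0x00 :: 0x01 :: (- op) :: encode tail
  | op :: arg :: tail when op = dw_LNS_advance_pc ->
    (* assumes 0 <= arg < 128, so its ULEB128 form is the single byte arg *)
    op :: arg :: encode tail
  | op :: tail -> op :: encode tail
```

Without the sign, the encoder could not tell DW_LNE_end_sequence from DW_LNS_copy, since both are the integer 1 in a plain int list.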

crusso

It's kinda hard to review with almost zero knowledge of DWARF. I'd fix the quadratic code if you can foresee it'll be an issue.

ggreif · 2020-09-10T10:39:14Z

I'd fix the quadratic code if you can foresee it'll be an issue.

Done in 0e8b05a. It was an issue: the QuickCheck tests now run in 4 min (vs. 15 min before this change).

ggreif · 2020-09-10T11:31:25Z

It's kinda hard to review with almost zero knowledge of DWARF.

@osa1 @crusso Thanks anyway for the eyeballs, I think the code got better by an order of magnitude! Fortunately this is not a mission-critical stretch, so I am not worried too much that you haven't digested each and every line :-)

crusso · 2020-09-10T11:41:07Z

src/wasm-exts/customModuleEncode.ml

-              (write_opcodes u8 uleb128 sleb128 write32
-                 Dwarf5.(prg @ [dw_LNS_advance_pc; 1; - dw_LNE_end_sequence]))
+              let prg0, _ = Seq.fold_left joining ([], start_state) states_seq in
+              let prg = List.fold_left (Fun.flip (@)) Dwarf5.[dw_LNS_advance_pc; 1; - dw_LNE_end_sequence] prg0 in


Isn't this still pretty inefficient? I expect there are more gains here.

I can't see how: try

fold_left (flip (@)) [x; y; z] [[h; i]; [d; e; f; g]; [a; b; c]] --> [a; b; c; d; e; f; g; h; i; x; y; z]

It would be interesting to fuse the two lines, but the asymptotics are right (i.e. O(n) with n = length result).
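
The claim can be checked with a small two-pass sketch. collect and finish are hypothetical names; the first pass mirrors the Seq.fold_left in the diff by consing each chunk onto an accumulator (so chunks end up newest first), and the second pass is the List.fold_left (Fun.flip (@)) from the diff. Fun.flip requires OCaml 4.08 or later.

```ocaml
(* First pass: cons each chunk onto the accumulator, newest first. *)
let collect (chunks : 'a list Seq.t) : 'a list list =
  Seq.fold_left (fun acc chunk -> chunk :: acc) [] chunks

(* Second pass: Fun.flip (@) acc chunk = chunk @ acc, so each chunk is
   prepended once and the already-built tail is never copied again. *)
let finish (seed : 'a list) (rev_chunks : 'a list list) : 'a list =
  List.fold_left (Fun.flip (@)) seed rev_chunks

(* The example from the comment above, with characters as elements. *)
let result =
  finish ['x'; 'y'; 'z'] [['h'; 'i']; ['d'; 'e'; 'f'; 'g']; ['a'; 'b'; 'c']]
```

Since chunk @ acc only traverses chunk, the total cost is the sum of the chunk lengths, which matches the O(n) bound stated above.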

ggreif added 3 commits August 24, 2020 18:10

split out line-machine related parts

0870f66

of the monolithic DWARF patch

only bother writing line section when essential

84b2cde

add debug_line_str_section too

1ac2560

this is responsible for fulfilling the offset promises

ggreif added 2 commits August 25, 2020 15:21

cleanups and comments

fb1e9e4

improve file table such that file_names[0] is not prelude

5312a5d

By mentioning all three implicit filenames, the unit's name will appear at index 0

ggreif marked this pull request as ready for review August 25, 2020 14:40

ggreif changed the title ~~Gabor/line table~~ DWARF Line sections Aug 25, 2020

ggreif requested review from crusso, nomeata and osa1 August 26, 2020 09:05

osa1 reviewed Aug 26, 2020

ggreif mentioned this pull request Aug 27, 2020

Step-through debugging #1244

Closed

7 tasks

ggreif changed the title ~~DWARF Line sections~~ preparation: DWARF Line sections Aug 27, 2020

ggreif commented Aug 27, 2020

src/wasm-exts/dwarf5.ml

remove function interpret

4dd190e

It is a leftover from early times.

ggreif commented Aug 27, 2020

src/wasm-exts/dwarf5.ml

ggreif added 2 commits August 27, 2020 12:48

explain infer

4b05718

comment moves

f1df004

This should be renamed

ggreif commented Aug 27, 2020

src/wasm-exts/dwarf5.ml

ggreif commented Aug 27, 2020

src/wasm-exts/dwarf5.ml

ggreif added 2 commits August 27, 2020 13:36

comments for module Location

5a352c3

Explain line machine state

3d28fe6

crusso reviewed Aug 27, 2020

src/wasm-exts/customModuleEncode.ml

crusso reviewed Aug 27, 2020

src/wasm-exts/customModuleEncode.ml

ggreif commented Aug 27, 2020

src/wasm-exts/dwarf5.ml

ggreif commented Aug 27, 2020

src/wasm-exts/customModuleEncode.ml

Update src/wasm-exts/customModuleEncode.ml

c3d6c94

ggreif added 2 commits August 31, 2020 17:02

remove printfs

5ca84e0

rename function to write_opcodes

92c2426

review feedback

add a few pointers into the DWARF5 document

cb67f3f

ggreif added 2 commits September 8, 2020 18:36

implement infer' that uses record syntax

febcfdb

looks better actually

switch over to record-based line machine state

9253305

ggreif requested review from crusso and osa1 September 8, 2020 17:26

crusso reviewed Sep 9, 2020

crusso approved these changes Sep 9, 2020

ggreif force-pushed the gabor/line-table branch from 9253305 to 31e1bcd Compare September 10, 2020 09:03

remove pointless stmt manip

f007024

review comment

ggreif force-pushed the gabor/line-table branch from 31e1bcd to f007024 Compare September 10, 2020 09:22

ggreif and others added 3 commits September 10, 2020 12:29

remove a fixme by being careful not to intro quadratic runtime

0e8b05a

add comment to explain remaining tuple

cc12d57

Update src/wasm-exts/customModuleEncode.ml

76440c4

Co-authored-by: Claudio Russo <claudio@dfinity.org>

ggreif added 2 commits September 10, 2020 12:42

deal with the overlapping LNS/LNE ranges in a little more subtle way

b9e60cf

hiding it in one module

make DWARF locs cute too

46861f8

Merge branch 'master' into gabor/line-table

51b86e2

crusso reviewed Sep 10, 2020

ggreif merged commit 436cd44 into master Sep 10, 2020

ggreif deleted the gabor/line-table branch September 10, 2020 11:41
