Skip to content

preparation: DWARF Line sections#1872

Merged
ggreif merged 30 commits intomasterfrom
gabor/line-table
Sep 10, 2020
Merged

preparation: DWARF Line sections#1872
ggreif merged 30 commits intomasterfrom
gabor/line-table

Conversation

@ggreif
Copy link
Contributor

@ggreif ggreif commented Aug 25, 2020

Split out the line information related tables into a separate patch.

This won't yet produce statement boundaries, since the codegen is not inserting them yet into the AST.

Observed anomalies (fixing these only makes sense with the entire package) :

  • The main compilation unit's filename should be at position 0, currently it is at pos. 1. But no drawbacks are seen
  • wasmtime bug, phew! Sometimes when stepping out of a function, one finds her/himself in assembly land. This might be due to incorrect proepilog markers. (Adding a very early prolog-ending marker improved the situation, but it still occurs in other functions. This gives me the hope that placing the marker after the filling of certain locals might eliminate the issue. -- Turns out, the emission of functions' end instruction was messy.)
  • Similarly, when stopping on a breakpoint the argument appears as <unavailable>, but after stepping it appears. This is probably a prolog marker problem. I haven't seen this after placing prolog-ending marker after the locals.
  • 0x000000000000047e below appears twice, this could be the reason (fixed in 8b84b44)
0x000000000000047e      0      1      1   0             0 
0x000000000000047e      0      1      1   0             0  is_stmt epilogue_begin
0x000000000000047f      0      1      1   0             0  end_sequence

Progress:

  • as of 18d167e, the prolog/epilog marking should be reliable
  • but still not tracking the arg->local assignments and multi-value return glue
  • line machine state is a record now
  • eliminated quadratic blowup
  • QC tests are not slow anymore (was the quadratic runtime issue)

Needs to be done:

  • check all FIXMEs
  • build bigger programs

ggreif added 3 commits August 24, 2020 18:10
of the monolithic DWARF patch
this is responsible  for fulfilling the offset promises
@dfinity-ci
Copy link

In terms of gas, no changes are observed in 2 tests.
In terms of size, no changes are observed in 2 tests.

ggreif added 2 commits August 25, 2020 15:21
By mentioning all three implicit filenames, the unit's name will
appear at inndex 0
@ggreif ggreif marked this pull request as ready for review August 25, 2020 14:40
@ggreif ggreif changed the title Gabor/line table DWARF Line sections Aug 25, 2020
@ggreif ggreif requested review from crusso, nomeata and osa1 August 26, 2020 09:05
Copy link
Contributor

@osa1 osa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the section code line-by-line (that'd require a few weeks of studying DWARF) but added some inline comments based on the question of "what would be the questions I would ask if I had to work on this code".

Other questions:

  • I think the sections generated in this patch (debug_line and debug_line_str) are as explained in DWARF section 6.2, right? Would be good to mention this somewhere so that a reader will know where to look for the specification of the format.

  • In the PR description:

    Sometimes when stepping out of a function, one finds her/himself in assembly land

    Is thought this is fixed? Is this still a problem with this PR?

  • I wonder if there's an existing tool that can generate this information from the source maps?

let line_range = 7
let opcode_base = dw_LNS_set_isa

type state = int * (int * int * int) * int * (bool * bool * bool * bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use a record type here so that it'll be clear what is what. This has 5 ints and 4 bools with no names and no documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to add an extended commentary, but seeing above sentiment I may give it a try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might actually be clearer since you can use record punning, field ommission and punning - maybe

@ggreif ggreif mentioned this pull request Aug 27, 2020
7 tasks
@ggreif ggreif changed the title DWARF Line sections preparation: DWARF Line sections Aug 27, 2020
It is a leftover from early times.
ggreif added 2 commits August 27, 2020 12:48
This should be renamed
@crusso
Copy link
Contributor

crusso commented Aug 27, 2020

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc. I'm actually wondering if it would be better to put the dwarf information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code opimization so easily.

rel addr, (file', line, column + 1), 0, (stmt, false, false, false) in

let joining (prg, state) state' : int list * Dwarf5.Machine.state =
(* FIXME: quadratic *)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to use difference-lists here which would give constant-time concat, thus linear complexity for the fold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we can live with this, I hope.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not too hard (can you just use an accumulator and reverse at the end?) it might be worth fixing this now - Looks like joining is done in a fold below - this could easily bite us later and might be hard to track down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflecting about this, I think a right-fold with prepending and seed [dw_LNS_advance_pc; 1; - dw_LNE_end_sequence] would exactly do the desired thing. Alas, there is no Seq. fold_right. I'll figure out something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 0e8b05a. I am happier this way.

@ggreif
Copy link
Contributor Author

ggreif commented Aug 31, 2020

@nomeata Your input is always welcome, but optional.

@ggreif
Copy link
Contributor Author

ggreif commented Aug 31, 2020

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

@crusso As seen in #1546, eliminating the old-style names section has negative effects on certain tools that also run from the CI. So I won't do that.

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc. I'm actually wondering if it would be better to put the dwarf information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code opimization so easily.

Your doubts about Meta instructions are not without merit, but the dead-code elimination problem is pretty well understood and mitigated (see is_dwarf_like) by now. Since we now control the Wasm AST, we can surely come back to your suggestion in the future, but it would be an overkill right now, IMHO.

@crusso
Copy link
Contributor

crusso commented Sep 4, 2020

So it looks like this code still supports the old sourcemap emission which was derived from the motoko source locations attached to wasm instructions.

@crusso As seen in #1546, eliminating the old-style names section has negative effects on certain tools that also run from the CI. So I won't do that.

I didn't mean the names sections, which is part of the wasm spec but the sourcemap itself. I actually don't wont to disable the latter since it may be useful for other tools that don't understand dwarf (e.g my old debugger but also V8/Firefox)

Does the DWARF format use that information too, or only the information in Meta instructions, or both? What added value do the various Meta instructions provide, since I guess they can get in the way of peephole optimization etc. I'm actually wondering if it would be better to put the dwarf information not in an extra instruction, but alongside every instruction like the existing source annotations - then the DWARF instructions wouldn't interfere with code opimization so easily.

Your doubts about Meta instructions are not without merit, but the dead-code elimination problem is pretty well understood and mitigated (see is_dwarf_like) by now. Since we now control the Wasm AST, we can surely come back to your suggestion in the future, but it would be an overkill right now, IMHO.

Ok, I'll defer to @nomeata's judgment on how this approach impacts the backend. I'm just talking from my experience with SML.NET where we also encoded the debug info as special instructions which just got in the way of the rest of the codegen. But if you've got something that works, we can revisit later, sure.

@nomeata
Copy link
Contributor

nomeata commented Sep 4, 2020

Revisiting later sounds reasonable

@ggreif
Copy link
Contributor Author

ggreif commented Sep 8, 2020

I didn't mean the names sections, which is part of the wasm spec but the sourcemap itself. I actually don't wont to disable the latter since it may be useful for other tools that don't understand dwarf (e.g my old debugger but also V8/Firefox)

@crusso My fault, I confused stuff. The sourcemap functionality (add_to_map and friends) is still there and we can eliminate it when we are confident that debuggers can cope with DWARF well. I am still in the dark about where (and how) exactly the sourcemap ends up in the .wasm file, but that is secondary.

@ggreif
Copy link
Contributor Author

ggreif commented Sep 8, 2020

  • I wonder if there's an existing tool that can generate this information from the source maps?

@osa1 I am not aware of any. Please note also that sourcemaps contain pro/epilogue information as well as statement and function boundaries. It also tracks redundancies due to inlining (what we don't have at present) and basic blocks (which I haven't tackled yet). So converting from sourcemaps to DWARF would be impoverished at best.

@nomeata
Copy link
Contributor

nomeata commented Sep 8, 2020

I am still in the dark about where (and how) exactly the sourcemap ends up in the .wasm file, but that is secondary.

Nowhere, it gets written out as a separate file

@ggreif ggreif requested review from crusso and osa1 September 8, 2020 17:26
rel addr, (file', line, column + 1), 0, (stmt, false, false, false) in

let joining (prg, state) state' : int list * Dwarf5.Machine.state =
(* FIXME: quadratic *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not too hard (can you just use an accumulator and reverse at the end?) it might be worth fixing this now - Looks like joining is done in a fold below - this could easily bite us later and might be hard to track down.

(write_opcodes u8 uleb128 sleb128 write32
Dwarf5.(prg
@ [dw_LNS_advance_pc; 1]
@ (if stmt then [dw_LNS_negate_stmt] else []) (* FIXME: actually irrelevant *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the end_sequence flag is present, all other flags are ignored. After all, it marks the IP after the last instruction of the sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by f007024.

data_section m.data;
(* other optional sections *)
name_section em.name;
if !Mo_config.Flags.debug_info then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

type instr_mode = Regular | Prologue | Epilogue

type state = { ip : int
; loc : int * int * int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess you could use a location record too {file;line;col} , but perhaps overkill. You decide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah. I thought about it, but it was late and I probably forgot. I'll see if that looks better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Piece of cake: 46861f8.

| op :: tail when dw_LNS_negate_stmt = op -> if noisy then Printf.printf "~STMT\n"; standard op; chase tail
| op :: tail when dw_LNS_set_prologue_end = op -> if noisy then Printf.printf "<PRO\n"; standard op; chase tail
| op :: tail when dw_LNS_set_epilogue_begin = op -> if noisy then Printf.printf ">EPI\n"; standard op; chase tail
| op :: tail when - dw_LNE_end_sequence = op -> if noisy then Printf.printf "FIN\n"; extended1 op; chase tail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  | op :: tail when - dw_LNE_end_sequence = op -> if noisy then Printf.printf "FIN\n"; extended1 op; chase tail
                           ^ what does this negation do? Is it negation or some weird Ocaml pattern match extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is negation. It is an extra bit of information signifying extended opcode (these need to be written as several bytes). I'll add a comment explaining the scheme.

Copy link
Contributor Author

@ggreif ggreif Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it prevents an ambiguity between standard opcodes (DW_LNS_*) and extended ones (DW_LNE_*), which otherwise would share the same overlapping ranges. I'll try to hide the minus in a somewhat more subtle way, to reduce the WTF! effect.

Copy link
Contributor Author

@ggreif ggreif Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in b9e60cf.

Copy link
Contributor

@crusso crusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It kinda hard to review with almost zero knowledge of DWARF. I'd fix the quadratic code if you can foresee it'll be an issue.

@ggreif
Copy link
Contributor Author

ggreif commented Sep 10, 2020

I'd fix the quadratic code if you can foresee it'll be an issue.

Done in 0e8b05a. It was an issue: the QuickCheck tests now run in 4 min (vs. 15 min before this change).

@ggreif
Copy link
Contributor Author

ggreif commented Sep 10, 2020

It kinda hard to review with almost zero knowledge of DWARF.

@osa1 @crusso Thanks anyway for the eyeballs, I think the code got better by a magnitude! Fortunately this is not a mission-critical stretch, so I am not worried too much that you haven't digested each and every line :-)

(write_opcodes u8 uleb128 sleb128 write32
Dwarf5.(prg @ [dw_LNS_advance_pc; 1; - dw_LNE_end_sequence]))
let prg0, _ = Seq.fold_left joining ([], start_state) states_seq in
let prg = List.fold_left (Fun.flip (@)) Dwarf5.[dw_LNS_advance_pc; 1; - dw_LNE_end_sequence] prg0 in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this still pretty inefficient? I expect there's more gains here.

Copy link
Contributor Author

@ggreif ggreif Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see how: try

fold_left (flip (@)) [x; y; z] [[h; i]; [d; e; f; g]; [a; b; c]]
--> [a; b; c; d; e; f; g; h; i; x; y; z]

Copy link
Contributor Author

@ggreif ggreif Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be interesting to fuse the two lines, but the asymptotics are right (i.e. O(n) with n = length result).

@ggreif ggreif merged commit 436cd44 into master Sep 10, 2020
@ggreif ggreif deleted the gabor/line-table branch September 10, 2020 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants