Remove op_index and end_sequence from Crystal::DWARF::LineNumbers::Row #15538

Merged
straight-shoota merged 1 commit into crystal-lang:master from HertzDevil:perf/dwarf-linenumbers-row on Mar 4, 2025

@HertzDevil (Contributor)

Although these two fields are part of the DWARF line number state machine, they are not needed to look up the line number for a given PC address. Removing them reduces `sizeof(Crystal::DWARF::LineNumbers::Row)` from 40 to 24 bytes on 64-bit targets, and from 28 to 20 bytes on 32-bit targets. As it turns out, `Crystal::DWARF::LineNumbers#@matrix` is one of the largest contributors to heap consumption in the Crystal runtime. We can check this by calculating the difference in `GC.stats.total_bytes` immediately before and after `Exception::CallStack.load_debug_info_impl` is called:

```crystal
struct Exception::CallStack
  def self.load_debug_info : Nil
    return if ENV["CRYSTAL_LOAD_DEBUG_INFO"]? == "0"

    unless @@dwarf_loaded
      @@dwarf_loaded = true
      begin
        # Bytes allocated so far, sampled just before the debug info is loaded
        old = GC.stats.total_bytes
        load_debug_info_impl
        # Print the number of bytes allocated while loading the debug info
        STDERR.puts(GC.stats.total_bytes - old)
      rescue ex
        @@dwarf_line_numbers = nil
        @@dwarf_function_names = nil
        Crystal::System.print_exception "Unable to load dwarf information", ex
      end
    end
  end
end
```

With `CRYSTAL_LOAD_DEBUG_INFO=1`, the reported values in bytes are:

* `x86_64-windows-gnu`, empty file: 7,677,008 -> 5,793,856 (-24.5%)
* `x86_64-linux-gnu`, empty file: 5,021,184 -> 3,607,840 (-28.1%)
* `aarch64-apple-darwin`, empty file: 4,844,144 -> 3,453,024 (-28.7%)
* `x86_64-windows-gnu`, compiler: 190,008,704 -> 127,562,288 (-32.9%)
* `x86_64-linux-gnu`, compiler: 171,804,448 -> 110,186,544 (-35.9%)
* `aarch64-apple-darwin`, compiler: 172,521,040 -> 110,276,752 (-36.1%)
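
The measurement pattern and the percentages above can be sketched as a small standalone program. This is only an illustration: `allocated_bytes` and `percent_change` are hypothetical helper names that are not part of this PR or the standard library, and the array allocation is a stand-in workload. Only `GC.stats.total_bytes` is taken from the description above.

```crystal
# Hypothetical helper (not part of the PR): returns how many bytes the GC
# reports as allocated while the block runs.
def allocated_bytes(&)
  before = GC.stats.total_bytes
  yield
  GC.stats.total_bytes - before
end

# Hypothetical helper: relative change from `old` to `new`, in percent.
def percent_change(old : Number, new : Number) : Float64
  (new.to_f - old.to_f) / old.to_f * 100
end

# Stand-in workload: allocate a large array and report the allocation delta.
bytes = allocated_bytes { Array(Int64).new(1_000_000, 0_i64) }
puts bytes

# Figures reported above for x86_64-linux-gnu, empty file.
puts percent_change(5_021_184, 3_607_840).round(1)
```

Because `total_bytes` counts allocations rather than live memory, the delta does not depend on whether a collection happens to run during the measured block.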

We could also profile `GC.stats.heap_size` in the same way, although these values are less reliable because the GC might simply decide to skip a collection cycle on some runs:

* `x86_64-windows-gnu`, empty file: 3,858,432 -> 2,863,104 (-25.8%)
* `x86_64-linux-gnu`, empty file: 3,235,840 -> 2,285,568 (-29.4%)
* `aarch64-apple-darwin`, empty file: 3,325,952 -> 2,392,064 (-28.1%)
* `x86_64-windows-gnu`, compiler: 76,259,328 -> 58,658,816 (-23.1%)
* `x86_64-linux-gnu`, compiler: 68,206,592 -> 44,556,288 (-34.7%)
* `aarch64-apple-darwin`, compiler: 59,293,696 -> 38,764,544 (-34.6%)
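
The struct size reduction quoted at the top of this description can be explored in isolation with a sketch like the one below. The field lists are an assumption made for illustration (`RowBefore` and `RowAfter` are hypothetical names, and the real `Row` definition in the standard library may differ in detail). The idea is that a 4-byte `op_index` placed before a pointer-sized field and a trailing 1-byte `end_sequence` both force alignment padding, so dropping them saves more than 5 bytes per row.

```crystal
# Assumed layout before the change (illustrative, not copied from the stdlib).
record RowBefore,
  address : UInt64,
  op_index : UInt32,   # followed by padding before the pointer-sized `path`
  path : String,       # a reference, so pointer-sized
  line : Int32,
  column : Int32,
  end_sequence : Bool  # followed by trailing padding

# Assumed layout after the change: the two unused fields are gone.
record RowAfter,
  address : UInt64,
  path : String,
  line : Int32,
  column : Int32

puts sizeof(RowBefore)
puts sizeof(RowAfter)
```

If the assumed layouts match the real ones, the two printed sizes on a 64-bit target should agree with the 40 and 24 bytes stated above.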

@straight-shoota added this to the 1.16.0 milestone Mar 3, 2025
@straight-shoota merged commit a1b2d3b into crystal-lang:master Mar 4, 2025
35 checks passed
@HertzDevil deleted the perf/dwarf-linenumbers-row branch March 6, 2025 23:17
Blacksmoke16 pushed a commit to Blacksmoke16/crystal that referenced this pull request Mar 18, 2025
…s::Row` (crystal-lang#15538)
