Prism::CodeUnitsCache #3173

kddnewton · 2024-10-09T18:44:18Z

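Calculating code unit offsets for a source can be very expensive, especially when the source is large. This commit introduces a new class that wraps the source and desired encoding into a cache that reuses pre-computed offsets. In the benchmarks below, 1,000 random lookups through the cache run roughly 3x to 19x faster than the same lookups without it, and the gap is generally wider on larger sources.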

There are still some problems with this approach, namely character boundaries and the fact that the cache is unbounded, but both of these may be addressed in subsequent commits.
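To illustrate the idea of reusing pre-computed offsets, here is a minimal sketch of such a cache. It is not the implementation in this PR: the class name `SketchCodeUnitsCache` and its internals are hypothetical, and it assumes a UTF-8 source and byte offsets that fall on character boundaries (one of the open problems mentioned above). Like the cache described here, it is unbounded.

```ruby
# Hypothetical sketch, not Prism's implementation. Assumes the source is
# UTF-8 and every requested byte offset sits on a character boundary.
class SketchCodeUnitsCache
  def initialize(source, encoding)
    @source = source
    @encoding = encoding
    # Bytes per code unit in the target encoding.
    @unit_size =
      case encoding
      when Encoding::UTF_16LE, Encoding::UTF_16BE then 2
      when Encoding::UTF_32LE, Encoding::UTF_32BE then 4
      else 1
      end
    @offsets = [0]      # sorted byte offsets computed so far
    @units = { 0 => 0 } # byte offset => code unit offset
  end

  # Returns the code unit offset for the given byte offset.
  def [](byte_offset)
    return @units[byte_offset] if @units.key?(byte_offset)

    # Find the first cached offset past the request; the entry just before
    # it is the nearest already-computed starting point.
    index = @offsets.bsearch_index { |cached| cached > byte_offset } || @offsets.length
    nearest = @offsets[index - 1]

    # Only the bytes between the nearest cached offset and the request
    # need to be transcoded and counted.
    slice = @source.byteslice(nearest, byte_offset - nearest)
    units = @units[nearest] + slice.encode(@encoding).bytesize / @unit_size

    @offsets.insert(index, byte_offset)
    @units[byte_offset] = units
  end
end

cache = SketchCodeUnitsCache.new("a😀b", Encoding::UTF_16LE)
cache[6] # => 4 ("a" is 1 unit, "😀" is 2, "b" is 1)
cache[1] # => 1
cache[5] # => 3 (counted from the cached entry at byte 1, not from 0)
```

Each lookup pays only for the distance to the nearest smaller cached offset, so repeated lookups over the same source get cheaper as the cache fills.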

Some benchmarks, using the following script:

# frozen_string_literal: true

require "bundler/setup"
require "prism"
require "benchmark"

code = "😀😀😀😀😀😀😀😀" * Integer(ARGV.first)
result = Prism.parse(code)

source = result.source
bytesize = code.bytesize

Benchmark.bm do |x|
  x.report("old") do
    1000.times { source.code_units_offset(rand(bytesize), Encoding::UTF_16LE) }
  end

  x.report("new") do
    cache = source.code_units_cache(Encoding::UTF_16LE)
    1000.times { cache[rand(bytesize)] }
  end
end

resulted in:

$ be ruby test.rb 10 
       user     system      total        real
old  0.002221   0.000374   0.002595 (  0.002597)
new  0.000789   0.000016   0.000805 (  0.000807)
$ be ruby test.rb 100
       user     system      total        real
old  0.008739   0.000677   0.009416 (  0.009421)
new  0.003202   0.000081   0.003283 (  0.003286)
$ be ruby test.rb 1000
       user     system      total        real
old  0.078277   0.003391   0.081668 (  0.081750)
new  0.016299   0.000608   0.016907 (  0.016929)
$ be ruby test.rb 10000
       user     system      total        real
old  0.749045   0.036684   0.785729 (  0.786660)
new  0.037629   0.003045   0.040674 (  0.040730)
$ be ruby test.rb 100000
       user     system      total        real
old  7.299773   0.319311   7.619084 (  7.624173)
new  0.521168   0.019792   0.540960 (  0.541081)
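To put these numbers in perspective, the speedup at each input size is the ratio of the "old" and "new" wall-clock times. A small sketch that computes it from the "real" column above (the constant and variable names are only illustrative):

```ruby
# Wall-clock ("real") times copied from the benchmark output above,
# keyed by the repetition count passed to the script: [old, new].
REAL_TIMES = {
  10      => [0.002597, 0.000807],
  100     => [0.009421, 0.003286],
  1_000   => [0.081750, 0.016929],
  10_000  => [0.786660, 0.040730],
  100_000 => [7.624173, 0.541081]
}.freeze

# Speedup is old time divided by new time, rounded to one decimal place.
speedups = REAL_TIMES.transform_values { |(old, new)| (old / new).round(1) }

speedups.each { |size, factor| puts "#{size}: #{factor}x" }
```

These are single runs with random offsets, so the ratios are best read as rough magnitudes rather than precise measurements.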

kddnewton force-pushed the code-units-cache branch 8 times, most recently from c59b804 to f2268a4 on October 9, 2024 19:40

vinistock mentioned this pull request Oct 9, 2024

Build locations before creating entries Shopify/ruby-lsp#2698

Merged

kddnewton force-pushed the code-units-cache branch from f2268a4 to 0056890 on October 10, 2024 14:05

kddnewton changed the title from Prism::Source::CodeUnitsCache to Prism::CodeUnitsCache on Oct 10, 2024

kddnewton force-pushed the code-units-cache branch from 0056890 to 2e3e1a4 on October 10, 2024 15:01

vinistock mentioned this pull request Oct 10, 2024

Use code units cache API Shopify/ruby-lsp#2704

Merged

kddnewton merged commit ba89182 into main Oct 10, 2024
54 checks passed

kddnewton deleted the code-units-cache branch October 10, 2024 18:02
