Optimize base16 encoding #7274

Merged

ThatSpaceGuy merged 7 commits into mattw/just-jwes from margolis-optimize-base16-encode on Nov 2, 2022
Conversation

@zachmargolis (Contributor)

  • << turns out to be faster than [].join

See "encode_3" for some improvements over the existing implementation. Normally I would not try to pre-optimize this, but I know we were planning to run this over multiple megabytes of data at a time.
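The `<<`-vs-`[].join` point above can be sketched in miniature (variable names here are mine, just for illustration):

```ruby
# Appending to one String with << mutates a single buffer, while
# map { ... }.join first builds an intermediate Array of 2-char strings
# and then traverses it again to concatenate.
bytes = "abc".bytes

via_join = bytes.map { |b| b.to_s(16).rjust(2, '0') }.join

via_append = +''  # unary + gives an unfrozen string literal
bytes.each { |b| via_append << b.to_s(16).rjust(2, '0') }

# Both produce the same hex; the << version just allocates less.
```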

Here's what I did to benchmark this against #7204

require 'benchmark/ips'
require 'base16'
require 'securerandom'

class PrBase16
  def self.encode16(str)
    str.bytes.map { |char| char.to_s(16).upcase.rjust(2, "0") }.join
  end

  def self.decode16(str)
    output = ''
    str.chars.each_slice(2) do |chars|
      output << chars.join.to_i(16).chr
    end
    output
  end
end

class Base16V3
  def self.encode16(str)
    output = ''
    str.bytes.each { |char| output << char.to_s(16).upcase.rjust(2, "0") }
    output
  end

  def self.decode16(str)
    str.chars.each_slice(2).map do |pair|
      pair.join.to_i(16).chr
    end.join
  end
end

random_10k_bytes = SecureRandom.random_bytes(10_000)
random_10k_hex = SecureRandom.hex(10_000)

Benchmark.ips do |x|
  x.report('encode_gem') do
    Base16.encode16(random_10k_bytes)
  end

  x.report('encode_pr') do
    PrBase16.encode16(random_10k_bytes)
  end

  x.report('encode_3') do
    Base16V3.encode16(random_10k_bytes)
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report('decode_gem') do
    Base16.decode16(random_10k_hex)
  end

  x.report('decode_pr') do
    PrBase16.decode16(random_10k_hex)
  end

  x.report('decode_3') do
    Base16V3.decode16(random_10k_hex)
  end

  x.compare!
end

@zachmargolis (Contributor, Author) commented on the diff:

    # The IRS has requested data be encoded this way. Loosely emulate the Base64 class.
    def self.encode16(str)

I think it will matter more in context as well, but another option is for us to shell out to xxd. Once we start experimenting with encrypting large payloads, we might just write the raw data to a file and shell out to xxd | gzip or something; I think that will be faster than doing it in Ruby.

@zachmargolis (Contributor, Author) commented Nov 2, 2022

Updated benchmark: going from 10k bytes to 100k bytes, xxd starts to win. For prototyping and the early phases I think we should continue in Ruby, but once our background jobs and such start to solidify, I 100% think we should start shelling out.

Warming up --------------------------------------
      encode_gem_10k     3.000  i/100ms
       encode_pr_10k    22.000  i/100ms
        encode_3_10k    25.000  i/100ms
        xxd_file_10k    18.000  i/100ms
       xxd_stdin_10k    20.000  i/100ms
Calculating -------------------------------------
      encode_gem_10k     39.006  (±23.1%) i/s -    183.000  in   5.031732s
       encode_pr_10k    170.964  (±25.7%) i/s -    814.000  in   5.176560s
        encode_3_10k    200.373  (±16.0%) i/s -    975.000  in   5.081257s
        xxd_file_10k    151.883  (± 4.6%) i/s -    756.000  in   4.989180s
       xxd_stdin_10k    175.054  (± 6.9%) i/s -    880.000  in   5.051354s

Comparison:
        encode_3_10k:      200.4 i/s
       xxd_stdin_10k:      175.1 i/s - same-ish: difference falls within error
       encode_pr_10k:      171.0 i/s - same-ish: difference falls within error
        xxd_file_10k:      151.9 i/s - 1.32x  (± 0.00) slower
      encode_gem_10k:       39.0 i/s - 5.14x  (± 0.00) slower

Warming up --------------------------------------
      xxd_stdin_100k     5.000  i/100ms
     encode_gem_100k     1.000  i/100ms
      encode_pr_100k     1.000  i/100ms
       encode_3_100k     2.000  i/100ms
       xxd_file_100k     5.000  i/100ms
      xxd_stdin_100k     6.000  i/100ms
Calculating -------------------------------------
      xxd_stdin_100k     58.977  (± 3.4%) i/s -    295.000  in   5.005833s
     encode_gem_100k      0.180  (± 0.0%) i/s -      1.000  in   5.561398s
      encode_pr_100k     14.688  (± 0.0%) i/s -     74.000  in   5.043406s
       encode_3_100k     25.216  (± 0.0%) i/s -    128.000  in   5.077671s
       xxd_file_100k     55.823  (± 1.8%) i/s -    280.000  in   5.016903s
      xxd_stdin_100k     59.306  (± 1.7%) i/s -    300.000  in   5.059981s

Comparison:
      xxd_stdin_100k:       59.3 i/s
       xxd_file_100k:       55.8 i/s - 1.06x  (± 0.00) slower
       encode_3_100k:       25.2 i/s - 2.35x  (± 0.00) slower
      encode_pr_100k:       14.7 i/s - 4.04x  (± 0.00) slower
     encode_gem_100k:        0.2 i/s - 329.82x  (± 0.00) slower

Warming up --------------------------------------
          decode_gem     5.000  i/100ms
           decode_pr    15.000  i/100ms
            decode_3    13.000  i/100ms
Calculating -------------------------------------
          decode_gem     56.166  (± 5.3%) i/s -    280.000  in   5.003975s
           decode_pr    151.081  (± 2.6%) i/s -    765.000  in   5.066810s
            decode_3    122.092  (±14.7%) i/s -    585.000  in   5.030036s

Comparison:
           decode_pr:      151.1 i/s
            decode_3:      122.1 i/s - 1.24x  (± 0.00) slower
          decode_gem:       56.2 i/s - 2.69x  (± 0.00) slower
require 'benchmark/ips'
require 'base16'
require 'securerandom'
require 'open3'
require 'tempfile'

class PrBase16
  def self.encode16(str)
    str.bytes.map { |char| char.to_s(16).upcase.rjust(2, "0") }.join
  end

  def self.decode16(str)
    output = ''
    str.chars.each_slice(2) do |chars|
      output << chars.join.to_i(16).chr
    end
    output
  end
end

class Base16V3
  def self.encode16(str)
    output = ''
    str.bytes.each { |char| output << char.to_s(16).upcase.rjust(2, "0") }
    output
  end

  def self.decode16(str)
    str.chars.each_slice(2).map do |pair|
      pair.join.to_i(16).chr
    end.join
  end
end

random_10k_bytes = SecureRandom.random_bytes(10_000)
random_10k_hex = SecureRandom.hex(10_000)
random_10k_bytes_file = Tempfile.new
File.open(random_10k_bytes_file.path, 'wb') { |f| f.write(random_10k_bytes) }
random_10k_hex_file = Tempfile.new
File.open(random_10k_hex_file.path, 'w') { |f| f.write(random_10k_hex) }

random_100k_bytes = SecureRandom.random_bytes(100_000)
random_100k_hex = SecureRandom.hex(100_000)
random_100k_bytes_file = Tempfile.new
File.open(random_100k_bytes_file.path, 'wb') { |f| f.write(random_100k_bytes) }
random_100k_hex_file = Tempfile.new
File.open(random_100k_hex_file.path, 'w') { |f| f.write(random_100k_hex) }

outfile = Tempfile.new

Benchmark.ips do |x|
  x.report('encode_gem_10k') do
    Base16.encode16(random_10k_bytes)
  end

  x.report('encode_pr_10k') do
    PrBase16.encode16(random_10k_bytes)
  end

  x.report('encode_3_10k') do
    Base16V3.encode16(random_10k_bytes)
  end

  x.report('xxd_file_10k') do
    system('xxd', '-u', '-plain', random_10k_bytes_file.path, outfile.path)
  end

  x.report('xxd_stdin_10k') do
    Open3.popen3('xxd', '-u', '-plain') do |stdin, stdout|
      stdin.write(random_10k_bytes)
      stdin.close
      stdout.read
    end
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report('xxd_stdin_100k') do
    Open3.popen3('xxd', '-u', '-plain') do |stdin, stdout|
      stdin.write(random_100k_bytes)
      stdin.close
      stdout.read
    end
  end

  x.report('encode_gem_100k') do
    Base16.encode16(random_100k_bytes)
  end

  x.report('encode_pr_100k') do
    PrBase16.encode16(random_100k_bytes)
  end

  x.report('encode_3_100k') do
    Base16V3.encode16(random_100k_bytes)
  end

  x.report('xxd_file_100k') do
    system('xxd', '-u', '-plain', random_100k_bytes_file.path, outfile.path)
  end

  x.report('xxd_stdin_100k') do
    Open3.popen3('xxd', '-u', '-plain') do |stdin, stdout|
      stdin.write(random_100k_bytes)
      stdin.close
      stdout.read
    end
  end

  x.compare!
end

Benchmark.ips do |x|
  x.report('decode_gem') do
    Base16.decode16(random_10k_hex)
  end

  x.report('decode_pr') do
    PrBase16.decode16(random_10k_hex)
  end

  x.report('decode_3') do
    Base16V3.decode16(random_10k_hex)
  end

  x.compare!
end


random_10k_bytes_file.unlink
random_10k_hex_file.unlink
random_100k_bytes_file.unlink
random_100k_hex_file.unlink
outfile.unlink

@zachmargolis (Contributor, Author)

💡 Turns out pack and unpack literally do all of this for us, and since they're implemented in C instead of Ruby, they're leagues faster.

ruby base16.rb 
Warming up --------------------------------------
      encode_gem_10k     3.000  i/100ms
       encode_pr_10k    21.000  i/100ms
        encode_3_10k    23.000  i/100ms
     encode_pack_10k     5.975k i/100ms
        xxd_file_10k    19.000  i/100ms
       xxd_stdin_10k    21.000  i/100ms
Calculating -------------------------------------
      encode_gem_10k     50.141  (±25.9%) i/s -    237.000  in   5.047424s
       encode_pr_10k    235.691  (±12.7%) i/s -      1.155k in   5.079293s
        encode_3_10k    226.123  (±10.2%) i/s -      1.127k in   5.048677s
     encode_pack_10k     62.379k (±11.0%) i/s -    310.700k in   5.058291s
        xxd_file_10k    183.512  (± 4.4%) i/s -    931.000  in   5.084147s
       xxd_stdin_10k    209.037  (± 5.7%) i/s -      1.050k in   5.041609s

Comparison:
     encode_pack_10k:    62379.2 i/s
       encode_pr_10k:      235.7 i/s - 264.67x  (± 0.00) slower
        encode_3_10k:      226.1 i/s - 275.86x  (± 0.00) slower
       xxd_stdin_10k:      209.0 i/s - 298.41x  (± 0.00) slower
        xxd_file_10k:      183.5 i/s - 339.92x  (± 0.00) slower
      encode_gem_10k:       50.1 i/s - 1244.08x  (± 0.00) slower

Warming up --------------------------------------
          decode_gem     5.000  i/100ms
           decode_pr    15.000  i/100ms
            decode_3    12.000  i/100ms
         decode_pack   493.000  i/100ms
Calculating -------------------------------------
          decode_gem     55.712  (± 7.2%) i/s -    280.000  in   5.052216s
           decode_pr    160.648  (± 3.1%) i/s -    810.000  in   5.047056s
            decode_3    145.287  (±10.3%) i/s -    720.000  in   5.040730s
         decode_pack      4.576k (± 9.4%) i/s -     22.678k in   5.009005s

Comparison:
         decode_pack:     4575.8 i/s
           decode_pr:      160.6 i/s - 28.48x  (± 0.00) slower
            decode_3:      145.3 i/s - 31.49x  (± 0.00) slower
          decode_gem:       55.7 i/s - 82.13x  (± 0.00) slower
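The pack-based variants (encode_pack / decode_pack) aren't shown in the thread; a minimal sketch of what they presumably look like, using the C-backed `'H*'` directive (high nibble first, for the whole string) — the `PackBase16` name is invented here for illustration:

```ruby
# Hypothetical reconstruction of the encode_pack / decode_pack variants
# benchmarked above. Both directives run in C inside MRI, which is where
# the ~250x speedup over the byte-by-byte Ruby loops comes from.
module PackBase16
  # "\x0f\xab" -> "0fab" (lowercase hex at this point in the thread)
  def self.encode16(str)
    str.unpack1('H*')
  end

  # "0fab" -> "\x0f\xab"
  def self.decode16(hex)
    [hex].pack('H*')
  end
end
```

Round-tripping any binary string through `encode16` and `decode16` returns the original bytes.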

@zachmargolis zachmargolis requested a review from a team November 2, 2022 16:25
@zachmargolis (Contributor, Author)

OK, one last time: updated because we needed an upcase in there. Slightly slower, but still faster than the existing implementation.

Warming up --------------------------------------
      encode_gem_10k     1.000  i/100ms
       encode_pr_10k    10.000  i/100ms
        encode_3_10k    10.000  i/100ms
     encode_pack_10k   801.000  i/100ms
        xxd_file_10k    12.000  i/100ms
       xxd_stdin_10k    12.000  i/100ms
Calculating -------------------------------------
      encode_gem_10k     38.267  (±23.5%) i/s -    179.000  in   4.999714s
       encode_pr_10k    167.452  (±21.5%) i/s -    790.000  in   5.078897s
        encode_3_10k    177.296  (±24.3%) i/s -    810.000  in   5.040526s
     encode_pack_10k      8.288k (±17.6%) i/s -     40.050k in   5.039900s
        xxd_file_10k    135.442  (±14.8%) i/s -    660.000  in   5.023229s
       xxd_stdin_10k    126.034  (±27.0%) i/s -    552.000  in   5.057803s

Comparison:
     encode_pack_10k:     8288.4 i/s
        encode_3_10k:      177.3 i/s - 46.75x  (± 0.00) slower
       encode_pr_10k:      167.5 i/s - 49.50x  (± 0.00) slower
        xxd_file_10k:      135.4 i/s - 61.19x  (± 0.00) slower
       xxd_stdin_10k:      126.0 i/s - 65.76x  (± 0.00) slower
      encode_gem_10k:       38.3 i/s - 216.60x  (± 0.00) slower

Warming up --------------------------------------
          decode_gem     4.000  i/100ms
           decode_pr    12.000  i/100ms
            decode_3    11.000  i/100ms
         decode_pack   331.000  i/100ms
Calculating -------------------------------------
          decode_gem     46.118  (±17.3%) i/s -    216.000  in   5.034708s
           decode_pr    127.556  (± 9.4%) i/s -    636.000  in   5.040090s
            decode_3    101.360  (±11.8%) i/s -    506.000  in   5.086260s
         decode_pack      3.665k (±20.2%) i/s -     17.212k in   5.031577s

Comparison:
         decode_pack:     3664.6 i/s
           decode_pr:      127.6 i/s - 28.73x  (± 0.00) slower
            decode_3:      101.4 i/s - 36.15x  (± 0.00) slower
          decode_gem:       46.1 i/s - 79.46x  (± 0.00) slower
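The "one last time" version with the upcase presumably amounts to the following sketch (module name is my assumption; the extra `upcase` pass over the output is what makes it slightly slower than the plain pack version, while staying far ahead of the Ruby loops):

```ruby
# Hypothetical final form after adding the upcase the gem's uppercase
# output requires; still pack/unpack-based.
module PackBase16Upcased
  # "\xab" -> "AB"
  def self.encode16(str)
    str.unpack1('H*').upcase
  end

  # pack('H*') accepts both upper- and lowercase hex digits on decode,
  # so decode16 is unchanged.
  def self.decode16(hex)
    [hex].pack('H*')
  end
end
```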

@ThatSpaceGuy ThatSpaceGuy merged this pull request into mattw/just-jwes Nov 2, 2022
@ThatSpaceGuy ThatSpaceGuy deleted the margolis-optimize-base16-encode branch November 2, 2022 21:42
ThatSpaceGuy pushed a commit that referenced this pull request Nov 8, 2022
* Optimize base16 encoding