Skip to content

Make KeyValidator faster by sorting keys and using binary search algorithm (Array#bsearch) #453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 14, 2023

Conversation

radarek
Copy link
Contributor

@radarek radarek commented Mar 13, 2023

I noticed that validating schema is very slow if it contains thousands of keys. Current algorithm is O(N^2), which means that doubling number of keys makes it ~four times slower. I was able to optimize it by sorting keys once and then using Array#bsearch method which is O(N*logN).

Benchmark

Here is benchmark I run before and after the change:

require 'dry/schema'
require 'benchmark'

def sized_benchmark(size, reporter)
  hash = size.times.to_h { [:"key_#{_1}", true] }
  nested = { foo: { bar: { baz: hash } } }

  schema = Dry::Schema.define do
    config.validate_keys = true

    required(:foo).hash do
      required(:bar).hash do
        required(:baz).hash do
          hash.each do |key, _|
            required(key)
          end
        end
      end
    end
  end

  reporter.report("schema call with size = #{size}") do
    schema.call(nested)
  end
end

Benchmark.bmbm do |x|
  sized_benchmark(1, x)
  sized_benchmark(10, x)
  sized_benchmark(100, x)
  sized_benchmark(1000, x)
  sized_benchmark(10000, x)
end

Before:

$ ruby -Ilib/ bm.rb
Rehearsal -----------------------------------------------------------------
schema call with size = 1       0.001126   0.000397   0.001523 (  0.002303)
schema call with size = 10      0.000085   0.000001   0.000086 (  0.000086)
schema call with size = 100     0.001486   0.000013   0.001499 (  0.001502)
schema call with size = 1000    0.145060   0.002052   0.147112 (  0.147127)
schema call with size = 10000  12.251927   0.109480  12.361407 ( 12.384599)
------------------------------------------------------- total: 12.511627sec

                                    user     system      total        real
schema call with size = 1       0.000101   0.000000   0.000101 (  0.000099)
schema call with size = 10      0.000117   0.000001   0.000118 (  0.000117)
schema call with size = 100     0.001314   0.000019   0.001333 (  0.001332)
schema call with size = 1000    0.122066   0.000790   0.122856 (  0.122957)
schema call with size = 10000  12.230255   0.094633  12.324888 ( 12.347312)

After:

$ ruby -Ilib/ bm.rb
Rehearsal -----------------------------------------------------------------
schema call with size = 1       0.001233   0.000475   0.001708 (  0.002473)
schema call with size = 10      0.000080   0.000000   0.000080 (  0.000081)
schema call with size = 100     0.000413   0.000001   0.000414 (  0.000414)
schema call with size = 1000    0.003748   0.000042   0.003790 (  0.003794)
schema call with size = 10000   0.052483   0.000983   0.053466 (  0.054539)
-------------------------------------------------------- total: 0.059458sec

                                    user     system      total        real
schema call with size = 1       0.000099   0.000000   0.000099 (  0.000097)
schema call with size = 10      0.000116   0.000001   0.000117 (  0.000115)
schema call with size = 100     0.000350   0.000000   0.000350 (  0.000350)
schema call with size = 1000    0.002802   0.000096   0.002898 (  0.002894)
schema call with size = 10000   0.032131   0.000347   0.032478 (  0.032612)

@radarek radarek marked this pull request as ready for review March 13, 2023 22:34
@solnic solnic merged commit 44008b4 into dry-rb:main Mar 14, 2023
@solnic
Copy link
Member

solnic commented Mar 14, 2023

Thank you! This is great ❤️ 🚀

@radarek
Copy link
Contributor Author

radarek commented Mar 14, 2023

@solnic Can you please revert it? This implementation is not correct. I need to rethink some edge cases.

Here is an example of failing specs with this PR but working on main:

it "copies key map from the parent and includes new keys from child" do
  schema = Dry::Schema.Params do
    config.validate_keys = true

    required(:a).filled(:string)
    required(:fooA).filled(:string)
    required(:foo).array(:hash) do
      required(:bar).filled(:string)
    end
  end

  expect(schema.(a: "string", fooA: "string", foo: "string").errors.to_h)
    .to eql({foo: ["must be an array"]})
end

In this PR it produces :foo => ["is not allowed"].

@solnic
Copy link
Member

solnic commented Mar 14, 2023

@radarek no worries. Could we have another PR that fixes it instead of reverting?

@radarek
Copy link
Contributor Author

radarek commented Mar 15, 2023

@solnic If it is not a problem that currently main is broken then I will create new PR. I need some time to make more tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants