
Optimize the TracePoint callback #24

Merged
merged 1 commit into from
Mar 18, 2019
Conversation

casperisfine
Copy link
Contributor

@casperisfine casperisfine commented Mar 18, 2019

Context

While experimenting with Zeitwerk on an existing Rails app, one test became extremely slow (I'm not even sure how long it takes as it never completed, but I estimate it at over 15 minutes, from ~3s previously).

So I decided to profile that test (or rather a subset of what it's doing) and got this:

$ stackprof tmp/stackprof-stripe-wall-.dump 
==================================
  Mode: wall(1000)
  Samples: 6 (83.78% miss rate)
  GC: 0 (0.00%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
         5  (83.3%)           5  (83.3%)     block in <module:ExplicitNamespace>
         1  (16.7%)           1  (16.7%)     Set#each
         5  (83.3%)           0   (0.0%)     Stripe::StripeObject#metaclass
         1  (16.7%)           0   (0.0%)     Stripe::StripeObject#remove_accessors
         6 (100.0%)           0   (0.0%)     Stripe::StripeObject#initialize_from
         6 (100.0%)           0   (0.0%)     Stripe::StripeObject.construct_from
         6 (100.0%)           0   (0.0%)     block (2 levels) in <class:StripeParserTest>
         6 (100.0%)           0   (0.0%)     block in <class:StripeParserTest>

So basically what is happening is that the Stripe gem (but could be any other dependency really, I'm not particularly pointing fingers here) is defining methods on instances, which lazily creates singleton classes.

These singleton classes are spamming the TracePoint callback so much that everything slows down to a crawl.

The patch

I tried making the callback bail out as soon as possible when it's clear the class is not relevant to Zeitwerk (i.e. is a singleton_class).

Unfortunately it's still way too slow, so I'll have to figure out something else, but I suppose an optimization is always good to take.

@fxn what do you think? Also, if you have other ideas on how to solve this issue, I'm all ears.

cc @rafaelfranca @Edouard-chin

@casperisfine
Copy link
Contributor Author

Actually, I tested with next if event.self.singleton_class? and somehow it's much faster than next unless event.self.name.

It's now fast enough that the test completes more or less as fast as before Zeitwerk.

@fxn
Copy link
Owner

fxn commented Mar 18, 2019

Absolutely, computing the hash code is expensive when called this often, so better to short-circuit whatever we can, especially if the test is cheap.

What do you think about testing event.self.name.nil?? That would cover singleton classes as well as anonymous classes and modules.

Why does the patch rewrite event.self.name as name?

@casperisfine
Copy link
Contributor Author

Why does the patch rewrite event.self.name as name?

Oops, that's a leftover from the previous patch version. I'll fix that in a sec and answer your questions.

@fxn
Copy link
Owner

fxn commented Mar 18, 2019

Good!

break unless event.self.name

is equivalent, and avoids the extra nil? call, which in this case is not needed because we have either a string or nil.

@casperisfine
Copy link
Contributor Author

What do you think about testing event.self.name.nil?? That would cover singleton classes as well as anonymous classes and modules.

So that is what I was doing in the first patch version. However, it didn't make much difference performance-wise. Turns out Module#name is about 1000X slower than Class#singleton_class?.

I'm currently digging in MRI to see why exactly, but I have a few intuitions as to why.

That being said, we can probably combine both patches: first discard on singleton_class?, as it's very cheap, and then, as a second measure, discard on name.nil?, since we'll be accessing the name anyway.
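A sketch of that combined guard (hypothetical helper name, not the merged patch verbatim):

```ruby
# Cheap check first: singleton_class? is essentially a flag test, while
# Module#name has to consult (or search) the constants table. Checking
# name second also filters out anonymous classes and modules.
def relevant?(mod)
  return false if mod.singleton_class? # very cheap, discards the storm
  return false unless mod.name         # anonymous class or module
  true
end

tracer = TracePoint.new(:class) do |event|
  next unless relevant?(event.self)
  # ... actual work with event.self.name ...
end
```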

@fxn
Copy link
Owner

fxn commented Mar 18, 2019

Sounds good!

@casperisfine
Copy link
Contributor Author

Ok, so here's why Module#name is slow:

First, it instantiates a new string every time (already knew that): https://github.com/ruby/ruby/blob/28f619e301cf5f266f38fe764c4018b7fabc77cb/variable.c#L242-L250 Not the end of the world, but it means we definitely want to hold onto that name, otherwise we'll generate lots of useless allocations.

Then classname() is quite a big process: https://github.com/ruby/ruby/blob/28f619e301cf5f266f38fe764c4018b7fabc77cb/variable.c#L185-L233

Basically, the name of a Module is not one of its properties; every time you ask for it, MRI is going to look the module up in the constants table cache.

And if it's not found there, it's going to find it the hard way: https://github.com/ruby/ruby/blob/28f619e301cf5f266f38fe764c4018b7fabc77cb/variable.c#L151-L183

That probably explains why it's extra expensive for nameless singleton classes.
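A rough micro-benchmark illustrating the gap on a nameless singleton class (numbers will vary by machine and Ruby version; this is a sketch, not a rigorous measurement):

```ruby
require "benchmark"

sc = Object.new.singleton_class
n  = 10_000

Benchmark.bm(18) do |x|
  # flag check on the class object, essentially free
  x.report("singleton_class?") { n.times { sc.singleton_class? } }
  # no cached classpath, so MRI has to go looking every call
  x.report("name")             { n.times { sc.name } }
end
```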

@casperisfine
Copy link
Contributor Author

@fxn I updated the patch again.

@fxn
Copy link
Owner

fxn commented Mar 18, 2019

Just a couple of minor details:

MRI allocate

would be

MRI allocates

On the other hand, I think I'd prefer break instead of next to short-circuit. While both may result in the same behavior, I find break to be more intention-revealing here.

@casperisfine
Copy link
Contributor Author

@fxn fixed both!

@fxn
Copy link
Owner

fxn commented Mar 18, 2019

🙌

@fxn fxn merged commit a1c68e3 into fxn:master Mar 18, 2019
@casperisfine
Copy link
Contributor Author

Thanks for the fast merge ❤️

@casperisfine
Copy link
Contributor Author

Wait. I'm getting some:

LocalJumpError: break from proc-closure
  | /tmp/bundle/ruby/2.5.0/bundler/gems/zeitwerk-a1c68e3c8aa0/lib/zeitwerk/explicit_namespace.rb:62:in `block in <module:ExplicitNamespace>'

now, and I'm not sure what's going on.

@casperisfine
Copy link
Contributor Author

Never mind, I bet the test suite just never encounters these cases. I'll submit a second PR with two new test cases.

@fxn
Copy link
Owner

fxn commented Mar 19, 2019

I have just published 1.4.0. It's late here, I'll adapt the Rails integration soon.
