Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could we request a new Ruby API to remove the need for TracePoint? #135

Closed
casperisfine opened this issue Aug 28, 2020 · 18 comments
Closed

Comments

@casperisfine
Copy link
Contributor

👋 @fxn, First, I know you like to keep the issues to 0, so I'm very sorry for opening such an open ended one. We can close it immediately if you wish 😉 .

Context

Zeitwerk need some hook to know when a new module or class is defined, so that it can define the required autoloads on it.

The problem I have with TracePoint is that it has a fairly significant cost, even when disabled. The need for a workable TracePoint means we can't compile iseqs with tracing disabled (RubyVM::InstructionSequence.compile_option = { trace_instruction: false } ), and that the VM will be constantly checking wether a tracepoint is enabled. Even if none are, it causes some overhead. So medium/long term I'd like to eliminate this need.

So I'm thinking, is there some better, more specific API we could request to ruby-core, so that hopefully in Ruby 2.8/3 or later, Zeitwerk could no longer depend on TracePoint ?

Ideas

const_added

Maybe Ruby could have a const_added callback similar to the method_added one? I think it could work, Zeitwerk would define one in Kernel/Object, however if a class/module was to redefine it without calling super, it would break autoloading. It might also be tricky to make it work with BasicObject.

However it seems relatively straight forward to implement.

nested autoload

Another nice API could be to allow nested const names in autoload, e.g. autoload "Foo::Bar", some_path. Unless I'm missing something, this would dramatically simplify Zeitwerk, you could just compute a list of all loadable constants, and directly register them all.

This might be slightly harder to implement, but seems ideal.

something else?

Maybe there are other APIs you had in mind while implementing Zeitwerk?

@fxn
Copy link
Owner

fxn commented Aug 29, 2020

Hey @casperisfine!

Before fully committing to work on the library, I wanted to validate the impact of the TracePoint technique. There were some benchmarks performed, and Discourse ran with it, and the result was that it did not seem to be noticeable. That was key to move on with the project.

I dislike usage of TracePoint because I do not find it elegant, so I am definitely in favor of a better API, but that is a different motivation and my idea was that if with time new APIs appeared, I would adopt them, but would not push for them myself.

However, your motivation is performance and would like to understand the cost with actual numbers. Could you share some please making sure the benchmark uses TracePoint like Zeitwerk does?

If motivation is performance, would we incur in the risk of asking for an API like const_added, only to realize it is as or more expensive than the current solution?

@byroot
Copy link
Contributor

byroot commented Aug 29, 2020

Sure. I'll try to get you some numbers next week.

please making sure the benchmark uses TracePoint like Zeitwerk does?

Actually, just compiling iseq with and without trace_instruction: false should show a difference, regardless of wether a TracePoint exist.

@byroot
Copy link
Contributor

byroot commented Aug 29, 2020

Hum, actually it might not be so obvious anymore, since 2.5 the trace instructions are gone and are replaced by tracing flags: https://bugs.ruby-lang.org/issues/14104 and I don't think they can be disabled. I'll still see with @XrXr next week what could be done.

@fxn
Copy link
Owner

fxn commented Aug 29, 2020

Awesome Jean.

Let's keep this open then by now 👍.

@casperisfine
Copy link
Contributor Author

So @XrXr did some benchmarking using opt-carrot: XrXr/optcarrot@d2b7ad1

alanwu ~/s/g/m/optcarrot (bench-disabled-tracepoint)> benchmark-driver -v benchmark.yml benchmark-with-disabled-tracepoint.yml --repeat-count=10 --output=all
2.7.1: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
Calculating -------------------------------------
                         Optcarrot Lan_Master.nes    44.04014503822608 fps
                                                     44.15227629549982
                                                     45.27481814595652
                                                     45.27891813608478
                                                     45.29578145316198
                                                     45.47429691677146
                                                     45.50786780428272
                                                     45.51201011392739
                                                     45.82951420771205
                                                     45.98429380912889
Optcarrot Lan_Master.nes with disabled TracePoint    42.31470859251375 fps
                                                     42.43821702928570
                                                     42.52685098152938
                                                     42.74661233075901
                                                     42.76753468944096
                                                     42.77098986392414
                                                     43.26257498874642
                                                     43.46860118051134
                                                     43.70862856887850
                                                     43.72965356407857

So according to this benchmark, enabling a TracePoint once, even if it is then immediately disabled has a 1-5% overhead.

@fxn
Copy link
Owner

fxn commented Sep 1, 2020

@XrXr To make sure we compare apples to apples, would you mind repeating the benchmark for a TracePoint that listens to the :class event? That is what Zeitwerk listens to.

Let's disable using the same interface as well for the same price (guess it's equivalent, but still):

TracePoint.new(:class){}.disable

(For reference, TracePoint usage is all within this file.)

@casperisfine
Copy link
Contributor Author

would you mind repeating the benchmark for a TracePoint that listens to the :class event?

Seem like you had some flair here. I did repeat Alan's benchmark difference with :call, but then switch to :class, and the performance difference is then pretty much imperceptible:

2.7.1: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin19]
Calculating -------------------------------------
                         Optcarrot Lan_Master.nes   47.852232310451093 fps
                                                    47.994624605811694 
                                                    48.269537094602079 
                                                    48.476230487755984 
                                                    48.502355059475306 
                                                    48.752214169987816 
                                                    48.875324482176481 
                                                    48.943367082551028 
                                                    48.983323905582090 
                                                    49.518569466593746 
Optcarrot Lan_Master.nes with disabled TracePoint    47.03939287974622 fps
                                                     47.65636583345366 
                                                     48.21084208675389 
                                                     48.49399213104561 
                                                     48.54316565329117 
                                                     48.55180747237607 
                                                     48.78789193110493 
                                                     49.15937468444442 
                                                     49.79611257880454 
                                                     49.89660313928021 

So without looking at the code, it seems that that tracepoint impact is limited to the events that were listened to. So we can assume that the slowdown is limited to class/module creations (things opt-carot don't do), which should be fairly rare in production past boot time.

@fxn
Copy link
Owner

fxn commented Sep 7, 2020

🙌

@casperisfine
Copy link
Contributor Author

So apparently TracePoint is causing issues to MJIT: https://k0kubun.medium.com/ruby-3-jit-can-make-rails-faster-756310f235a, but doesn't mean Zeitwerk needs to change, apparently k0kubun might look into improved support for TracePoint.

Also if I understand https://bugs.ruby-lang.org/issues/14104 correctly, the cost now mist mostly when calling TracePoint#enable/disable. So maybe we could do a small refactor to just ensure we only disable it once when eagerloading.

@fxn
Copy link
Owner

fxn commented May 21, 2021

Yeah, we talked about this today on Twitter (don't know if you can see the conversation without being logged in).

@casperisfine
Copy link
Contributor Author

Ah nice. Yes, it's public.

@fxn
Copy link
Owner

fxn commented May 21, 2021

By now, I'd lean on waiting a bit and see, because the JIT is still a work in progress and @k0kubun is going to study options.

I also offered to adjust Zeitwerk as needed if that could make his work easier, like @tracer = nil instead of disable or whatever he might suggest.

@byroot
Copy link
Contributor

byroot commented May 21, 2021

Yeah, I also asked Alan, and according to him YJIT doesn't really have this problem.

Another argument for pushing for another API would be that the usage of TracePoint breaks byebug, but Koichi is working on a new official debugger and I tested it, it works fine with Zeitwerk: https://github.com/ruby/debug

But even with all this, I still feel uneasy with using TracePoint. If I get some time to hack on Ruby I might try my hand at the "nested autoload" solution, as it seems the most robust.

@fxn
Copy link
Owner

fxn commented May 21, 2021

But even with all this, I still feel uneasy with using TracePoint. If I get some time to hack on Ruby I might try my hand at the "nested autoload" solution, as it seems the most robust.

If you do that I go to France and invite you to dinner.

I've always seen TracePoint usage in Zeitwerk as a necessary evil I'd love to avoid. I ensured even before writing code that performance was not affected, but from an implementation standpoint, it feels a bit off to me. Only, I do not know any other technique today to support explicit namespaces.

Support for nested autoloads would be super. Off the top of my head, I think we could still be lazy visiting the tree by using that feature only in the case of explicit namespaces.

A callback on class/module definition could also make it, if invoked before the class body is executed.

@casperisfine
Copy link
Contributor Author

Support for nested autoloads would be super. / A callback on class/module definition could also make it

Do you have a preference for either? I might just go with the simplest, but still good to know your opinion.

@fxn
Copy link
Owner

fxn commented May 21, 2021

Do you have a preference for either? I might just go with the simplest, but still good to know your opinion.

Intuitively, the callback.

It would be simpler to implement, it is all Zeitwerk needs, there already exist callbacks for other things in Ruby, and Zeitwerk would preserve its controlled/uniform descend.

@casperisfine
Copy link
Contributor Author

So turns out the implementation is quite trivial: ruby/ruby#4521

@casperisfine
Copy link
Contributor Author

Also feel free to chime in https://bugs.ruby-lang.org/issues/17881

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants