-
Notifications
You must be signed in to change notification settings - Fork 440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persistent worker support #421
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
Related to #412 |
That sounds fine, cargo-raze is a much closer analog. We're not currently fetching raze as part of rules_rust, but we probably should. |
6c0cd08
to
5942456
Compare
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
1c53c80
to
caf1b82
Compare
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
1 similar comment
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
@nikhilm unfortunately I'm fairly new to Bazel and thus do not have anything to add on how the problems you listed may be resolved, but I just wanted to say thank you for working on this - waiting ~5 secs for my crate to compile in the trivial change case instead of ~15 secs is a lot nicer. In case it's useful, here is your branch rebased over the latest master: https://github.com/ankitects/rules_rust/tree/persistentworker Edit: a commit that adds macOS support is available here: https://github.com/ankitects/rules_rust/tree/macos-worker |
I have a crate that uses pyo3 to create a Python extension module. On macOS, special link flags are required: rustc_flags = selects.with_or({ ( "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin", ): [ "-Clink-arg=-undefined", "-Clink-arg=dynamic_lookup", ], "//conditions:default": [], }), Without them, the linker on macOS fails, as the Python API is not available until runtime. rules_rust's clippy implementation was passing --emit=dep-info,link to the clippy invocation, causing a clippy run to fail with linking errors. This patch changes the invocation to use --emit=dep-info,metadata instead, which is what cargo uses. It also shaves a bit of time off the check, as linking no longer needs to happen. Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect seems to be falling back on a non-worker compile, so it's still noticeably slower than running cargo clippy directly when minor changes have been made.
I have a crate that uses pyo3 to create a Python extension module. On macOS, special link flags are required: rustc_flags = selects.with_or({ ( "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin", ): [ "-Clink-arg=-undefined", "-Clink-arg=dynamic_lookup", ], "//conditions:default": [], }), Without them, the linker on macOS fails, as the Python API is not available until runtime. rules_rust's clippy implementation was passing --emit=dep-info,link to the clippy invocation, causing a clippy run to fail with linking errors. This patch changes the invocation to use --emit=dep-info,metadata instead, which is what cargo uses. It also shaves a bit of time off the check, as linking no longer needs to happen. Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect seems to be falling back on a non-worker compile, so it's still noticeably slower than running cargo clippy directly when minor changes have been made.
I have a crate that uses pyo3 to create a Python extension module. On macOS, special link flags are required: rustc_flags = selects.with_or({ ( "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin", ): [ "-Clink-arg=-undefined", "-Clink-arg=dynamic_lookup", ], "//conditions:default": [], }), Without them, the linker on macOS fails, as the Python API is not available until runtime. rules_rust's clippy implementation was passing --emit=dep-info,link to the clippy invocation, causing a clippy run to fail with linking errors. This patch changes the invocation to use --emit=dep-info,metadata instead, which is what cargo uses. It also shaves a bit of time off the check, as linking no longer needs to happen. Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect seems to be falling back on a non-worker compile, so it's still noticeably slower than running cargo clippy directly when minor changes have been made.
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
2 similar comments
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
…non-persistent case
I have a crate that uses pyo3 to create a Python extension module. On macOS, special link flags are required: rustc_flags = selects.with_or({ ( "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin", ): [ "-Clink-arg=-undefined", "-Clink-arg=dynamic_lookup", ], "//conditions:default": [], }), Without them, the linker on macOS fails, as the Python API is not available until runtime. rules_rust's clippy implementation was passing --emit=dep-info,link to the clippy invocation, causing a clippy run to fail with linking errors. This patch changes the invocation to use --emit=dep-info,metadata instead, which is what cargo uses. It also shaves a bit of time off the check, as linking no longer needs to happen. Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect seems to be falling back on a non-worker compile, so it's still noticeably slower than running cargo clippy directly when minor changes have been made.
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
@mfarrugi when you have some time could you merge this? Thanks! |
|
||
Iterating on Rust code may benefit from [incremental compilation](https://doc.rust-lang.org/edition-guide/rust-2018/the-compiler/incremental-compilation-for-faster-compiles.html). This is supported by using a [Bazel Persistent Worker](https://docs.bazel.build/versions/master/persistent-workers.html). While Bazel can determine what needs to be rebuilt at a crate level, the compiler can speed up building a single crate by sharing information across runs. It does this by storing intermediate information in a directory across invocations. This is enabled by default in Cargo. Persistent workers bring this feature to Bazel. | ||
|
||
The Rust rules have preliminary support for workers. Pass `use_worker = True` to enable this when available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is linux only, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted a Mac version above - but each platform would need its own binary at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rustc-worker also has a mac build available now. I can update this PR with an entry for that.
The Rust rules have preliminary support for workers. Pass `use_worker = True` to enable this when available. | ||
|
||
```python | ||
rust_repositories(use_worker = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why this shouldn't be the default, was there a reason to be conservative about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's pulling in and running a binary hosted on a third-party repo, so off by default is probably the best option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, what @dae said. Plus given that only 2(?) people have actually used it, it seemed very alpha.
1. Since the worker has dependencies on various crates, it uses cargo-raze, which generates relevant rules. This means "don't use the worker to build this target" is transitive and such information needs to be propagated down the tree in a way that works with restrictions in Bazel. Initial experiments repeatedly encountered failures due to Bazel treating the rule dependency on the worker executable target as a cycle, even when building in non-worker mode. This may be user error or a restriction in Bazel. Until that is addressed, the easiest way to fix this is to change cargo-raze to customize what rules are used, and provide `rust_no_worker_binary` and `rust_no_worker_library` rules that do not have this cycle. | ||
2. Figuring out a good strategy for dependencies. Since Bazel doesn't really have transitive dependencies, attempts to build this worker from source necessarily require users of these rules to register all the external repositories for the worker in their `WORKSPACE`. This could cause collisions with other dependencies. In addition, if the `_no_worker_` approach above is adopted, users will lose the benefits of workers for those dependencies (like `protobuf`) shared between the worker and their code. There is no satisfactory solution for this right now. | ||
|
||
## How about rewriting the worker in C++? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the worker benefit from being written in a non-gc'd language? Curious about whether python is a reasonable choice, given other utilities in rules_rust have been written in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is nothing preventing writing this in Python for someone with such an interest. I did not find any references to Python rules in this repo. I suppose as a bootstrap language (i.e. one that already has Bazel rules for it), that is a better alternative than C++.
I was really hoping for a second set of eyes to at least take a high-level look. In particular I'm not a huge fan of depending on a linux-only binary download, and while ultimately I think that is fine I can imagine other opinions. Does anyone care to opine on default-ness or any of the other high level aspects of this addition? cc @dfreese @illicitonion @UebelAndre |
Avoiding the need for a binary would be ideal. Python could work, though I don't see any Python in rules_rust at the moment? Alternatively perhaps the cyclic dependency issue could be solved by having a separate rust_binary_noworker() rule that does not take the worker toolchain as an argument. Protobuf dependencies could be avoided by using JSON: https://blog.bazel.build/2020/11/11/json-workers.html |
I have a crate that uses pyo3 to create a Python extension module. On macOS, special link flags are required: rustc_flags = selects.with_or({ ( "@io_bazel_rules_rust//rust/platform:x86_64-apple-darwin", ): [ "-Clink-arg=-undefined", "-Clink-arg=dynamic_lookup", ], "//conditions:default": [], }), Without them, the linker on macOS fails, as the Python API is not available until runtime. rules_rust's clippy implementation was passing --emit=dep-info,link to the clippy invocation, causing a clippy run to fail with linking errors. This patch changes the invocation to use --emit=dep-info,metadata instead, which is what cargo uses. It also shaves a bit of time off the check, as linking no longer needs to happen. Tangentially related to bazelbuild#428 and bazelbuild#421 - currently the clippy aspect seems to be falling back on a non-worker compile, so it's still noticeably slower than running cargo clippy directly when minor changes have been made.
This did not work out due to the requirement that pops up about not being able to have any cargo-raze based dependencies, without significant changes to cargo-raze itself. Note that JSON workers are a very new thing at this point, which would necessarily tie rules_rust to Bazel 3.7 or newer. I'm not sure if that is what we want for rules_rust. Perhaps the Re: writing this in Python, or modifying raze - I don't have the interest to do those changes, due to incidental complexity (Bazel transition and cycle-checker limitations, modifying another dependency etc.) or choice of language. I appreciate the feedback and encouragement from everyone here. If this is not the best way to proceed, I am OK with closing this PR without merging, but I'm not going to pursue these alternative approaches. |
I've posted an experiment using cargo on #517 |
I actually feel this might be worth keeping around, I'm investigating some ways rust binaries can be bootstrapped as part of #650 and would like to get something basic setup quickly. Maybe the same infrastructure could be applied here? But I'm also not against a C++ implementation, I'm just more excited about a Rust implementation 😄 |
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. It also requires targets to be tagged "incremental" - any without the tag will be compiled normally, even if a base folder is provided. Cargo's behaviour appears to be to only store incremental products for the product being built (eg, all the dependencies you pull in from crates.io do not have incremental build products stored), so manual tagging seems preferable to an "include everything" approach.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
This is considerably less invasive than bazelbuild#667, bazelbuild#517 and bazelbuild#421 - there is no extra code to bootstrap, and no toolchains have to be plumbed into the rules. We had been using a worker to get around sandboxing restrictions, but as @hlopko pointed out on bazelbuild#667, that turns out not to be necessary - even normal rules have access to /tmp. Unfortunately it seems that rustc does not expect the source files to change location, and it will consistently crash when used in a sandboxed context. So the approach this PR takes is to disable sandboxing on targets that are being compiled incrementally. This fixes the compiler crashes, and as a bonus, means we're not limited to saving the cache in /tmp. This PR adds a --@rules_rust//:experimental_incremental_base flag to specify the path where incremental build products are stored - if not provided, the rules will function as they normally do. The default behaviour will incrementally compile crates in the local workspace, like cargo does. The behaviour can be adjusted with another flag, which is covered in docs/index.md.
@mfarrugi
This is the outline of what it would look like to plug in the worker. There are a few thorny issues that need solutions:
The need to have
raze
emit_no_worker_
versions of the rules for all dependencies of the worker. This will need changes to raze to allow it to emit custom rule names and imports.I tried this with a project that uses Raze+rules_rust itself and quickly ran into the shared deps problem. Not only does the user of these rules currently need to run the (documented) extra steps to merge their own deps and the protobuf deps, they also need to pull in the protobuf patch (could be mitigated by running the build.rs?) and patch their own copies of the generated BUILD files to use the
_no_worker_
rules. This is a significant barrier to usage. Of course, if raze was modified, there is probably a way to annotate it instead, so that it generates no-worker variants for certain crates and their transitive deps. Ideally it should generate both worker and no-worker variants and then use the appropriate ones, so that any deps shared between the worker and user code (e.g.protobuf
) can still use the with-worker variant to build the protobuf crate for the user code.The worker actions will need a new mnemonic, otherwise changing the strategy for
Rustc
will make the worker compilation fail, since it uses the same actions, but does not support theworker
strategy. This is an easy fix.At this point, it is almost like letting the worker build with Cargo, releasing binaries and then having external repositories for the worker, the way we do for
rustc
itself may be another alternative. What do you think?