Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metabuild: semantic build scripts for Cargo #2196

Merged
merged 7 commits into from
Apr 9, 2018

Conversation

joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Oct 31, 2017

Introduce a mechanism for Cargo crates to make use of declarative build scripts, obtained from one or more of their dependencies rather than via a build.rs file. This allows us to experimentat with declarative build scripts in the crates.io ecosystem.

With this mechanism, crates can specify metabuild = ["somecrate", "anothercrate"] in their Cargo.toml, and they'll automatically end up with a build script that invokes somecrate::metabuild() and anothercrate::metabuild(). Likely providers of crates for use with metabuild would include pkg-config (and more general native-library finders), parser generators, binding generators, other code generators, and anything else that would ordinarily say "add this to your build.rs" in its README.

Rendered
Tracking issue

@joshtriplett joshtriplett added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Oct 31, 2017
```

Note that the `metabuild` functions intentionally take no parameters; they
should obtain any parameters they need from `Cargo.toml`. Various crates to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this refers to http://doc.crates.io/manifest.html#the-metadata-table-optional ? It’d be nice for the RFC to provide an example fn metabuild() implementation that does so.

@joshtriplett
Copy link
Member Author

Why combine multiple build dependencies into a single executable instead of compiling and executing them one by one?

That's a reasonable point. At the very least, I can mention that as an alternative. (Done.)

While uncommon, it may cause problems because they enable different features on dependent crates or depend on different versions of the same *-sys crate.

I'd expect Cargo to resolve the build-dependencies once, not once per script, so I don't expect this to differ.

After reading the RFC I still don't understand how build systems would use this. How would a build system get information about all the metabuild values that are contained in all the dependencies of the current crate? How would it prevent cargo from executing crates specified in metabuild and "provide its own versions of those dependencies"?

To give one of many examples: I expect to have a metabuild script that finds native libraries, using pkg-config at first and eventually using a wide variety of mechanisms. Such a script would get the information about what library to find from Cargo.toml. And other tools, such as packaging scripts for various Linux distributions or enterprise build systems, could read that same data and translate it to their own dependency metadata. For instance, debcargo could translate that data into dependencies on native library -dev packages.

@kornelski
Copy link
Contributor

kornelski commented Oct 31, 2017

Please don't require crates to parse Cargo.toml. I've done it in cargo-deb and it's been a bad experience and cargo-deb is still subtly buggy because of it.

At very least provide them basic key/value config, which could still be read from Cargo.toml, but by Cargo itself.

  • Parsing Cargo.toml is tedious and boilerplate'y. Even with serde, that requires defining structs, checking for missing keys/wrong key types, etc. It's much more work compared to (ab)using env vars.

  • Fragmenting/duplicating Cargo.toml parsing across many crates means it's going to be harder to introduce new Cargo.toml features, and 3rd party crates may not handle all of the subtleties of optional/implied keys, workspaces or (soon to be added) schemas.

  • Many more crates will now pull in Serde and/or other parsers. Rust compilation time is still not a solved problem, and Cargo still can't build unoptimized build-time deps in --release mode. Currently many crates avoid unwanted overhead by having a "serde" feature to opt-in, but if it's going to be required indirectly at build time by many sys crates, it's going to be impossible to avoid.

@kornelski
Copy link
Contributor

kornelski commented Oct 31, 2017

Do you expect the metabuild crates to be build-system specific like pkg-config or cmake wrappers, or mega-crates that combine/abstract away other build systems?

I wonder where platform-specific quirks should go. For example libjpeg-turbo uses autotools on non-Windows (and explicitly does not support them on Windows) and cmake on Windows.

Is that supposed to be encoded in Cargo.toml? metabuild.target.unix = ["autotools"]; metabuild.target.'not(unix)' = ["cmake"] or meta-meta crate? metabuild = ["autotoolsonunixbutcmakeonwindows"]

@joshtriplett
Copy link
Member Author

@pornel

Please don't require crates to parse Cargo.toml. I've done it in cargo-deb and it's been a bad experience and cargo-deb is still subtly buggy because of it.

I expect that we'll very quickly end up with a canonical crate that parses Cargo.toml that everyone pulls in as a dependency (ideally including Cargo itself). And once we see how that ends up, we can easily enough start exporting the relevant data from Cargo to metabuild scripts in a parsed form. Ideally that might even happen before we declare the Cargo metabuild interface stable. But I don't want to prematurely assume what that interface should look like.

(Also, I thought I'd seen a standard crate for doing that parsing already, but I can't seem to find it.)

It's much more work compared to (ab)using env vars.

Environment variables don't provide arbitrary data structures. How would you represent the following in environment variables?

[package.metadata.pkg-config]
testdata = "4"
testlib = { version = "1", feature = "test-feature" }
testmore = { version = "2", feature = "another-test-feature" }

Do you expect the metabuild crates to be build-system specific like pkg-config or cmake wrappers, or mega-crates that combine/abstract away other build systems?

Potentially either. At first, the former; later on, we'll also end up with things that detect and run multiple build systems, similar to debhelper's dh_auto_build.

I wonder where platform-specific quirks should go. For example libjpeg-turbo uses autotools on non-Windows (and explicitly does not support them on Windows) and cmake on Windows.

Does it support cmake on non-Windows?

I don't think the metabuild interface should include any conditionals directly, but I have an idea for how to handle that via build-dependencies. I'll push an update to the RFC.

@kornelski
Copy link
Contributor

Does it support cmake on non-Windows?

No. At least on macOS the cmake build requires several tweaks to work. There are plans to make cmake work cross-platform, but the project lacks the funds to execute it.

@kornelski
Copy link
Contributor

kornelski commented Nov 1, 2017

How would you represent the following in environment variables?

I mean build scripts currently already use ad-hoc env vars (like this) with whatever formats they need. I expect they would merely continue to do what they did previously instead of switching to more complex solution of parsing cargo.toml.

BTW, can a -sys crate be its own metabuild crate? Libraries like llvm or libpng have their own unique like-pkg-config-but-not-compatible-with-pkg-config solutions. Or should clang-sys depend on clang-metabuild?

@bbatha
Copy link

bbatha commented Nov 1, 2017

BTW, can a -sys crate be its own metabuild crate? Libraries like llvm or libpng have their own unique like-pkg-config-but-not-compatible-with-pkg-config solutions. Or should clang-sys depend on clang-metabuild?

Even if they have to be separate cargo workspaces plus private crates (#1977) make this pattern painless, and I'd argue that it guides you to structure your code better.

@joshtriplett
Copy link
Member Author

@pornel

I mean build scripts currently already use ad-hoc env vars (like this) with whatever formats they need. I expect they would merely continue to do what they did previously instead of switching to more complex solution of parsing cargo.toml.

I certainly hope not; those environment variables are not in any way standardized. By contrast, I wonder if we could standardize a single crate to invoke anything that looks like llvm-config, with the script path specified in Cargo.toml.

BTW, can a -sys crate be its own metabuild crate? Libraries like llvm or libpng have their own unique like-pkg-config-but-not-compatible-with-pkg-config solutions. Or should clang-sys depend on clang-metabuild?

If you still need custom code, I'd suggest keeping your build.rs file. We'll work our way through the cases that still need build.rs and extract more and more patterns from them.

Also, libpng has a pkg-config script, at least in the packages I have.

@est31
Copy link
Member

est31 commented Nov 1, 2017

  1. What about versions? If you have a crate dependency in Cargo.toml, you need to specify its version. Are you required to mention the crate name twice in the Cargo.toml?

  2. From the RFC text it seems this is intended to support experimentation. Can we maybe do it inside unstable cargo features, instead of adding something that has to be supported indefinitely by Cargo?

@djc
Copy link
Contributor

djc commented Nov 1, 2017

I don't really get this RFC. It seems to allow a crate to specify a different method of building for its dependencies, but it's not clear to me how/why this is related to "declarative" build scripts -- in fact, as far as I can see it's largely unrelated to this concept. Maybe the Motivation section could start with a motivating real-worldish example of what the problem is with today's cargo?

@joshtriplett
Copy link
Member Author

@est31

What about versions? If you have a crate dependency in Cargo.toml, you need to specify its version. Are you required to mention the crate name twice in the Cargo.toml?

Yes. For instance:

[package]
name = "somecrate"
...
metabuild = "parsergen"

[build-dependencies]
parsergen = "1.2"

From the RFC text it seems this is intended to support experimentation. Can we maybe do it inside unstable cargo features, instead of adding something that has to be supported indefinitely by Cargo?

I fully expect this to remain an unstable cargo feature for a while.

@joshtriplett
Copy link
Member Author

@djc This RFC makes it much easier to replace the build.rs scripts in every project (which many "outer" build systems need to override in various ways) with standardized components that can use declarative metadata and more easily support extensibility in standardized ways.

@est31
Copy link
Member

est31 commented Nov 6, 2017

@joshtriplett what about converting this into an eRFC instead? Then we'd have a second discussion before stabilisation which I think would be a very good idea.

@joshtriplett
Copy link
Member Author

@est31 I feel that an eRFC seems more appropriate for the case where we want to identify the problem but have no concrete solution proposal, or only the skeleton of a solution. In this case, the RFC proposes a specific concrete solution for experimentation, so I don't think that the eRFC process applies. However, I added some text to the RFC explicitly acknowledging that the interface may change in the course of experimentation, prior to stabilization. (This seems like the standard practice for any unstable feature; it can always change prior to stabilization.)

@lukesutton
Copy link

I love the idea of this feature. I could absolutely see myself using it. Here is a motivating example. For a web application, I might wish to perform some preprocessing — compile SASS, resize images etc. — as part of the build. Currently I use a custom build script, but that involves lots of boiler plate. It would be great if instead I could break these tasks up into separate crates. It would also give me the option of specifying config in Cargo.toml.

This would also allow for things like 'helper' image processing crates on crates.io.

For my use case here, another solution is to just add these 'helper' crates build-dependencies and configure and call into them in a custom build.rs.

@kornelski
Copy link
Contributor

kornelski commented Nov 8, 2017

In case of web oriented build scripts there was already a battle between declarative configuration-driven Grunt and program-driven Gulp, and Gulp has won. Declarative approach worked nicely only in insufficiently simple cases and wasn't powerful enough to seamlessly combine multiple tools together (e.g. you end up configuring two or more tools individually, each using their own configuration DSL, just so they happen to hit the same set of temporary files to meet in the middle, which is non obvious and fragile)

OTOH Gulp's approach is very similar to build.rs using external crates, and has proven useful and manageable even in large and complex projects.

@raphaelcohn
Copy link

As the maintainer of Libertine Linux, a fully reproducible, build-from-source distro, and the author of a comprehensive multi-package tool swaddle, I'm looking for an easier way to cross-compile Rust code that uses third party dependencies (ie compiled C libraries) than the current morass of various build.rs scripts. This is particularly hard when build.rs authors wrap dependencies that themselves use CMake (uggh), autotools (double-uggh, bonus uggh for libtool) or pkg-config (triple-uggh), all of which naively assume the current system being built on is the deployment system; not their fault, it's just that the C world has a wide variety of variously broken build systems. I've written several -sys packages, too, and I've never found that there is one particular approach that works well. All are problematic to a certain degree. Toss in naive libc usage, yacc, bison, perl, and related garbage and problems become horrendous.

In essence, simply the ability to specify a list of dependencies and locations is what I want. If that means I can have a way to replace a crate's build.rs with my own, that'd be sweet. It'd also be nice if eventually crates split Rust codegen (eg via bindgen) from libc compilation.

@lukesutton You might like my upcoming static webserver cordial.

@joshtriplett
Copy link
Member Author

Following up on this, what's the next step here? In one of the @rust-lang/cargo meetings, it seemed like there was general approval of the concept here. What would it take to move this forward?

@nrc
Copy link
Member

nrc commented Mar 27, 2018

The mechanism proposed here seems vaguely related to the custom test frameworks proposal - in the way that we want to execute code from a dependency in a compile-time context. I'm not sure whether that has any significance - I can't imagine some unified mechanism.

@joshtriplett
Copy link
Member Author

@nrc I can't imagine a unified mechanism either. And if there is, I think we could treat that as part of the implementation, not the RFC.

@joshtriplett
Copy link
Member Author

We discussed this today in the Cargo meeting at the Rust All-Hands. Part of the conclusion was that it would make sense to go forward with a minimal experimental implementation of metabuild, enough to allow the ecosystem to experiment with metabuild crates. We can iterate later on both the interface between cargo and metabuild crates, and on the implementations of common functionality needed in metabuild crates (e.g. Cargo.toml parsing or dependency finding), as part of our experimentation.

@aturon
Copy link
Member

aturon commented Mar 29, 2018

Thanks @joshtriplett for the summary! We're excited to start experimenting in this area, and it's important to allow that experimentation to happen early in the year, with hopes that at least some parts of the ecosystem impact will be known prior to the 2018 edition. As such:

@rfcbot fcp merge

@rfcbot
Copy link
Collaborator

rfcbot commented Mar 29, 2018

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Mar 29, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 29, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Mar 29, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 8, 2018

The final comment period is now complete.

@Centril Centril merged commit c09e805 into rust-lang:master Apr 9, 2018
@Centril
Copy link
Contributor

Centril commented Apr 9, 2018

Huzzah! 🎉 This RFC has been merged!

Tracking issue: rust-lang/cargo#14903

@Centril Centril added the A-build-scripts Proposals relating to build.rs scripts. label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-scripts Proposals relating to build.rs scripts. final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-cargo Relevant to the Cargo team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.