How to use Fromager to solve the bootstrapping problem? #199
Comments
I find it hard to imagine any system integrator accepting that constraint as part of building a modern secure software delivery pipeline.
Yes, that's correct. PEP 517 (I think? maybe another standard) implied that build backends were expected to avoid introducing cyclic dependencies. If that's not actually the case, I'd have to see the actual dependency cycle to know how I would address it.
Pre-seeding the wheel server is one option we've considered, but we haven't needed to do that, so far.
This isn't really a problem we've been trying to solve directly. That said, we've hit on a few techniques.
The flit_core test is one example of an approach to solving that problem: https://github.com/python-wheel-build/fromager/blob/main/e2e/flit_core_override/build/lib/package_plugins/flit_core.py There we invoke the build using the manual instructions of flit_core, even though I think flit_core does work with PEP 517, so the override wouldn't strictly be needed. It's an example, though, of how the person using fromager can provide a plugin to do whatever is needed at most points in the process.
In a couple of cases we add pyproject.toml files to source directories to make the PEP 517 approach work, because the project's setup.py imports something that is expected to be manually installed before installing the package in question. Dao-AILab/flash-attention#958 is one example of that. We're also building several things that rely on cmake, and for those we remove the dependency and rely on the version of cmake provided with the OS.
In each case it requires some work to set up the plugin or patch or whatever. But we're currently building several hundred packages, and only on the order of 10 have these customizations. Most of those are things for which there are no sdists on pypi.org at all. |
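To make the cmake case above concrete, here is a minimal sketch of dropping a build requirement that the OS already provides. It is not fromager's API: the function names and the example requirement strings are hypothetical, and a real plugin or patch would hook in wherever the tool allows.

```python
import re


def canonical_name(requirement):
    """Extract the project name from a requirement string and normalize it.

    Normalization follows the usual convention of lowercasing and collapsing
    runs of '-', '_' and '.' into a single '-'.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def drop_os_provided(requires, provided_by_os):
    """Return the build requirements with OS-provided tools removed.

    requires:        list of requirement strings, as in [build-system] requires
    provided_by_os:  set of normalized project names the OS already supplies
                     (an assumption made by whoever runs the build)
    """
    return [req for req in requires if canonical_name(req) not in provided_by_os]


# Hypothetical build requirements for a package that drives cmake.
requires = ["setuptools>=61", "cmake>=3.18", "ninja"]
remaining = drop_os_provided(requires, provided_by_os={"cmake"})
print(remaining)
```

The filtering is by project name only, so version specifiers on the dropped requirement are ignored; whether the OS version actually satisfies them would still need to be checked by hand.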
I tried using fromager to build
I'm pretty sure the crux of the problem is illustrated in:
As no
I believe
If you can find this reference, it would be crucial to the discussion. Regardless, it's this assumption that we need to break. It essentially leads to "build backends can't have dependencies."
The only reason flit_core can do that is because it has no dependencies (build time or run time).
I really appreciate the engagement here. I haven't gotten much from other system integrators. Would you be interested in having a high bandwidth conversation about the issues involved? |
You're right. It's in PEP 517 here.
|
This is an important point, but let me dig a little deeper. Is it more secure for the system integrator to bundle the artifacts for bootstrapping or for each build backend to bundle their own artifacts for bootstrapping (as "sources")? |
I think here you want to set the
From PEP 517:
Based on the 200-300 packages I've been working to build, plenty of other build-time requirements end up depending on flit-core, though. They just have to be careful to adopt things that don't introduce a cycle.
Sure. The timing isn't great this week, because of work project deadlines, but maybe in a few weeks? |
I can read source code to tell if it has a backdoor. I can't read pre-compiled binaries. |
Sounds great. I sent an invite. We can coordinate over email. Looking forward to it. |
I get that, but what if the pre-compiled binaries are just pure-Python wheels? Ultimately, I'd like for build backends to be able to depend on any Python library, even one that might have compiled extensions, but in the current configuration, that's not needed and I don't want that to be a blocker preventing backends from having any dependencies.
The pre-built artifacts I'm speaking of are essentially the wheels that have been unpacked into the source code. If instead the system integrator were to keep a (small) set of trusted artifacts available for bootstrapping, it could circumvent the cycles. Those trusted artifacts themselves would have been built from source on a previous iteration of the system, so everything ultimately is derived from source; it's just not all built from source from scratch. These artifacts could potentially include compiled binaries as well, though that complicates matters (requires environment-specific variants, reduces inspectability).
I see this process as akin to how GCC is bootstrapped. Since GCC requires a C compiler, it can't be built purely from source, and requires some binary artifact to break the cycle.
I'm guessing most other ecosystems don't run into the challenges that the Python ecosystem does because those ecosystems have standard packaging tooling that's distributed with the language. Are there examples of other packages that suffer the bootstrapping problem (due to interdependencies of build tools) that may have solved it somehow? |
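The idea of a small trusted seed breaking a build-time cycle can be sketched with a topological sort. This is only an illustration under stated assumptions: the package names ("backend", "helper", "app") are hypothetical, the graph is hand-written, and nothing here reflects how Fromager actually resolves or orders builds.

```python
from graphlib import CycleError, TopologicalSorter


def find_build_order(build_requires, preseeded=frozenset()):
    """Return an order for building every package from source, or None.

    build_requires: dict mapping a package to the set of packages it needs
                    at build time.
    preseeded:      packages already available as trusted wheels. Edges that
                    point at them impose no ordering constraint, because the
                    seed wheel can satisfy the requirement.

    None means a cycle remains, so no from-source order exists.
    """
    graph = {
        pkg: {dep for dep in deps if dep not in preseeded}
        for pkg, deps in build_requires.items()
    }
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


# Hypothetical cycle: the backend needs a helper library to build, and the
# helper library is itself built with that backend.
build_requires = {
    "backend": {"helper"},
    "helper": {"backend"},
    "app": {"backend"},
}

# With nothing pre-seeded there is no valid order.
print(find_build_order(build_requires))

# Seeding a trusted wheel of the backend breaks the cycle; the backend is
# still rebuilt from source, after the helper it depends on.
print(find_build_order(build_requires, preseeded={"backend"}))
```

Note that the seeded package is still part of the returned order, which matches the point above that seed artifacts are only a starting point and everything is ultimately rebuilt from source.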
If they're pure python, why not install them from source?
The problem is coming up with a way to trust those artifacts in the first place. I can't really read bytecode any better than compiled C. If I don't start with something I can read, then I can't be sure the packaging tool isn't injecting bad things into the packages it's building.
I would expect all packaging tools to have some aspect of this bootstrapping case, with the difference being that their dependencies might come in forms other than packages built by the tool itself. I don't know the history of rpm/yum/dnf or apt, but they're robust systems that must have dealt with this. I see 2 ways to let build tools have dependencies:
|
In pypa/packaging-problems#342 (comment), it's been suggested that Fromager can solve the bootstrapping problem because it allows for prepare_build to pull built wheel artifacts. It's unclear to me, however, how Fromager breaks the cyclic dependency problem. Stated simply, the bootstrapping problem is that if a build tool depends on itself or any package that requires that build tool to build, there's no way in general to build entirely from source. There's no order in which all packages can be built purely from source.
A few examples:
Currently, none of these scenarios break the build-from-source world because:
These assumptions are untenable because they impose undue constraints on build tools and their dependencies.
In https://hackmd.io/@jaraco/SJSQ40tv0, I'm proposing a methodology that system integrators would need to adopt to break the cycle and allow build tools to declare dependencies. The tl;dr is that the integrator needs to provide pre-built artifacts for all supported backends (including their dependencies) and it can't expect to build those from source.
Reading the bootstrapping mode, it seems that fromager only has support for building when there are no cyclic dependencies.
What does Fromager do when a package depends on itself, directly or recursively, at build time? Does Fromager allow the private wheel index to be pre-seeded with "trusted" pre-built artifacts that can break the cycle?
I'm going to try some experiments to verify, but I'm interested in this project's maintainers' insights.