RFC: Replacing the Cabal Custom build-type #60
I think the overall goals and context are presented with much more clarity than before, and the overall architecture is better motivated, which I appreciate. That said, while this proposal is well thought through, it seems a regression, not an advance, on the draft in one key sense. It proposes many more hooks than before, with very little motivation for many of them. Further, it continues to propose hooks that have at most one or two users, without exploring whether those cases could be resolved in other ways.

The "minimal" vs "maximal" distinction presented here is about the inputs and outputs available to the hooks. This was not the core concern at all. Rather, the objection raised was to how many hooks to provide. The argumentation in this section is unclear because it in fact veers between making arguments about both. If we take it to be about how many hooks to provide, it provides two very poor arguments. The first is that "maximal" means everything can be moved rapidly (we know it won't be, because that's not how development across many packages works). The second is that "once you have some hooks, you can just add more, it's fine, there's no cost!" This is not true at all. Every part of an API surface locks the provider into supporting it for an indefinite and lengthy period of time. The cost is not of adding more hooks; the cost is of maintaining them, which will be borne not by the proposers but by the Cabal developers, indefinitely.

We do not need test hooks, we do not need benchmark hooks, we do not need clean hooks, and there are also no use cases provided for post-conf hooks. I am also quite certain that, with a little discussion with the Agda and Darcs teams, we can eliminate the need for post-build or copy hooks. In particular, the Darcs copy hook exists simply to install a manpage. Surely there must be some way to install a manpage without needing a custom setup hook, and if not, we should provide one.
More generally, there had been a request that each hook (outside of preconf and prebuild, where the reasoning is evident) be given a justification, with an inventory of the current packages requiring it and for what purpose. This hasn't been done. There's a smattering of examples in some cases, but on the whole the argumentation is speculative, just about what may potentially be needed. This particular line of argumentation is to me the least convincing of all: "However, the maximal approach should reduce the likelihood of situations where existing custom Setup.hs files cannot be migrated to hooks because they require additional information. Given that our goal is to remove the need for Custom entirely (from the last few percent of packages), and those are inevitably edge cases where the use cases are not always easy to predict, it is important that we handle as many as possible (rather than covering common use cases only)." What this argues is that we should provide extremely powerful hooks for which we have no known use cases because, who knows, maybe someone does, and then they won't be able to migrate. So to cover edge cases that we don't know exist, and which we cannot necessarily even imagine, we need to provide an API that is maximally powerful and which we are obliged to support indefinitely. That philosophy of development cannot guide us in building our tools, or we will constantly end up in dead-end situations that are hard to extricate ourselves from. More than any specific objection, my concern is that the overall shape of this proposal is in accord with that philosophy of software design, and we cannot operate that way if we hope to have a long-run sustainable path for the development and maintenance of Cabal.
On versioning, I think the only reasonable thing to do is to generalize the existing mechanism for custom-setup-depends to cover setup-hooks-depends and version along with the
I think it is fine if we have fewer hooks and some packages cannot do things they did before. For example, to my limited knowledge, a lot of the packages doing weird things are executables. Such root packages, I think, should simply do their weird things out of band. Package formats like Cabal are primarily geared towards libraries, where "do it out of band" is not an option because those packages are dependencies. (An exception can be made for executables that are intended to be used with In general, we shouldn't slavishly support every existing sketchy thing. We should have interfaces that make sense, and if package authors cannot use them, that's on them. I've rarely seen a custom setup I thought was any good.
For example, "copy phase" is a non-concept. It doesn't mean anything; it is just a holdover from what we have today. It is a "non-denotational" (as Conal Elliott would put it) festering wound. We should be taking the opportunity of getting rid of custom setups to kill it with fire, not extend its life. I could maybe be sold on some of these hooks if they were immediately deprecated. @adamgundry talked about wanting to mass-convert existing packages in a very rote way. I guess I could buy that as a temporary measure. But if we go this route, we should make clear what we are doing, so no one is upset that it changed once just to change again. (I would want to eventually drop support for the early too-many-hooks design, just as we drop support for custom setups.) The alternatives discussion covers the previous conversation on fine-grained hooks well enough. I will just add
Right, so this gets back to long-term goals. I think of packaging as a very "schoolmarm" thing: people are lazy and do the first thing that works, and you have to prevent them from doing bad things, because other people rely on their code. I absolutely want to see today's bad imperative code rewritten, and I want Cabal to force it to happen. Again, I can be convinced that this is better as a follow-up project to a "let's get off Setup.hs as quickly/simply as possible" project. But it would be a real bummer if the momentum petered out after the "quick/simply as possible" first step.
@gbaz thanks for your feedback.
I can try to clarify the text here regarding the arguments for how many hooks to provide. However, given that all these hooks already exist in We can discuss with the Agda or Darcs teams how to move away from their uses of specific hooks. But what about other packages on Hackage, or indeed not published at all? I agree that if there is a compelling reason to cease supporting some use cases, we can do so. But I don't currently see what that reason is.
We're working on this, and can try to back up the argument that these features are needed.
This looks broadly good to me. I agree with @gbaz that fewer hooks might be better. It seems to me that we could potentially get rid of clean, test/benchmark, and pre-processors.
packages.

By way of example, consider an IDE backend like HLS. An IDE wants to *be* the build system (and compiler) itself, not for any final artefacts but for the
I wonder if a helpful way to think about this is as follows. We have at least three "build systems" that we want to support:
- `cabal`'s `Simple` build system, as run by `cabal-install` (perhaps with clever parallelism or what not)
- `cabal`'s `Simple` build system, as run by tools other than `cabal-install` using the `Setup.hs` interface (e.g. nixpkgs)
- HLS's Shake-like build system
We thus want to provide a means of user customisation that supports all of these build systems. (You could imagine others, pretty much anything that fits into the phase-based design would work here.)
So I agree with the above comment that we want to provide the build system (rather than the user providing it), but I do think that part of the reason for that is precisely so different tools can provide different build systems!
As I say below in https://github.com/haskellfoundation/tech-proposals/pull/60/files#r1404531123 I think the current design is terrible for HLS, which needs fine-grained hooks or bust. That's not necessarily a problem! But we should be clear what is actually good and what is a mere transitional step.
I wonder if a helpful way to think about this is as follows. We have at least three "build systems" that we want to support: [...]
Thanks, that's quite a clarifying perspective. I will be including this explanation in the updated proposal.
to the build system. This mechanism should be designed on the basis that the build tool, not each individual package, is in control of the build system. Moreover, the architecture needs to be flexible to accommodate future changes, as new build system requirements are discovered.
I think this point tends to support @gbaz's argument that fewer hooks are better. The more hooks we have, the more constrained we are to match the current phase structure, for example.
* It should provide an alternative to the `Custom` build-type for packages that need to augment the Cabal build process, based on the principle that the build system rather than each package is in overall control of the build, so it is not possible to entirely override any of the main build phases.
Keeping the structure of the phases seems important, but I'm unsure if it's strictly necessary to say you can't override an entire phase. HLS wouldn't be able to work with a tool that overrode the build phase, for example, but it probably could work with a tool that completely overrode the configure phase.
and target. This means the `Custom` build-type currently leads to issues with cross-compilation, and in the first instance, the new design may inherit the same limitations. This is a bigger cross-cutting issue that needs its own analysis and design.
This design doesn't seem like it makes much difference to cross-compilation. In both cases the problem is that you want to build and run custom Haskell code with user-specified dependencies on the build architecture.
Yeah, what is bad for cross-compilation is post hooks where people try to run newly built executables. That should not be a supported use case, except for tests. And even with tests, it is important that we can cross-compile the tests on one machine and then finish the build by actually running the tests on another machine.
cross-compilation, because it does not make a clear distinction between the host and target. This means the `Custom` build-type currently leads to issues with
cross-compilation, because it does not make a clear distinction between the host
and target. This means the `Custom` build-type currently leads to issues with
cross-compilation, because it does not make a clear distinction between the build
and host. This means the `Custom` build-type currently leads to issues with
👍
It is important that these copy hooks are also run when installing, as this fixes the inconsistency noted in [Cabal issue #709](https://github.com/haskell/cabal/issues/709). There is no separate notion of an "install hook", because "copy" and "install" are not distinct build phases.
Perhaps we should bite the bullet and rename "copy" to "install" while we're here (this is what nixpkgs does, for example).
👍
are compiled and run. The main use cases for pre-test or pre-benchmark hooks are generating modules for use in tests or benchmarks; while technically this could be done in an earlier phase, if the generation step is expensive it makes sense to defer running it until the user actually requests tests or benchmarks.
Isn't a per-component pre-build hook good enough for this?
Yes, we should always consider the difference between building tests/benchmarks and running them. As far as building is concerned, they are just completely normal executables, and our custom build system language should treat them as such.
👍
are not distinct build phases.

### Test and benchmark hooks
No use cases for benchmark hooks?
Indeed, pre-build hooks for benchmark components is sufficient. The rest can be done in the benchmarking code itself, I would think.
* the package author wants `cabal test` to run doctests (so an external `cabal doctest` command is not enough), e.g. because they use a different build tool and need `./Setup test` to include doctests; and
Why can't this be done with a build hook?
type in the hooks API, we may want to reconsider this.

### Build hooks
Build hooks seem problematic for another reason: what is the "build phase" in HLS? Building is continuous, and is re-done frequently but incrementally.
- When do we run the pre-build hook? Every time we want to incrementally re-build anything? (this would be significantly improved if we followed my suggestion about making hooks more shake-like, then we could actually re-run them only if needed by the thing we're currently building)
- When do we run the post-build hook? Arguably the build phase never "finishes" (or we might stop building, but without having built all the modules). So maybe we don't run it at all? Does that make sense?
Perhaps my objection here is broader. If we think about more general cases where our build tool just has a big graph of build rules:
- Pre-phase hooks are kind of okay so long as they have good dependency information. They are rules and we just have to make all the "Phase X" rules depend on their output.
- Post-phase hooks are weird. Normally we work on a demand-driven basis, so when are these demanded? And what do they depend on?
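Here is a minimal Haskell sketch of the demand-driven point. It uses a toy rule type invented for illustration; `Rule`, `demand`, and the target names are hypothetical and are neither Cabal's API nor the proposal's.

- A pre-build hook is just a rule whose output ordinary build rules depend on, so it runs whenever something needs it.
- A post-build hook can depend on the build. However, nothing depends on it, so a demand-driven tool never has a reason to run it.

```haskell
import Data.List (nub)

-- Hypothetical Shake-style rule: a target and the targets/files it needs.
data Rule = Rule
  { ruleTarget :: String
  , ruleDeps   :: [String]
  }

-- Demand-driven execution: the rules that run (dependencies first, each at
-- most once) when 'goal' is requested. Names with no rule are source files
-- and run nothing. Assumes the rule graph is acyclic.
demand :: [Rule] -> String -> [String]
demand rules goal = nub (go goal)
  where
    go t = case [r | r <- rules, ruleTarget r == t] of
      (r:_) -> concatMap go (ruleDeps r) ++ [t]
      []    -> []

exampleRules :: [Rule]
exampleRules =
  [ Rule "Gen.hs"     ["gen/spec.txt"]       -- pre-build hook: generates a module
  , Rule "Gen.o"      ["Gen.hs"]             -- ordinary build rules depend on it
  , Rule "Main.o"     ["Main.hs", "Gen.o"]
  , Rule "post-build" ["Main.o"]             -- post-build hook: depends on the build,
  ]                                          -- but no rule depends on it
```

In GHCi, `demand exampleRules "Main.o"` runs the generator, then `Gen.o` and `Main.o`, and never touches `post-build`. Only an explicit request for `"post-build"` itself would run it.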
This is a very good point regarding post-phase hooks not having clear demand. Perhaps it would be better to say that all hooks should be pre-something.
- A design that is good for HLS / the fine-grained world can easily be retrofitted onto the coarse-grained world by simply running all the fine-grained hooks once as part of some larger step.
- A design that is good for current Cabal / the coarse-grained world will be horrible for the fine-grained world.

If we are trying to make something that is nice for HLS from the get-go, we should be much more aggressive. If we are trying to make something that makes it easy to mindlessly batch-convert existing Setup.hs files, we should be clear that it is wholly inadequate for the brave new HLS world and just an intermediary stop-gap.
Coming back to this comment, I think I was too hard on post-build hooks. If we think of things from the rule perspective, then the build phase runs many build actions, one for each module. It then has a "phony" rule for the "build phase" that depends on all of these.
HLS very much does have the per-module build rules, but it doesn't have a "complete the phase" rule. It would be sensical for HLS to have post-build hooks that run after building a particular module, it just doesn't make so much sense to have hooks that run after "completing the phase".
An alternative approach would be to regard as illegitimate any use cases which treat `Cabal` as a packaging and distribution mechanism for executables, and on that basis, cease to provide copy hooks. We do not follow this approach because it would significantly inconvenience maintainers of packages that rely on this behaviour (e.g. Agda and Darcs), for a relatively small reduction in complexity in `Cabal`.
I'm not pleased that this is dismissed so quickly.

- I do not buy that the inconvenience is significant without actually talking to them. Changing build instructions from `cabal install agda` to `cabal install agda; ./build-agda-stdlib` is not a big deal at all. Because no one has `Agda` as a `build-depends`, there is no good reason for the cabal build to be self-contained.
- The point isn't "making Cabal simpler", but opening the door to architectures. What exactly are the semantics of "copy"? Do you need to "copy" the upstream library before the downstream project is built? What should HLS, which doesn't want to install random things during development / incremental builds, do for these?

The entire copy phase is a blatant post-hook to building; the same objections @michaelpj raised to post hooks apply to it. It has no semantics. It cannot be thought of in a demand-driven way. There is no spec that other build systems, like a new HLS one, could implement.
This proposal is exceptionally well presented. This is a great introduction to the subject, for someone (like me) who hasn't thought about these things. As others have pointed out, I was also surprised that the
Like others here, I urge restraint in trying to cater for a long tail of uses. I would guess that the marginal cost of supporting 99% of previous usage of the Concretely, I am interested in seeing an expanded 'Prior art and related efforts' section which describes how other communities have addressed this problem. I recall that the Python community is moving away from arbitrary code at build time (the
In general, the space of possible effects needed to augment the build system is unbounded (e.g. imagine a package that needs to generate code by parsing some binary file format, with a parser implemented using the FFI).
This is why I think the "imagine all user code is loaded in one big process" thought experiment leads us astray.
Fundamentally, build systems need no effects at all. The whole thing is semantically a pure function, and individual steps are pure functions.
The sandboxing / process-isolation mindset is "you, the guest rule, can think you are doing whatever crazy effects you want, but I, the host, do not care. I will fake all your effects; you are just a pure function to me with delusions."
In this case, sure, a module needs to be generated by some code that needs to use a C library. But in practice it may well be better to force that to happen in a separate process, just for that one "generate module" task. The host build system should not be "sullied" by this FFI stuff. If we imagine it all in one process, we should be clear this is sort of an unsafe optimization.
What I worry about with having the hooks file be Haskell is that we are putting the "unsafe optimization" cart before the "get the conceptual model right" horse.
It might be annoying for users to write separate exes for everything, but it makes it very clear what the host build system is supposed to do / be. The DSL is thus very simple: just a way to write a list of arguments, refer to a `build-tool-depends`, and some `$<` / `$@` variables to refer to inputs/outputs.
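Such a DSL could be quite small. Here is a rough sketch built around a hypothetical `CmdRule` record; nothing here is an existing Cabal type.

- A rule names a tool, an argument template, and its declared inputs and outputs.
- `$<` expands to the first input and `$@` to the first output, roughly as in Make.
- Only whole-argument placeholders are handled, to keep the sketch minimal.

```haskell
-- Hypothetical rule: an external command plus declared inputs and outputs.
data CmdRule = CmdRule
  { crTool    :: String      -- executable, e.g. from build-tool-depends
  , crArgs    :: [String]    -- argument template; "$<" and "$@" are placeholders
  , crInputs  :: [FilePath]
  , crOutputs :: [FilePath]
  }

-- Expand the placeholders into a concrete command line, failing if a rule
-- uses a placeholder without declaring the corresponding input/output.
expand :: CmdRule -> Either String [String]
expand r = (crTool r :) <$> traverse subst (crArgs r)
  where
    subst "$<" = firstOf "input" (crInputs r)
    subst "$@" = firstOf "output" (crOutputs r)
    subst arg  = Right arg
    firstOf what []    = Left ("rule declares no " ++ what)
    firstOf _    (x:_) = Right x
```

Consider the rule `CmdRule "alex" ["$<", "-o", "$@"] ["src/Lexer.x"] ["Lexer.hs"]`, where `alex` is only an illustrative tool name. It expands to `Right ["alex", "src/Lexer.x", "-o", "Lexer.hs"]`. That list could then be handed to something like `callProcess` from the `process` package. The host build system needs nothing but the declared inputs and outputs to decide when to rerun the rule.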
I think Michael's comments from the HLS side have been helpful to flesh this out more. The existence of hooks means we have certain phases that are run in certain ways in our build processes. So to the extent we want to change the phases or ways, we run the risk of changing the behavior of the hooks, or disrupting their APIs. And some ways may even invalidate the meaning of some of those hooks. The more API commitments we make, the harder it is to change anything -- and often we make commitments we don't even realize. While not directly pertinent, I'll note that many seemingly totally innocuous improvements to As another example -- the more we let packages do unrestricted IO on files, the more assumptions they may (wrongly) make about the directory structure -- which is something we're definitely open to refactoring. Finally, let me note a bit more motivation here -- I think the way Darcs installs manpages is pretty clever. However, most people aren't going to do that, not least because it involves a custom setup script. If we had better, simpler, more declarative support for manpage installation, then we could have more users, more safely doing so -- which would be a win all around. Figuring out the right way to do this helps everyone. Just keeping the existing copy-hooks mechanism essentially as is does not.
I would like to thank @gbaz @michaelpj and @Ericson2314 for their fantastically clear and helpful feedback on the current iteration of the proposal. The lack of a demand-structure for post-hooks is a convincing argument that they shouldn't be part of the design. Together with the fact that we can conceive of test and benchmark hooks as pre-build hooks for the relevant components, this would allow us to narrow down the design to only pre-configure and pre-build hooks. I see then that cutting the hooks down like this affords us further flexibility (as Gershom has been saying since the start), and in particular suggests that we should have a design that implements fine-grained pre-build hooks from the get-go (as John has been advocating). This would allow the pre-build hooks to be run on-demand by HLS when relevant dependencies change, as opposed to being monolithic. This should also subsume hooked pre-processors (as Michael points out). I am still trying to make sense of exactly what these fine-grained build hooks would look like. In particular, it would be great if these covered all pre-build use cases, so that we don't have to shackle ourselves by also providing old-style pre-build hooks just to cater for a few packages. I have two use cases in mind which I think of as constraining the design:
I understand the desire to not cater to every sketchy thing done by some Custom setup script somewhere, but to me both of those use-cases are clearly motivated and should be accommodated by the new design. If a strong argument is made that they should not be accommodated by the design, we would need to have a plausible migration path. I will keep thinking about what the specification for fine-grained pre-build hooks would look like given these requirements. I would appreciate any further thoughts about the design (whose starting point I take to be the "fine-grained build rules" suggested by John, as described as a possible alternative in the current proposal). I intend to update the current proposal with the modifications described above (only retaining pre-conf and pre-build hooks and making the pre-build hooks finer grained). I also hope to nail down a tighter specification for what build hooks are allowed to do, so that we know when to re-run them and so that we can know how to clean up after them.
I'm not 100% sure I agree with John here. I think that as long as the hooks provide clear dependency information, we would still get most of the benefit. Lots of things that run in hooks are acceptably fast. It would be nice to only rerun the preprocessor for a single file when the input file changes, but it might not be the end of the world to run it on all the files. And many module-generation processes are monolithic in any case (e.g. generating them from Agda).
I think we can probably limit the fine-grainedness to be either global or on a per-file basis. So a fine-grained hook might just take the current module as an additional argument, which would allow it to, e.g., generate just that particular module.
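Here is a minimal sketch of the two granularities. The names (`GenHook`, `runAll`, `runOne`, `versionHook`) are hypothetical and are not part of the proposal.

- A global hook generates everything at once.
- A per-module hook declares its modules and can be asked for a single one.

There are two drivers:

- A coarse-grained driver simply runs everything once. This is how a fine-grained design can be retrofitted onto today's phase-based build.
- An on-demand driver, such as an IDE, asks for just the module it is rebuilding.

```haskell
type ModuleName = String

-- Hypothetical hook shapes: generated modules are (name, source text) pairs.
data GenHook
  = Global (IO [(ModuleName, String)])               -- all modules in one go
  | PerModule [ModuleName] (ModuleName -> IO String) -- declared modules, one at a time

-- Coarse-grained driver (today's phase-based build): run the hook once,
-- producing every module.
runAll :: GenHook -> IO [(ModuleName, String)]
runAll (Global gen)         = gen
runAll (PerModule mods gen) = mapM (\m -> (,) m <$> gen m) mods

-- On-demand driver (e.g. an IDE): produce only the module being rebuilt.
runOne :: GenHook -> ModuleName -> IO (Maybe String)
runOne (Global gen) m = lookup m <$> gen  -- still regenerates everything
runOne (PerModule mods gen) m
  | m `elem` mods = Just <$> gen m
  | otherwise     = pure Nothing

-- A toy per-module generator.
versionHook :: GenHook
versionHook = PerModule ["Gen.A", "Gen.B"] $ \m ->
  pure ("module " ++ m ++ " where\nname = " ++ show m ++ "\n")
```

Note that `runOne` on a `Global` hook still has to regenerate every module. That is acceptable for monolithic generators, such as modules generated from Agda.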
Surely the proposal as it stands has this problem: HLS can't run the
I agree fully. It's unfortunate to have to determine the exposed modules dynamically at configure or build time, but it's definitely a key use case for custom setups, and one that deserves good support. Similarly, determining contents dynamically from the surrounding context is also a core part of the functionality. If we didn't need to do these sorts of things, arguably, we would barely need custom setups or hooks at all, just extensible preprocessors.
Yes, dynamic exposed modules sound fine to me. And if it's easier to generate all those modules in one go too, that's also fine. (The rules to create modules can just close over data produced in the decide-which-modules step.) The general, fully fine-grained dynamic-dependencies way to do things would be for the other / exposed modules and the rules for creating those modules (but not the modules themselves!) to be generated at once. That solves the "define exactly once" problem, where (a) no rule can create a non-declared module and (b) every module is mapped to exactly one rule. I think it's probably OK to deviate from the model and its correct-by-construction nature in the first version for whatever practical reason (e.g. manually checking that module declarations and definitions align), but I bring it up in the hope that it's still a useful mental model.
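The manual check mentioned above could look roughly like this. It is a minimal sketch built around a hypothetical `Plan` type, which pairs the dynamically computed module list with the rules (by name) that write each module. It enforces both conditions:

- (a) no rule creates an undeclared module;
- (b) every declared module is created by exactly one rule.

A single rule that writes many modules, as a monolithic code generator would, still passes.

```haskell
import Data.List (group, sort)

type ModuleName = String

-- Hypothetical result of a "decide which modules" step: the declared
-- modules, plus each rule's name and the modules that rule writes.
data Plan = Plan
  { planModules :: [ModuleName]
  , planRules   :: [(String, [ModuleName])]
  }

-- Check the "define exactly once" conditions: (a) every produced module is
-- declared, and (b) every declared module is produced by exactly one rule.
checkPlan :: Plan -> Either String ()
checkPlan (Plan mods rules)
  | not (null undeclared) = Left ("undeclared modules: " ++ show undeclared)
  | not (null missing)    = Left ("no rule for: " ++ show missing)
  | not (null dupes)      = Left ("created by several rules: " ++ show dupes)
  | otherwise             = Right ()
  where
    produced   = concatMap snd rules
    undeclared = [m | m <- produced, m `notElem` mods]
    missing    = [m | m <- mods, m `notElem` produced]
    dupes      = [m | (m:_:_) <- group (sort produced)]
```

For example, `checkPlan (Plan ["Gtk.A", "Gtk.B"] [("gtk-codegen", ["Gtk.A", "Gtk.B"])])` succeeds. Two rules that both write the same module are rejected.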
@Ericson2314, I would like to question the following claim:
I initially found this argument to be quite compelling, but now I'm thinking that one might well want to have
"now I'm thinking that one might well want to have build-tool-depends: agda" -- I agree here. However, if you're depending on agda as a build tool, you certainly could run a
@sheaf Also, I think the Agda status quo doesn't really work in general anyway. Say you want to depend on an Agda library besides the standard library: you'd be stuck just building that other library on the fly, just like @gbaz's calling (Also, I am generally skeptical of things that privilege the standard library over other libraries, because I think the lack of uniformity causes more problems than it solves.) Ultimately, I think the right thing to do in this situation is this: imagine there was a language-specific package manager for Agda. Then the question is asked: which build system should be in the driver's seat? I think neither should be, per se. If we have a good planning vs execution split in both of them, ideally they can both spit out build graphs which we compose together before executing. This has the best execution-time properties. Suppose we have multiple Haskell packages generated from Agda, each depending on multiple Agda libraries. The combined build graph should ensure that the Agda library builds are reused for both Haskell packages, without relying on hacky things like Cabal invoking another tool that mutates some shared state (another build system with a cache). Figuring out the details of that is, I think, outside the scope of this proposal, but I wanted to at least sketch it out to indicate that retreating from the thing we have today with
I do agree with you that the cross-language situation is sticky. I think for Haskell packages it makes sense to focus on what we can distribute via Hackage. And there we really want to be able to build the package with plain cabal and no funny business. You can make that work with The other context we might want to consider is where a different tool is driving cabal. For example, we have another ugly case where we have a Haskell package extracted from Coq code. This requires setting up a quite specific Coq environment in order to work. There is no hope of ever publishing such a package on Hackage... but with a custom setup we can make it work in our Nix environment, which is handy for us. So the fact that Cabal can run some arbitrary external tools makes it possible to supply it with "just the right thing" and have it do what you want.
Right, but that case will still be covered even in a cut-down version of this proposal, because for now we all agree that the pre-configure and pre-build phases should allow the use of arbitrary applications, exactly as you desire! (And honestly, I think in that situation a Haskell package extracted from Coq code would be reasonable to publish on Hackage; we just wouldn't expect the Hackage builder to run on it.)
It sounds like we agree we shouldn't get too hung up on polyglot code generation. Good! :)
I'll be updating the tech proposal in the coming few days, with a proposed design for fine-grained hooks that suits the needs of HLS. I'll be marking some threads as resolved in the meantime.
In this iteration of the proposal, we cut down the number of different hooks included in the API, and include a design of fine-grained build rules that plays well with the design of HLS.
@gbaz @Ericson2314 @michaelpj I have updated the proposal, cutting down on the number of different hooks provided by the API and including a design of fine-grained rules for pre-build hooks. I would greatly appreciate it if you could take the time to review these changes and give your thoughts on the design.
Thanks! This seems very promising. Here's the diff, for those (like me) who it might help to see the differences highlighted: adamgundry@2e6698e On first glance, a few thoughts.
I will need to think through the fine-grained preBuild rules more carefully but they seem very promising. I believe they can be used in a funny way to arrive back at the less-fine-grained rules necessary to just call out to do distinct code-gen for e.g. the gtk bindings? That should probably be noted explicitly -- we don't lose generality here, just gain expressiveness. I also do think that it is fine to not have rule patterns for the first pass, but if fine-grained rules become popular, I imagine there will be desire for them. Finally, there is a suggestive sketch of how build-tools might create their own "hooks executables" linked against SetupHooks.hs as possible future work, and the possibility of such motivates serialisation considerations for the fine-grained rules. But I confess I am confused as to why we worry about serialisation there, and also not for other hooks which also are functions. The motivation for why we worry about serialisation in some places and not others, and what benefits we hope to gain seems unclear to me. |
(by the way we discussed this at the tech working group today -- unfortunately the revisions dropped too close to the meeting for us to have had a chance to review them, but we were all hopeful for what they might look like, and from my standpoint, now that i've looked through, things are moving very much in the direction we were looking forward to.) |
I like the direction of travel. I would really really like a comparison with Shake or with existing rule-based build systems in general. The system proposed here is different from any existing one I've seen and I would really like to know why we are deviating. Surely the desire to support the IPC-based workflow is part of it, but I can't see that it necessitates as much difference as there is...
build system (and compiler) itself, not for any final artefacts but for the
interactive analysis of all the source code involved. It wants to prepare (i.e.
configure) all of the packages in the project in advance of
building any of them, and wants to find all the source files and compiler
That's not necessarily true! We would be very happy to delay configuring a component until we want to build a file in it. We don't do that because it would be harder but it's conceptually sensible. I don't think there's an actual problem here, though, since the current design does have fine-grained configure hooks at the package/component level.
For each component, `preConfComponentHook` is run, returning a `ComponentDiff`.
This `ComponentDiff` is applied to its corresponding `Component`
by monoidally combining together the fields.
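To make "monoidally combining together the fields" concrete, here is a minimal sketch. The `Component` and `ComponentDiff` types below are hypothetical, heavily simplified stand-ins with two list-valued fields, not Cabal's real types; the point is only that diffs form a monoid, so contributions from several hooks can be merged before being appended to the component.

```haskell
-- Hypothetical, heavily simplified stand-ins for the proposal's types;
-- the field names are illustrative only.
data Component = Component
  { compName     :: String
  , otherModules :: [String]
  , ghcOptions   :: [String]
  } deriving (Eq, Show)

-- A diff only carries the fields a pre-configure hook may extend.
data ComponentDiff = ComponentDiff
  { diffOtherModules :: [String]
  , diffGhcOptions   :: [String]
  } deriving (Eq, Show)

-- Diffs combine field by field, so several hooks' contributions can be merged.
instance Semigroup ComponentDiff where
  ComponentDiff m1 o1 <> ComponentDiff m2 o2 =
    ComponentDiff (m1 <> m2) (o1 <> o2)

instance Monoid ComponentDiff where
  mempty = ComponentDiff [] []

-- Applying a diff appends each field to the component's existing value;
-- the name is untouched, and applying `mempty` leaves the component unchanged.
applyComponentDiff :: ComponentDiff -> Component -> Component
applyComponentDiff (ComponentDiff ms os) c =
  c { otherModules = otherModules c <> ms
    , ghcOptions   = ghcOptions c <> os
    }
```

Because `applyComponentDiff mempty` is the identity, a component whose hook has nothing to say is simply left alone.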
I'm still somewhat unsatisfied with the argument here. As a user of the API it would strike me as weird that these two conceptually very similar things were so different.
Separately from the pre-build rules, we also propose to introduce post-build
hooks. These cover a simple use case: namely to perform an IO action after an
executable has been built.
I think the observation that most post-build hooks are really for after an executable is built is enlightening.
There is a clear difference in the build process between building a library component and building an executable: they both build all the Haskell modules, but for an executable we additionally link them together into an executable.
So I wonder if we are missing a phase: the link phase. Most of the post-build hooks you describe are then actually post-link hooks.
type in the hooks API, we may want to reconsider this.

### Build hooks
Coming back to this comment, I think I was too hard on post-build hooks. If we think of things from the rule perspective, then the build phase runs many build actions, one for each module. It then has a "phony" rule for the "build phase" that depends on all of these.
HLS very much does have the per-module build rules, but it doesn't have a "complete the phase" rule. It would be sensical for HLS to have post-build hooks that run after building a particular module, it just doesn't make so much sense to have hooks that run after "completing the phase".
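A toy model of the structure described above may help: one rule per module plus a phony "build-phase" rule that depends on all of them. All names and the graph representation are hypothetical. A hook attached to the phony rule runs only once every module rule is done, whereas a tool like HLS that only demands individual module rules would never reach it.

```haskell
import qualified Data.Map.Strict as Map

type RuleName = String

-- Each rule maps to the rules it depends on. A "phony" rule has no action of
-- its own; it exists only so that something can depend on "the whole phase".
type RuleGraph = Map.Map RuleName [RuleName]

-- One rule per module, plus a phony "build-phase" rule depending on all of them.
componentGraph :: [(RuleName, [RuleName])] -> RuleGraph
componentGraph moduleRules =
  Map.insert "build-phase" (map fst moduleRules) (Map.fromList moduleRules)

-- Dependency-first order (DFS post-order). Assumes the graph is acyclic.
buildOrder :: RuleGraph -> RuleName -> [RuleName]
buildOrder g = reverse . go []
  where
    -- `visited` holds the rules emitted so far, most recent first.
    go visited r
      | r `elem` visited = visited
      | otherwise =
          let deps     = Map.findWithDefault [] r g
              visited' = foldl go visited deps
          in r : visited'
```

Demanding `"build-phase"` yields every module rule followed by the phony rule, while demanding a single module only yields that module and its dependencies.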
serialising and deserialising the `IO` actions that execute the rules, which in
practice would mean providing a DSL for `IO` actions that can be serialised,

2. it lacks information that would allow us to determine when the rules need
At this point I started to go "why are we not just copying Shake?". In particular, I think it's pretty well-established how to deal with this kind of thing (see e.g. "Build Systems a la carte").
I am sure the authors are aware of this, so it would be really helpful to have a comparison to the existing prior art in this area to make it a bit clearer where the design comes from. It's a little hard to tell in isolation!
I agree that a justification for why we are deviating from prior art is lacking from the current proposal; we will add that in the next iteration.
What precisely do you mean by "copying Shake" here? To me, the relevant question is rather "why are we not just copying Ninja?". I say that because pre-build rules should be structurally simple enough that they can be ingested by other build systems, just like Shake can ingest a Makefile or a Ninja file, constructing a build graph out of it. If we start having arbitrary monadicity inside the structure of the rules, then I don't know how another build tool such as HLS would be able to interface with the hooks; we need some way of exposing the dependency graph of rules externally.
Copying Ninja might also be good! It's just really hard to know how to assess this when it seems to be its own special snowflake at the moment. Or, say, adopting the terminology of "Build Systems a la Carte" might help.
I don't see why arbitrary monadicity is even necessarily a problem. Tools with dynamic dependencies like Shake do need to tell the build system which dependencies were "discovered" when running a rule, but I don't see why you can't do that gracefully over an IPC connection or whatever.
I think the problem
Perhaps the issue is more with the scheduler? You want to do an up-front schedule, which requires no dynamic dependencies. I do agree that Shake's "suspending" approach to making this work is going to be tricky over IPC.
But to go on about this: it would be very helpful to read something like "this system is basically Shake but with no dynamic dependencies, which we did for reason X".
We can then separately query the external hooks executable with this reference
in order to run the action.

We fix (2) by adding `monitoredValue` and `monitoredDirs` fields to `Rule`.
I'm unsure why we can't do what Shake does by pushing the responsibility for deciding whether to rerun into a rule, and then implementing early cutoff on top of that.
The logic for deciding whether to re-run cannot live inside the rules themselves, because rules are not persistent across invocations. This means that there is no way to keep track of whether e.g. a certain value has changed inside a rule. It's only the tool that queries for the set of all rules that can compare the result of multiple invocations and see whether things have changed or not.
Shake explicitly puts the persistence outside the rule. It's similar to your `monitoredValue`. You get a fingerprint value, which is stored for you by the build system, and passed to the rule when it is run. The rule computes the new value of the fingerprint, and can terminate early and say that nothing changed if that is the case.
https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake-Rule.html#g:1 explains
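The early-cutoff idea described above can be sketched in a few lines. This is a hypothetical model of the concept only, not Shake's actual API (which lives in `Development.Shake.Rule`): the build system owns the store of fingerprints, passes the previous fingerprint to the rule, and the rule reports whether anything changed.

```haskell
import qualified Data.Map.Strict as Map

-- A hypothetical model of the idea only; NOT Shake's actual API.
type Key = String
type Fingerprint = Int

-- What a rule reports after running.
data Outcome = Unchanged | Changed Fingerprint
  deriving (Eq, Show)

-- A rule is given its previously stored fingerprint (if any) plus its input,
-- and decides for itself whether anything changed.
type RuleFn = Maybe Fingerprint -> String -> Outcome

-- The persistent store lives in the build system, outside any rule.
type Store = Map.Map Key Fingerprint

-- Run one rule; the Bool says whether dependents must be rerun.
runWithCutoff :: Store -> Key -> RuleFn -> String -> (Bool, Store)
runWithCutoff store k rule input =
  case rule (Map.lookup k store) input of
    Unchanged  -> (False, store)                -- early cutoff
    Changed fp -> (True, Map.insert k fp store)

-- Example rule using a toy (non-cryptographic) hash of its input as fingerprint.
hashRule :: RuleFn
hashRule old input
  | old == Just fp = Unchanged
  | otherwise      = Changed fp
  where
    fp = foldl (\h c -> h * 31 + fromEnum c) 7 input
```

Because the store is threaded through by the caller, the rule itself keeps no state between invocations, which is the constraint raised earlier in this thread.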
`A.hs`. It is much more direct and robust for rule 2 to directly declare its
dependency on `A.y`, rather than on (the output of) rule 1, as the latter is
prone to breakage if one refactors the code and changes which rule generates
`A.y`.
I don't see how this can be true if one gets the location of `A.y` from one's dependency on the output of a rule. Then if the rule stops producing `A.y`, you won't get passed a location that contains `A.y` and so you find out.
Generally I think the reverse: the rule structure is fundamental, and files are just one kind of output from rules.
I think we agree here. What this paragraph was trying to say is that it is better for dependencies to be described by paths rather than directly referring to other rules. For example

```haskell
registerRule $ Rule { results = [ "A.y" ] }
registerRule $ Rule { deps = [ "A.y" ], results = [ "A.hs" ], action = happy }
```

is more robust than:

```haskell
rule1 <- registerRule $ Rule { results = [ "A.y" ] }
rule2 <- registerRule $ Rule { deps = [ rule1 ], results = [ "A.hs" ], action = happy }
```

because the latter will break if you change which rule generates `A.y`. The latter style requires the user to perform dependency analysis themselves, whereas in the first example it is the build system which figures out the dependency.
Perhaps you are saying that the low-level API could use this style (in which rules directly refer to other rules), and provide an interface on top that would work as the current proposal describes? I'm not really sure what that would buy us, especially given that the IPC requirement means that it would be difficult to allow users to declare their own datatypes for use as `RuleId`s.
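In the path-based style of the first example, the build system rather than the rule author works out which rule produces `A.y`. A minimal sketch of that resolution step follows, using a hypothetical `PathRule` type rather than the proposal's `Rule`. It treats inputs with no producer as ordinary source files and, as one possible policy, reports an error when several rules claim the same output.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical simplified rule: inputs and outputs are file paths.
data PathRule = PathRule
  { ruleLabel :: String
  , inputs    :: [FilePath]
  , outputs   :: [FilePath]
  } deriving Show

-- Index every output path to the label(s) of the rule(s) producing it.
producers :: [PathRule] -> Map.Map FilePath [String]
producers rs =
  Map.fromListWith (++) [ (o, [ruleLabel r]) | r <- rs, o <- outputs r ]

-- Resolve a rule's path dependencies to producer rules. Inputs with no
-- producer are source files; inputs with several producers are an error.
resolveDeps :: Map.Map FilePath [String] -> PathRule -> Either String [String]
resolveDeps idx r = concat <$> traverse lookupOne (inputs r)
  where
    lookupOne p = case Map.findWithDefault [] p idx of
      []  -> Right []
      [n] -> Right [n]
      ns  -> Left ("multiple rules produce " ++ p ++ ": " ++ show ns)
```

Here the global index plays the role of "the build system, which has a global view of all rules"; the rule author never names the producing rule.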
I just don't find this at all convincing. I've worked with systems that work both ways and I find the version that works with rules much clearer and easier to understand. With the other version it is IMO harder to maintain because there is magic in figuring out what rule produces `A.y` when you look at the second rule definition. You can't just see what rule it is, you have to go through all the rules and look to see which ones produce `A.y`.

> because the latter will break if you change which rule generates A.y

Can we have a realistic example? I just don't see how this is going to happen in a way that's bothersome. Yes, if you change a rule you may have to go and fix the things that depend on it... that doesn't seem at all surprising?

> The latter style requires the user to perform dependency analysis themselves, whereas in the first example it is the build system which figures out the dependency.

What? It requires the user to know which rule produces `A.y`, yes, but that's not doing dependency analysis, that's identifying the direct dependencies.
Perhaps a more convincing example builds on the compositionality of the
rules. For example, a given preprocessor (e.g. c2hs) could be implemented
as built-in rules in Cabal, or as a separate library. If I then want
additional rules on top, I should not need to know the internal
implementation details of the preprocessor rules. If I'm required to refer
to rules directly, I would have to inspect the monadic state of the rules
computation for the preprocessor to find which rule outputs the file I
might want to depend on, instead of directly declaring the dependency on
the file. It seems correct to me that the build system, which has a global
view of all rules, should resolve these dependencies, as opposed to asking
the rules author to do so; the latter feels inherently non-compositional to
me.
```haskell
newtype Rules env =
  Rules { runRules :: env -> ActionsM ( IO [Rule] ) }
```
So as I understand it, we need the IO here for two reasons:
- To handle dynamic dependencies, we have to do any dependency-determination work in this IO action and not in the rule
- To handle dynamic rule generation, e.g. to have happy rules for every `Foo.y` in the project.
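The second reason can be illustrated with a small sketch: the set of rules is only known after looking at the file system. `GenRule`, `happyRules` and `discoverHappyRules` are hypothetical names; only `listDirectory` (from the `directory` package) and the `System.FilePath` functions are real library calls. Keeping the discovery in `IO` and the rule construction pure means the resulting rules are plain data.

```haskell
import System.Directory (listDirectory)
import System.FilePath (replaceExtension, takeExtension, (</>))

-- Hypothetical rule shape, for illustration only.
data GenRule = GenRule
  { genInput  :: FilePath
  , genOutput :: FilePath
  , genTool   :: String
  } deriving (Eq, Show)

-- Pure part: one happy rule per .y file among the given paths.
happyRules :: [FilePath] -> [GenRule]
happyRules paths =
  [ GenRule p (replaceExtension p "hs") "happy"
  | p <- paths
  , takeExtension p == ".y"
  ]

-- IO part: which rules exist depends on what is on disk, which is why rule
-- *generation* needs IO even though the rules themselves are static data.
discoverHappyRules :: FilePath -> IO [GenRule]
discoverHappyRules dir = happyRules . map (dir </>) <$> listDirectory dir
```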
invocations, as is necessary to compute staleness of individual rules. We can't
simply use the index of the rule in the returned `[Rule]`, as this might vary
if new rules are added. Instead, we propose that a rule be uniquely identified
by the set of outputs that it produces. This gives the necessary way to match
What happens if I define two rules that produce the same thing?
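Under the earlier design quoted above, where a rule is identified by the set of outputs it produces, matching rules across two invocations and answering this question could look roughly like the sketch below. The types are hypothetical, and rejecting duplicate output sets is just one possible policy; the quoted text does not say what the proposal does.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical, simplified rule; its identity is the set of outputs it produces.
data SRule = SRule
  { srOutputs :: [FilePath]
  , srInputs  :: [FilePath]
  , srCommand :: String
  } deriving (Eq, Show)

type RuleKey = Set.Set FilePath

-- Key each rule by its output set. Rules sharing a key cannot be told apart
-- across invocations, so report those keys instead of an index.
indexRules :: [SRule] -> Either [RuleKey] (Map.Map RuleKey SRule)
indexRules rs
  | null dups = Right (Map.fromList keyed)
  | otherwise = Left dups
  where
    keyed  = [ (Set.fromList (srOutputs r), r) | r <- rs ]
    counts = Map.fromListWith (+) [ (k, 1 :: Int) | (k, _) <- keyed ]
    dups   = Map.keys (Map.filter (> 1) counts)

-- Keys whose rule is new, or whose definition differs from the previous run.
staleRules :: Map.Map RuleKey SRule -> Map.Map RuleKey SRule -> [RuleKey]
staleRules old new = Map.keys (Map.differenceWith keepIfChanged new old)
  where
    keepIfChanged n o = if n == o then Nothing else Just n
```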
(even though, in common cases, one expects that adding a new source file
would correspond to a new module declared in the `.cabal` file).

### Identifiers
It's a shame we're not Nix and we can't just hash everything...
All hooks need to take into account the IPC workflow. For most hooks, this simply corresponds to the ability to serialise and deserialise certain
I am glad to see the progress that this proposal has made since it was originally submitted to the cabal repository. My own experience suggests that the jargon might be confusing to those unfamiliar with cabal's architecture, so allow me to describe some parts of it.

Currently there are four build-types: Simple, Configure, Make, and Custom. Each provides an implementation of the E.g. the Simple build-type is implemented by `Distribution.Simple.defaultMain`. This is the function that reads the Similarly, the Make build-type is implemented by `Distribution.Make.defaultMain`, which delegates all tasks to a Makefile included in the package.

The "Custom build-type" the proposal talks about is `Distribution.Simple.defaultMainWithHooks`[1]. This build-type is different from Simple, Make, and Configure (I will get to that) because it involves custom user code that is passed to Cabal in the form of `UserHooks`.

One obstacle to reworking Cabal's build system is that the Simple build-type, i.e. `Distribution.Simple.defaultMain`, simply calls `Distribution.Simple.defaultMainWithHooks` with a pre-defined set of hooks: `Distribution.Simple.simpleUserHooks`. This means we cannot reasonably rework how build-type Simple works without a significant refactoring of the codebase, i.e. reimplementing the Simple build-type without relying on `defaultMainWithHooks`. The Configure build-type works similarly but uses `Distribution.Simple.autoconfUserHooks`. For some reason there is no dedicated "defaultMain" in this case.

One grievance I have with this jargon is the following: Cabal has no notion of "Custom build-type"[2]. Let me explain: given that the build interface is Setup.hs and that Cabal's whole raison d'être is providing implementations of that interface, from Cabal's POV there is no "Custom" implementation of Setup.hs; Cabal itself is the (or rather "a") "Custom" implementation of Setup.hs. So what gives?
ℹ️ I don't think this is stated prominently enough in the proposal, but the real goal is not to replace UserHooks with the better-designed SetupHooks. The overarching goal is to allow cabal-install to go past Cabal's Setup.hs interface. Note: I am aware that I am commenting on this "higher plan", and not on the content of the proposal itself; but I believe the authors would agree with me that the proposal gains most of its value from this bigger plan. When cabal-install sees If I have understood correctly, the plan seems to be:
At that point, cabal-install will be able to know what Setup.hs is doing in the vast majority of cases[4] and will be able to skip the CLI interface and, e.g., invoke the user-defined hooks directly. Leaving aside cabal's track record when it comes to transitioning to new features, I have doubts about this plan.
The end state does not seem quite clear either (which is of course acceptable) but seems to contradict some assumptions in the whole plan (which is concerning).
Lastly there is a question of interfaces. Setup.hs, with all its limits, is an interface that has served the entire community for years. This interface is what allows the existence of multiple build tools, like cabal-install and stack but also rules_haskell, the various Nix frameworks (haskell4nix, haskell.nix, ...) and all the Linux distributions. The interface is equalising: it works the same for everyone[6]. While I am aware that all these tools will always be able to call Setup.hs, I am also afraid that cabal-install going around the CLI interface will lead to an even tighter coupling between Cabal and cabal-install and a potential imbalance within the force. E.g. will HLS work equally well on cabal projects and stack projects? Remember that stack has yet to catch up with the latest change in the Setup.hs interface[6]. 🙏 I have already discussed these points with the authors of the proposal. FWIW I will be supporting any refactoring work in Cabal and cabal-install necessary to the development of SetupHooks/Cabal-hooks. However, I can offer some recommendations:
Footnotes
Apologies if this is a silly question, but where in the proposal is the "IPC workflow" sketched in such a way that I could understand this motivation? My own personal understanding of a possible IPC workflow (which this proposal doesn't describe but just sketches enabling as future work, correct?) would be that a build tool could compile and link SetupHooks.hs into some binary that it drives in some way. (And in that sense, a custom Setup.hs making use of SetupHooks is such an IPC workflow, just not a very expressive one.) But we would have complete freedom over what that binary is, and so could choose what the inputs and outputs are freely, no matter what the data structures in SetupHooks.hs are, no? So I think you must have in mind a particular sketch of the sort of binary you would want produced, and either I missed it in the proposal (it is long) or that sketch is only in some people's minds, and it would be nice if it could be explained.
Thanks for the writeup! I tend to think this is the most straightforward thing, but it definitely warrants thought and discussion:
@michaelpj writes
Right, so there are a few things worth mentioning here. Yes, IPC is very important here. The rules have to be supplied by the package and interpreted by the build tool. I've no idea how one would take something looking like Shake rules and externalise them. Perhaps someone else has thought about it more, but it looks somewhere between hard and impossible, given features like monadic bind.

We don't want to over-specify how the build tool works, which means we can't have the build rules language be too expressive. For example, we would not want to say that the rules language is exactly as expressive as Shake and therefore that more or less the only way to interpret them is in fact by basing your build system on Shake. In principle I wouldn't want to force that, and in practice Cabal and cabal-install would need a massive amount of work to redo them in terms of Shake even if everyone agreed we did want to do that. It's well beyond the scope of this project.

On the other hand, for what's currently proposed, @sheaf has a relatively simple prototype/proof of concept. So that's another pull factor: it should be something that's not a million miles away from where Cabal and cabal-install are today, because it needs to be implementable without huge effort. (The original monolithic hooks design was of course pretty trivial to support in Cabal, but lots of people argued for something rule-based.) cabal-install already has some degree of file monitoring, caching, fingerprints and rebuild logic, and that's partly reflected in the current proposed design with file/dir watches and value fingerprints.

It's certainly true that picking a point in this design space of rule-based build systems is not easy or obvious. We recognise that. Indeed we should be humble and admit that we have not done a big literature review, and certainly not before we posted the first fine-grained design.
Our original intention was to just mirror the monolithic hook approach of the old custom setup, so we didn't have a fine-grained rule design in our back pockets. But most of the feedback was to go for a more fine-grained design. Ok, excuses out of the way. Since the initial fine-grained deps version of the proposal, we have had more time to read and discuss design alternatives. Sam will be posting an update to the proposal soon with the result of that thinking. I would now frame the point in the design space like this:
This is a balance: it is less expressive than the fully-recursive dynamic dependency combinator approaches, but by having 2 stages to produce a full graph, we think we get just enough dynamic behaviour to cover the vast majority of use cases for the setting of a Cabal package. This puts the design quite close to
This commit introduces an update of the design of fine-grained rules. Main changes to the design:

- rules that depend on the output of another rule must depend on its `RuleId`, and not simply on the path of the output of the rule,
- all searching happens in rule generation instead of in the build system,
- users are required to define rule names which are used to register rules, with the API returning an opaque `RuleId` (with additional namespacing to ensure they can be combined across packages, using static pointers),
- use of static pointers for actions, which allows us to get rid of `ActionId` and to get rid of the doubly-nested monadic structure; this improves debuggability (as we will show rule names that are meaningful to the user),
- let the user be in control of what data is passed to the action that executes a rule,
- removal of the rule `monitoredValue` (made redundant by `Eq Rule` instance),
- additional functionality for declaring extra dynamic dependencies of rules, to avoid recomputing rules entirely when e.g. a `.chs` file is modified.

This commit also fleshes out the justification of the design, comparing with Shake and ninja, and adds a bit more context about the requirements imposed by the IPC interface.
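The rule-registration style described in this commit (user-chosen names, an opaque `RuleId` returned by the API, dependencies expressed as `RuleId`s) can be sketched as below. This is a hypothetical model, not the SetupHooks API: in particular the namespace is a plain `String`, whereas the commit uses static pointers for namespacing, and a real module would not export the `RuleId` constructor.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch, not the actual SetupHooks API.
-- A RuleId is obtained by registering a rule. The namespace is a plain String
-- here; the proposal uses static pointers for namespacing instead.
data RuleId = RuleId { ruleNamespace :: String, ruleName :: String }
  deriving (Eq, Ord, Show)

-- Dependencies are other rules' RuleIds rather than file paths.
data Rule = Rule { dependsOn :: [RuleId], produces :: [FilePath] }
  deriving (Eq, Show)

type Registry = Map.Map RuleId Rule

-- A minimal state monad that accumulates registered rules.
newtype RulesM a = RulesM { runRulesM :: Registry -> (a, Registry) }

instance Functor RulesM where
  fmap f (RulesM g) = RulesM $ \s -> let (a, s') = g s in (f a, s')

instance Applicative RulesM where
  pure a = RulesM $ \s -> (a, s)
  RulesM mf <*> RulesM ma = RulesM $ \s ->
    let (f, s1) = mf s
        (a, s2) = ma s1
    in (f a, s2)

instance Monad RulesM where
  RulesM ma >>= k = RulesM $ \s ->
    let (a, s1) = ma s in runRulesM (k a) s1

-- Register a rule under a user-chosen name and hand back its RuleId.
-- (A real implementation would reject an already-registered name; this
-- sketch simply overwrites it.)
registerRule :: String -> String -> Rule -> RulesM RuleId
registerRule ns name rule = RulesM $ \s ->
  let rid = RuleId ns name in (rid, Map.insert rid rule s)

-- The second rule depends directly on the first rule's RuleId.
exampleRules :: RulesM ()
exampleRules = do
  grammar <- registerRule "my-pkg" "generate-grammar" (Rule [] ["A.y"])
  _ <- registerRule "my-pkg" "run-happy" (Rule [grammar] ["A.hs"])
  pure ()
```

Because the dependency is the value returned by registration, there is no global search for "who produces `A.y`"; that is the "no spooky action at a distance" property mentioned in the follow-up comment.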
4192a81 to 701c36a
I have pushed a new iteration of the design for fine-grained rules. I've summarised the main changes in the commit message. In particular, the design now includes the suggestion by @michaelpj and @Ericson2314 that rules should directly depend on rules; it indeed leads to a more functional style (no spooky action at a distance).
As a cabal maintainer I'd like to thank you all for your generous participation in the discussion, design and implementation of the Custom overhaul proposal. Are we reaching closure? Would the Technical Working Group of the HS like to offer any final remarks or meta-remarks? I'm planning to mention the proposal and discuss the next steps (regarding the Cabal The Library part of the implementation) during the fortnightly cabal devs meeting a week from now (you are all very kindly invited; other meetings, more focused and/or for different time zones, are likely to take place as well, see https://mail.haskell.org/pipermail/cabal-devel/2024-February/010581.html). How should I summarise the discussion above? May I assume everything has been clarified, amended, accepted or agreed to be postponed to the PR review process and beyond? If anybody is of an opinion that there's still a design point that's likely to derail the implementation and the review process, may I ask for a clear (re)statement? BTW, let me take the liberty of copying the remarks and questions from @andreabedini just sent to
I had a long conversation with the authors of the proposal yesterday. FWIW the proposed design for the Setup Hooks build type looks alright to me. The introduction of the Cabal-hooks package is a good start but I anticipate that more refactoring might be needed in the future (e.g. splitting out the part of Cabal that Cabal-hooks uses, deciding where to put UserHooks, where to put what is in common). With respect to cabal-install, the proposal is still to entirely drop support for the Custom, Configure, and Make build-types; this means that, say,
We've discussed the proposal at our regular cabal devs meeting today and the conclusion is we are ready for the next steps, but we'd love to know if there are any objections to this proposal from fellow cabal developers, wider ecosystem contributors or anybody else. We'd like to make the final go/no-go decision about this part of the Custom Overhaul in our meeting in 2 weeks. If the decision is "go", we'd start reviewing the implementation PR(s) right away. My personal plea: if you'd like to discuss this design (the Cabal library portion), please do it here and please do it now, not in the implementation PR when we start reviewing it, so that we don't derail the review process. And after the implementation is merged, experimented with, maybe after some user feedback, we can discuss amending the design again. Thank you.
No objections have been recorded, so the cabal team decided in today's open meeting to gratefully invite the Custom Overhaul implementation team to move forward into the implementation phase (review, etc.) with the intention of merging the contribution. On behalf of cabal developers let me once again thank everybody involved in the design discussions. Over to Adam and Haskell Foundation if there are any extra formal steps to be taken.
Great! The proposal authors are happy with the current state of the design, and we're grateful to everyone who has helped refine it over the last few months. @sheaf will now be getting the implementation (haskell/cabal#9551) rebased and ready for review by the Cabal team. I'm unsure of the formal process requirements for an RFC like this, but perhaps the TWG could take a final look and either decide this PR can be merged, or let us know what else is needed? (CC @gbaz @Ericson2314 @Kleidukos @LaurentRDC as TWG members involved in the discussion.)
May I ask a question? Stack currently makes use (with its
Stack obtains that information by making use of the
It does so, because Stack can then use Cabal (the library) to create the autogenerated files for every configured component, without building everything else, before it makes use of In what is proposed, do any of the hooks provide that information, particularly the EDIT: I think I can answer my own question:
You're asking a very good question @mpilgrem. If I understand correctly, you're saying that you want I think you have to be careful here; if you just run
@gbaz @Ericson2314 @Kleidukos @LaurentRDC, may I humbly ping you about ^^^ again?
The TWG will meet in two weeks, at which stage we'll have a vote. I'll ping the members to take a look and make sure that they ask any questions now so that they can be fully informed at the meeting.
Thank you!
Wrapping the function type in 'StaticPtr' made some higher-order patterns impossible, such as those used by the doctest library. So we instead have a separate 'StaticPtr label' argument for the namespacing.
And have a great meeting (or is it over already?).
It's tomorrow. :)
Just an update, the TWG voted to accept this proposal and I am going to merge it. We appreciated all the effort in collecting and acting on community feedback!
Oh, I forgot something, sorry. @adamgundry, can you change this to be in a new
Done. Thanks @jmct and the TWG!
This is an RFC regarding proposed design changes to `Cabal`: adding a new build-type that will allow eventual removal of `build-type: Custom`. We would welcome input from the Haskell Foundation and the wider community about the proposed design.
Rendered.
See also haskell/cabal#9292.