Pkg3: immutability of compatibility #14
If we allow the compatibility of versions to be mutated after the fact (as we do now in METADATA), one major issue is that it becomes impossible, once compatibility has been modified, to know what the compatibility constraints on versions actually were when those versions were resolved. This could hide resolution bugs and generally makes the system harder to understand. One possible solution is for each modification of compatibility constraints to increment a build number of the version, or something like that. At that point, however, I have to question why
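A minimal sketch of the build-number idea, using Julia's `VersionNumber` (which, unlike strict semver, orders build metadata) with hypothetical version numbers:

```julia
# A compatibility-only revision could bump only the build number, pointing
# at an identical source tree (hypothetical convention, not a settled design):
v"2.1.0+1" > v"2.1.0"   # true: the revision supersedes the original release
v"2.1.0+1" < v"2.1.1"   # true: a genuine patch still supersedes the revision
```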
In particular, patches don't need to be made on the main repository of a project; they can be made on a fork, as long as they are eventually upstreamed back to the main repo.
The reason for distinguishing a compatibility-only change from a patch change is that you may need to make the former long after the fact, when there have already been later patch releases. The version history of metadata currently would allow you to reconstruct the state of compatibility (assuming no local metadata modifications have been made), though which commits of metadata are used is not recorded long term.
If the latest patch release always supersedes previous ones in the same major-minor series, then you can always just make a new patch. The only way needing
That means we'd have to record the state of all registries in the environment, which ties the meaning of an environment to the history of registries in a way that we are (or at least I am) trying to avoid. If version compatibility is immutable (in either
This is not a good idea, as I've said before - there's not a lot of precedent for allowing code changes to completely supersede old versions. If there's going to be a second class of dependency resolution for complete replacement, then it should not allow code changes. People break their API in bugfix releases even if we tell them not to, and downstream packages are going to need to be able to use APIs that only existed in early patch releases. And this situation might not be noticed immediately, so there could be enough later patch and minor releases that there isn't room to fix the situation by making a new set of renumbered releases.
So are you ok with the idea of version metadata – especially compatibility – being immutable, but having
Yes, that seems like a mostly equivalent way of accomplishing the same thing as modifying compatibility in metadata. It records more history permanently (not just in git history); maybe that could be useful, though.
I do think we should keep a log of version history used by local registry copies over time, so you could feasibly implement an "undo" of a global update operation. That's a separate issue though.
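A sketch of what such a log could look like (all names here are hypothetical; this is one possible shape, not a proposed API):

```julia
# Record which registry commit each update used, so a global update can be
# undone by checking the registry back out at the previous entry:
struct RegistrySnapshot
    timestamp::Float64   # when the update ran (seconds since epoch)
    registry::String     # registry name or URL
    commit::String       # git commit of the registry at that time
end

record!(log::Vector{RegistrySnapshot}, reg, commit) =
    push!(log, RegistrySnapshot(time(), reg, commit))

# The state to roll back to on "undo", if any:
undo_target(log::Vector{RegistrySnapshot}) =
    length(log) > 1 ? log[end-1] : nothing
```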
Or are you entirely against the idea that version metadata be immutable?
Creating such a metadata-only update would be simplified if the metadata were only part of the registry, not the package itself. Would that be an option? (Or is that already the idea and I misread the proposal?)
The example I gave in the other thread illustrates why patches are insufficient:
Now the user installs Pkg B and Pkg C: the end result would be:
which would be broken.
@martinholters: Yes, having compatibility info not live in the package repo is definitely a possibility, but it would make it harder for unregistered packages to participate in version resolution. Since making unregistered packages easier to work with was one of the major requests for Pkg3, that's a bit of a problem. Also, if we move compatibility info out of the package itself, where does the developer edit it? The obvious answer is in the registry, but I feel like that's not tremendously obvious or developer-friendly.

@simonbyrne: This wouldn't be the result under what I've proposed, since the existence of Pkg B v2.1.1 would prevent resolution from ever choosing Pkg B v2.1.0 – that's what "strongly favor the latest patch release" is meant to convey. Instead you would get A v1.2.x, B v2.0.0 and C v3.0.0. In the other approach being discussed here, B v2.1.0+1 would fix B v2.1.0's dependencies and would similarly hide B v2.1.0 from consideration when resolving new versions.
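To make "strongly favor the latest patch release" concrete, here is a rough sketch of one way resolution could implement it (my illustration, not a specification): collapse each major.minor series to its newest patch before considering candidates.

```julia
# Hide superseded patches: keep only the newest patch in each major.minor
# series (illustrative only; a real resolver would also honor exclusions):
function latest_patches(versions)
    best = Dict{Tuple{UInt32,UInt32},VersionNumber}()
    for v in versions
        k = (v.major, v.minor)
        best[k] = max(get(best, k, v), v)
    end
    sort!(collect(values(best)))
end

latest_patches([v"2.0.0", v"2.1.0", v"2.1.1", v"3.0.0"])
# => [v"2.0.0", v"2.1.1", v"3.0.0"]   (v"2.1.0" is never considered)
```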
The core of @tkelman's objection (assuming he's not against the idea of immutable version metadata entirely, which would be good to get an answer on) seems to be that updating version metadata via new patches allows metadata fixes to be mixed with bug fixes – well, technically arbitrary source code changes, since people may not just fix bugs in patch versions. But if people stick to bug fixes in patches, this won't be a problem: why would you want a buggier version? Yes, people will screw up bug fixes, but then the appropriate action is to make another patch that fixes the fix. Fixing version metadata for

My perspective is that we want to design the package manager so that making patch versions that do anything besides fix bugs is problematic. This will actively encourage package developers to only fix bugs in patches. Two features of the proposed design that encourage this are:
Both of these design choices assume that patches with the same major/minor version are equivalent aside from metadata updates and bug fixes. If a package maintainer violates this assumption by adding or removing functionality in a patch, it will cause problems. Problems lead to complaints, which will provide feedback to the maintainer and help them learn that this is bad practice and not to do it in the future. This is not based on some sort of groundless optimism that people will do things correctly on their own; it's based on the principle that people respond to feedback, and that we can design a system that actively causes people to receive corrective feedback. Is this limiting the ways that package developers can version their packages and have things work smoothly? Yes, but I think that's a good thing.
If a compatibility-only change can be done at the registry level without needing the source to change at all, then there's no need for a branch for a compatibility revision. Designing the system to be intentionally rigid and inherently flawed in the face of a behavior that people will commonly exhibit (a recent example: changing the type of a single parameter of a single function, which breaks the API but seems like a minor change), and in a way that cannot be easily fixed once newer versions have been published, is why I think this goal is a bad idea. The core job of a package manager is to ensure that if source has been published as a release version, it is possible to depend on it. Demoting the patch level of versioning from this is unnecessary, adds friction to the system, and doesn't gain us anything. Downstream users are the ones who face problems from versioning mistakes, and they are incapable of fixing or working around them without cooperation from the upstream author, or without forking the package and re-releasing a new series of different version numbers. We don't gain enough for this to be worth it.
What qualifies as a bugfix is not always clear-cut either. In fixing one bug, you can often accidentally (or intentionally!) break something else that downstream users were depending on. And these issues don't get identified immediately. By the time some of them are found, the upstream author may have moved on to a newer release series that the downstream users don't have time to upgrade to right away (especially if there was a past release that worked fine for them). What option does downstream have to get their code working again? They could publish a fork without any of the more recent releases, but why have we made them go to that trouble when a patch-level upper bound would serve the exact same purpose?
The problem with having registry-only compatibility changes is that it:
The process I'm proposing is straightforward and the same for registered or unregistered packages: keep definitive compatibility info in

Preferring the latest patch for version resolution doesn't make it impossible to use older patches, nor does it force users to upgrade to the latest patch – if what they're using works, no problem:
The example you allude to (where was this?) with a changed type parameter is a simple broken patch. The correct fix in such a situation, if you depend on the package, is to exclude that specific broken patch, which solves the problem; if you're the package maintainer, the fix is to revert the part of the change that broke compatibility for someone and make a new patch release. Neither is a big problem. I would love an actual problematic case that can't be handled with what I'm proposing, instead of general arguments about what package managers should or shouldn't do. If there's some problem scenario, I want to know about it. The kind of example @simonbyrne presented is exactly what I'm talking about (hopefully my answer to that is convincing to him). The Compat example in #3 is also exactly what I'm talking about: the fact that minor updates to packages with many dependents (Compat being the most extreme example) would force patching of all dependents is a devastating problem with my original proposal, hence #15 (comment).
The problem is that the "broken patch" is broken from the perspective of downstream users who were using the old API, but intended as a new API by the upstream author. Upstream isn't going to revert it. Downstream then needs to indicate that all future patches are broken. That's not possible in this proposal; every new upstream release would break the downstream until downstream gets a chance to add another broken patch to their list. It's not possible for compatibility to be set in stone and never change - compatibility depends on the entire set of possible interacting versions of dependencies, and it always changes as new versions get released.
You are proposing making it impossible to declare version compatibility bounds at patch granularity. That's necessary in the case above, where package B depends on package A, which is at, say, v1.3.3 when package B gets written (and it relies on a feature that was new in 1.3.0). Assuming the author of package B can remember, or recover from environment info, what version of package A did work, there's no way in this proposal to reflect its requirements, since it can't express an upper bound excluding the A v1.3.6 that caused the problem. It could say every patch from 1.3.6 on is broken, but if those have to be listed individually then the list becomes incorrect as soon as an additional 1.3.17 backport gets released. The most practical solution to immediately get a working version of its dependency is to republish a fork of the old version of package A. What problem is solved by disallowing requirements at patch granularity, and disallowing expressing requirements as ranges?
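For concreteness, the patch-granularity range bound being asked for here, written as a simple predicate over the versions from the example (the surrounding function name is just an illustration):

```julia
# "B relies on a feature new in A v1.3.0 and broken by the change in v1.3.6":
# a range upper bound states that once and stays correct as patches accrue.
compatible_with_A(v::VersionNumber) = v"1.3.0" <= v < v"1.3.6"

compatible_with_A(v"1.3.3")   # true
compatible_with_A(v"1.3.6")   # false
compatible_with_A(v"1.3.17")  # false: a later backport is excluded
                              # automatically, with no list to amend
```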
The subject of this issue is the immutability of compatibility, which is orthogonal to patch granularity. I was trying to unmuddy the discussion by splitting #3 into this issue and #15, which would be a better place to discuss patch granularity, although that issue is explicitly about the opposite complaint: that the granularity is too fine, which I already conceded.
Splitting a discussion without posting to that effect in the discussion itself isn't terribly effective.

Compatibility constraints are either correct, too tight, or too loose with respect to the time and the set of available dependency versions when you state them. As new versions become available, a previously correct set of constraints can become too tight if it doesn't include working versions, too loose if it doesn't indicate new breakage, or remain correct. Compatibility claims that were too tight or too loose when they were first made may need to be amended after the fact.

If making personal registries is simple, then I don't think it's worth worrying about how to amend compatibility for unregistered packages. Source releases should be immutable; compatibility often needs to be amended; so compatibility should be tracked outside of the source. If you need to amend compatibility for an unregistered package, then create a personal registry to track it.
I really hope that package management and compatibility can be managed
+1.618 for allowing me to become unconcerned with anything git related
@tbreloff package authors need to be responsible for dependency versioning. What features are you using? When things break, how do you fix or work around them? That comes with the territory of having dependencies. If you get any help, you're lucky, but you can't expect other people to do this for you.

An outside-of-the-source copy of the dependency information may need to take priority here though, as in the existing system where metadata is used for registered packages: the package's copy of REQUIRE isn't actually used except at tag time, to populate the initial content. A compatibility-only revision release could be a mechanism for this, but it needs to be possible to do that for any published release, not just the latest within a minor series. Compatibility is about the rest of the world with respect to a fixed version of a package - we shouldn't be mixing the release numbering or resolution mechanism for outside-world compatibility into the same system (and constraints) that we use for a package's own source.
So then maybe what I'd like is a little more subtle. It would be nice if

tl;dr: Manage as much as possible from within metadata(s) without
The notion that you can build a functioning ecosystem of reusable software without authors thinking about versioning at all strikes me as incredibly implausible, not to mention totally unscalable. Who's going to spend all of their time figuring out how to version every single registered package? Your answer here seems to be "I dunno, but not me." If you want to develop software that way, that's cool – then don't register your packages. What I'm proposing will support unregistered packages much better, but it won't change the fact that following along with whatever happens to be on master across a set of packages will not be a good way to build systems that don't break all the time.
Of course there's a middle ground. Authors think about the high-level versioning, but not necessarily the gritty details (which are frequently due to other packages outside their control). Those details should either be handled by automation or by expert guidance, depending on the situation.
When it comes to curated metadata repos, if I'm not a curator then the final responsibility is not mine. Package authors can guide versioning (and should be encouraged to do as much as possible themselves), but this mentality that curators should never make changes to the thing they're curating, and should instead apply social pressure on package authors until they make the exact change that the curator could have made in the first place... it's just stupid. I want to see the curation as disjoint from the code.
I couldn't agree more, which is why I care so much about making it dirt-simple to "do the right thing".
@StefanKarpinski @tbreloff Each of you is right, in important measure. I have seen the need for handholding in the less well traveled regions of the deep end of the pool; it increases superlinearly. @tkelman The work you do helping us deal with tags and git when it goes on a bender is probably more informative than predictive. This summer and next fall I expect a flood of new and very active involvement for Julia. Something is going to feel the extra weight. 🚶♂️ (mmph, 😢) "I do not want to play with git" (😢, mmph) between update and upgrade. ?uplift
@simonbyrne: yes, this is probably a good idea. Registry-signed tags make sense too.
The other benefit is that the community could decide to tag/release without requiring the package author. There have been many times when people would have stepped up and tagged something while the author was on vacation (or whatever).
So as I understand it, a typical release process might look something like:
Is that what you had in mind? (These points are intentionally a bit vague, in particular point 5, but that is probably best discussed in a different issue.)
That sounds pretty reasonable @simonbyrne. And my point above was that "Package author requests new release" could just as easily be "community requests new release" without any hiccups (with the social understanding that we should default to the author's wishes whenever feasible).
Yes, roughly, although I might order it like this instead:
One issue with tagging is that, IIRC, tags are only transmitted via push/pull, not via pull request, so it's still unclear how to get the tag into the origin repo. For GitHub repos we could use the tag create API, but that doesn't address non-GitHub repos. For those, I suppose we could either have platform-specific APIs or ask the repository owners to pull tags from the registry fork. I'm also not sure where the best point for checking compatibility is. It could be part of the checks step – if it's a patch release, it shouldn't break any packages that depend on it. We could verify that before accepting a version.
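For the GitHub case, a sketch of what using the tag create API could look like (assuming HTTP.jl and JSON.jl; the endpoint and payload follow GitHub's "create a reference" API, but the surrounding function is hypothetical):

```julia
using HTTP, JSON

# Create a lightweight tag ref pointing at an existing commit via
# POST /repos/:owner/:repo/git/refs (requires a token with push access):
function create_tag(owner, repo, tagname, sha, token)
    HTTP.post("https://api.github.com/repos/$owner/$repo/git/refs",
              ["Authorization" => "token $token"],
              JSON.json(Dict("ref" => "refs/tags/$tagname", "sha" => sha)))
end
```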
Also, note that git tags are usually for commits, not trees, so if we use tree tags (which is possible), it will be a bit unusual. We may want to tag a commit for convenience but associate the version with a tree rather than a commit.
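Git does permit annotated tags on tree objects, so the "tag a commit for convenience, associate the version with a tree" split is mechanically possible; a sketch, shelling out from Julia (the tag name is hypothetical):

```julia
# Resolve the tree of the current commit and tag the tree object itself;
# annotated git tags may point at any object type, not just commits:
tree = readchomp(`git rev-parse HEAD^{tree}`)
run(`git tag -a v1.2.3-tree -m "source tree for v1.2.3" $tree`)
```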
If the checks fail, you'd need to back out pulling into the registry fork and redo it after the author addresses the issues. This is getting to be a lot of machinery to expect small organizations to maintain their own instances of.
Why would you need to back anything out? Git commits are immutable.
Not everyone has enabled branch protection - people do occasionally force push to master of packages. They shouldn't be doing that, but if they do we wouldn't want it to mess up the registry's fork.
Force pushing a branch doesn't destroy commits, it just changes the commit that a branch points at.
That depends on exactly what "pulls git data into its fork" means, then, and on where the checks happen. If the checks happen in a completely from-scratch clone wherever they're running, and nothing is pushed back to the GitHub copy of the fork unless the checks pass, then it's fine. Pulling into an existing clone's master after a force push is where things can go wrong.
I will make the wrong choice so that we can argue about it. |
I think "pull" may be the wrong word here: the metadata fork I envision the process as something like the following:
(Here I'm not sure about the commit vs. tree hash issue, but my experience has been that trees are often harder to work with, as they're not really a "user-facing" feature of git. Also, I'm not really sure how we would handle non-git sources either.)
One other thing to think about: who "owns" the version numbers? In what I outlined above it would be the registry, not the package (as emphasised by the fact that it is the registry signing the tag). I'm not sure how this would work in the case of a package being in multiple registries (who decides whether or not it is a valid version?).
Was that really necessary? "This sort of response is not constructive" either.
We haven't actually solved this problem if everything is duplicated in both the registry and the package. One should take priority over the other. If we design this whole system to ensure they're equal in most normal usage, you still need to pick which to use in case of local divergence or development. Local development probably points to preferring the package's copy, but how local development is supposed to fit with the rest of Pkg3 has not yet been described here. One of the copies of this information is a duplicate and somewhat redundant.

It sounds like we're moving towards a very registry-driven design. In use cases other than local development, the package's copy (and upstreaming registry-driven compatibility changes back to it) is fairly vestigial. You want to be able to do dependency resolution without having to first download every version of every package.

How would version resolution work on an unregistered package? Right now, unregistered packages have no versions - how would Pkg3 change that?

Archiving past versions is a good idea, but doing so by having every registry also maintain git forks of all its packages is making our "GitHub as CDN" abuse worse.
Yeah, tagging versions is complicated. We may need a "two-phase commit" process.
My point is that your attitude to this discussion has been fundamentally uncharitable and contentious. In this particular instance, there are two ways to do a thing, and instead of giving me the benefit of the doubt that I'm not a moron and will pick the one that works, you assume that I'll do the wrong thing and then argue with me based on that assumption. This attitude is frustrating, comes across as disrespectful, and mires us in unnecessary arguments instead of collaborative exploration of the solution space to find something that addresses everyone's concerns.
Replicating immutable data isn't a problem. That's the principle behind git and most other successful distributed data stores. Having multiple copies is only a problem if they are mutable.
Quite the opposite. If anything, the package repository is primary and registries are just copies of immutable, append-only metadata about package versions, copied from the packages.
This is a good question. I was considering just using tags for versions in unregistered packages. But of course, you generally don't want to bother tagging versions if your package isn't registered, so I'm not sure what the point is. Instead, I think one would just use an environment file in the git repo to synchronize unregistered packages in lock-step (a la MetaPkg), but their dependencies on registered packages can be looser via compatibility constraints in the unregistered package repos.
How else would you do this? If you want to keep an archive of a package's git history you have to make a fork of it in case it goes away at some point. Using git for source delivery has problems, but that's an orthogonal issue.
Maybe we should separate the two jobs of a registry:
The former is the part that requires intelligence and automation, while the latter is dead simple.
And don't forget #3: user API. Make it dirt-simple for everyone involved to

I agree these can be designed separately.
There are many more than two ways to do something that is "intentionally a bit vague" and unclearly specified. I've been contentiously arguing against aspects of the design that I don't think will work. Several of those it looks like we've moved away from, but it took discussion. Take it at technical face value, please.

Dependency resolution can require global information, which is why registries contain compatibility information for all past versions. Getting the equivalent set of information, if the package copy were the primary source, would require either downloading all versions or getting information out of git for many versions simultaneously, in a way that we don't currently do anywhere to my knowledge. The latter would make the goal of allowing packages to not have to be git repositories less feasible.

If we're only archiving releases that get published to a registry, then why would the git history be needed? If packages are immutable after installation then they can just be source tarballs, and an archive can work like most conventional package managers: a collection of source release snapshots.
I was actually thinking of separating them entirely. I.e. first you submit a proposed version to various validation services: services that check things like that the proposed version metadata is well-formed, that its tests pass, that it works with various versions of its dependencies, and that it doesn't break various versions of its dependents. Once you've got ok/error from a validation service or services, you go to a registry and submit that; the check at the registry is then just that a sufficient set of validations has passed. I can even imagine private packages being submitted to cloud-hosted validation services and then registered privately. The set of validations that a version has passed can be attributes of the version; people can filter packages/versions based on the validations it has passed.
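A sketch of what a signed validation result might contain under this idea (every name here is illustrative; nothing is a settled format):

```julia
# One record per check, signed by the validation service that ran it:
struct ValidationResult
    package::String
    version::VersionNumber
    tree::String              # hash of the source tree that was validated
    check::Symbol             # e.g. :metadata_wellformed, :tests, :dependents
    passed::Bool
    service::String           # which validation service produced this record
    signature::Vector{UInt8}  # service's signature over the fields above
end

# Registry-side gate, roughly: every required check has a passing result.
accept(results, required) =
    all(c -> any(r -> r.check == c && r.passed, results), required)
```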
If someone deletes their git repo, we want to be able to make another full git repo the new source of the package. We need a fork to do that. I'm not sure why you're arguing this point.
I'm not sure what your point about global version information is.
Don't we also want to make Pkg3 robust against the "package developer force pushed over master" scenario? So tags need not all

The scheme of propagating tags through forks sounds overly complex and unnecessary, and a lot to set up to run a registry. And now we have multiple mutable remotes for any given package - this could get confusing in terms of issue and PR management, if all the downloads are coming from a fork that users should actually ignore.

The point about global version information is that the head copy of a package's compatibility contains less information than the registry's copy. Except for the author at tag time, everyone else could delete the package's copy and not notice. "Package is primary" is the remaining item of dispute here, afaict.
I agree that propagating tags through forks is complicated and maybe impractical. We'll have to see. The main thing we need is copies of the git history for the commits behind various tagged versions, but that could be a separate process from registration.
If we have a reliable registry-controlled mechanism for obtaining a copy of the release snapshot source with a matching checksum, does it actually need a copy of the git history? Thanks to GitHub it's oddly easier to get straightforward hosting of a full git repo (up to its size limits, anyway) than it is to host arbitrary non-git source snapshots, but I wonder whether we're letting that ease of use drive the design decisions.
Wouldn't future support of non-git-based packages be problematic if releasing a version included cloning its git history? Of course, one could replace that with "cloning its version history in whatever VCS is being used", but that would make registries much more complicated, as they would have to accommodate every VCS used by packages they want to register.
We should move this aspect of the discussion to its own issue, but I think it's totally reasonable today to require that Julia packages have a git repo (or git mirror of something else) as the development source of record. What we should try to keep feasible, though, is the flexibility of delivering release tags to users' systems at install time in a form other than a full git clone.
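A sketch of that non-clone install path: fetch a release snapshot and verify a registry-recorded checksum before unpacking (assumes the SHA.jl package; the URL, checksum source, and layout are all hypothetical):

```julia
using SHA

# Download a release tarball and check it against the checksum the registry
# recorded for this version before trusting or unpacking it:
function install_snapshot(url::AbstractString, expected_sha256::AbstractString, dest)
    tarball = download(url)   # e.g. a GitHub release/tag tarball URL
    actual = open(io -> bytes2hex(sha256(io)), tarball)
    actual == expected_sha256 || error("checksum mismatch for $url")
    mkpath(dest)
    run(`tar -xzf $tarball -C $dest`)
end
```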
As I understand it, GitHub is fairly intelligent about not unnecessarily replicating data across forks (thanks to git's immutable objects), so I don't think this is really an issue.
Continuing half of the discussion on #3.