Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace submodules with subtrees #70651

Closed
oli-obk opened this issue Apr 1, 2020 · 54 comments
Closed

Replace submodules with subtrees #70651

oli-obk opened this issue Apr 1, 2020 · 54 comments
Labels
A-meta Area: Issues & PRs about the rust-lang/rust repository itself C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Apr 1, 2020

cc @rust-lang/clippy @RalfJung

cc @rust-lang/compiler @rust-lang/infra we gate on clippy soon, so if you break clippy you need to fix it immediately in the same PR. This is trivial now though, since we are basically in a monorepo state, so you just edit stuff in src/tools/clippy/ until everything works again.

Synchronizations of the remote repo and the local subtree is explained in CONTRIBUTING.md

The first port (clippy) is being done in #70655

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Apr 1, 2020
@RalfJung
Copy link
Member

RalfJung commented Apr 1, 2020

Cc @rust-lang/miri because we have that now :D

@eddyb
Copy link
Member

eddyb commented Apr 1, 2020

Hmm, I was led to believe that git subrepo was nesting git repositories, in a way which git natively handles. But it turns out it's just a monorepo sync tool, same as manually making large commits. I don't think losing commit history is acceptable.

At a glance, git subtree looks like it might do the right thing, i.e. keep the repository nesting of a submodule, but I haven't tried it yet to see how it actually behaves in practice.
EDIT: ah, no, git subtree just seems to create synthetic commit histories. Still, could be better than a single sync mega-commit.
EDIT2: somehow it still managed to preserve commit history?! Compare:
https://github.com/rust-lang/rust-clippy/commits/master
https://github.com/rust-lang/rust/commits/c211cea3e99e04c2980a853b6637de22881b72eb

@oli-obk oli-obk changed the title Replace submodules with subrepos Replace submodules with subtrees Apr 1, 2020
@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

I moved to git subtree.

@eddyb
Copy link
Member

eddyb commented Apr 1, 2020

In case it gets lost on the PR, I said: #70655 (comment)

Whoa, clippy commit hashes were kept intact, this makes me really happy!11

@RalfJung
Copy link
Member

RalfJung commented Apr 1, 2020

I moved to git subtree.

Would be good to get the pro's and con's of that decision documented somewhere. git submodule advertises itself as a better alternative to git subtree; why is subtree still the better choice for us?

(Also I have no idea in which thread to post now, there's at least this issue and 2 PRs. It's very confusing.^^)

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

I don't know why git subrepo is better than git subtree other than what the documentation of git subrepo states at https://github.com/ingydotnet/git-subrepo#benefits

Fixes known rebase failures with git-subtree.

git subtree is better than git subrepo because it preserves all commits in history instead of squishing them in the name of the person who synchronizes it.

@eddyb
Copy link
Member

eddyb commented Apr 1, 2020

git subtree is upstream, which is nice, and it also supports preserving the commits better.
I believe the "rebase failures" that git subrepo talks about are actually an anti-pattern: rebasing a merge commit, it's not meant to work at all (the rebase will include both branches and lose the merge commit - this can already happens if you accidentally rebase the wrong thing in rust-lang/rust).

git subrepo seems to want to just throw all the changes into single large commits so you "don't have to deal with it", whereas git subtree is more history-preserving-oriented, which I prefer.

@RalfJung
Copy link
Member

RalfJung commented Apr 1, 2020

git subtree is better than git subrepo because it preserves all commits in history instead of squishing them in the name of the person who synchronizes it.

Is this true in both directions (subtree pull/push)?

Does git subtree also have the problem that a "subtree push" needs a commit on the rustc side?

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

no, git subtree push does not need a commit on the rustc side.

The only downside of git subtree is that you need to specify the folder name and target repo on every synchronization (because there's no synchronization metadata file)

@Manishearth
Copy link
Member

I would prefer the history preserving option. I was under the impression that subrepo had a no squash mode but that could be subtree.

@RalfJung

This comment has been minimized.

@oli-obk

This comment has been minimized.

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

I was under the impression that subrepo had a no squash mode but that could be subtree.

subtree doesn't squash by default and has a sqash option.

@flip1995
Copy link
Member

flip1995 commented Apr 1, 2020

@oli-obk IIUC, Clippy can then fix a nightly version and doesn't have to use the rustc master anymore (correct?). If that's the case, I'd go ahead and make these changes to the Clippy CI/infra, so it will be ready once the subtree PRs are ready to get merged.

@RalfJung

This comment has been minimized.

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

Clippy can then fix a nightly version and doesn't have to use the rustc master anymore (correct?). If that's the case, I'd go ahead and make these changes to the Clippy CI/infra, so it will be ready once the subtree PRs are ready to get merged.

well, only if we make the synchronization at specific commits that work on the current nightly. Between two nightlies there can be many states where the clippy subtree will only compile with a master-rustc.

@flip1995
Copy link
Member

flip1995 commented Apr 1, 2020

IMO we want to do this, since otherwise, we would have to do rustup commits in Clippy just as often as before. It will just be easier, since it will only be a git subtree push.

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

Well, we can do it at random points in time, but yes, we can target those points in time to the commit that caused the most recent nightly to be created.

@Manishearth
Copy link
Member

Right, we can have a nightly clippy CI job that notices when clippy stops building with master, and sync then.

@oli-obk
Copy link
Contributor Author

oli-obk commented Apr 1, 2020

at that point we can just have a nightly job that synchronizes once a night automatically to ensure that even nonbreaking changes get synchronized as often as possible.

@mati865
Copy link
Contributor

mati865 commented Apr 2, 2020

Clippy's CI could perform pull every time it tries to merge PR.
That would avoid most of conflicts (from maintainer PoV) when new lint using the "old code" is being added but Clippy in Rust repo was updated for the "new code" already.

Edit: There would be still Rustups necessary for lints added to Clippy shortly before breaking changes since Clippy in Rust repo won't have them yet.

@eddyb
Copy link
Member

eddyb commented Apr 2, 2020

@oli-obk @RalfJung What do you think of this logic? Just thought of it, not sure where to leave it.

git subtree pull is an injection, git subtree push is a projection, hence why pulling preserves commits verbatim (hash included) while pushing rewrites them to keep only the relevant parts.
They're both information preserving (wrt file diffs and commit messages) for the subtree directory, and only for that directory.

@Manishearth
Copy link
Member

Though I guess this doesn't actually need to be versioned with Rust so a cargo install makes more sense

@oli-obk oli-obk added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Oct 29, 2020
@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 29, 2020

I marked this as "looking for participation". Working on this git subcommand that can be installed via cargo install requires zero work on rustc. Basically what you need to do is to grab the latest version of the subcommand and copy the file with the appropriate license mentioned into a new cargo project. The main function can then include_str! the script and invoke sh to execute the text (maybe piped in via stdin?) by forwarding all command line args to the script. You'll need to add rust-lang-owner to the owners of your git repository and the crates.io owners of your crate.

Ideally you test the crate by talking with the clippy team on zulip and doing a clippy sync with the "new" tool. If all that works, we'll update the documentation on our git subtree usage to reflect the fact that ppl need to install the tool and invoke a slightly different subcommand name.

@mominul
Copy link
Contributor

mominul commented Oct 29, 2020

I'd like to work on this issue!

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 29, 2020

great! If you have any questions, feel free to ping me on zulip or just ask here. There's also a bot, so if you write @rustbot claim it should assign this issue to you.

@mominul
Copy link
Contributor

mominul commented Oct 29, 2020

@rustbot claim

@pietroalbini
Copy link
Member

I'm wondering if we need this to actually be on crates.io. We could vendor the script in the repository and call ./x.py subtree.

@oli-obk
Copy link
Contributor Author

oli-obk commented Oct 30, 2020

I like having it as an independent thing. Others may want to use it, too. It's not rustc specific

@mominul
Copy link
Contributor

mominul commented Oct 31, 2020

Here's the gitree crate which wraps the git subtree script. I have included the script form the git PR mentioned here. I am not well versed in handling git, so can anyone test it or give me the instructions for doing it? Here's the Zulip topic about this crate.

@bjorn3
Copy link
Member

bjorn3 commented Oct 31, 2020

The crate metadata doesn't refer to the repo, so posting it here for others trying to find it: https://github.com/mominul/gitree

@RalfJung
Copy link
Member

Using subtrees can lead to spuriously re-closing issues that were re-opened. This just happened to #82080:

the original commit did a lot of things, among them fixing clippy. The commit description contained "Fixes #82080". Each commit that affects clippy is synced to clippy (creating a new commit ID since all the non-clippy changes are removed from it), and then that is synced back (with the same commit ID as what it got in clippy). So now we got a fresh commit that still has "Fixes #82080" in its description.

I am not sure what to do about this, but we have a notable risk here of losing track of issues by accidentally closing them.

Can we make it so that the commit that is synthesized for clippy doesn't just have its non-clippy changes removed, but also has its commit message "sanitized" to ensure that it will not cause any automatic actions by GitHub?

@bjorn3
Copy link
Member

bjorn3 commented Mar 26, 2021

Can we make it so that the commit that is synthesized for clippy doesn't just have its non-clippy changes removed, but also has its commit message "sanitized" to ensure that it will not cause any automatic actions by GitHub?

I think git subtree would then consider it a different commit and try to pull the original conmit every time you sync.

@flip1995
Copy link
Member

There's the option in subtree to squash-sync the commits to rustc. But that way we would lose the Clippy history in rustc.

If we want to keep the history of Clippy and keep using subtree, we just have to live with this, I guess...

@RalfJung
Copy link
Member

I think git subtree would then consider it a different commit and try to pull the original conmit every time you sync.

Why that, doesn't it have some kind of bookeeping to match up the commits? Relying on commit messages seems rather fragile?

@bjorn3
Copy link
Member

bjorn3 commented Mar 26, 2021

I assumed it required the commit to be identical where possible. Based on a cursory reading of the source it seems to only require the tree to be the same: https://github.com/gitgitgadget/git/blob/fe2e4819b869725f870cd3ce99f1f8150fe17dc1/contrib/subtree/git-subtree.sh#L598

@flip1995
Copy link
Member

Copying comments from #82080:

@jplatte wrote:

This sounds like a general problem with the subtree approach...

IMHO this is more of a problem with GitHub issue / PR references in commit messages. It also happens when merging repos, no matter if you use subtree or sth. else, as long as you keep the history you can have super old commits that suddenly affect (or just reference) new and entirely irrelevant issues. My approach has been to always keep these references out of commits, and only have them in PR descriptions and comments.

@flip1995 wrote:

The problem is, that bors includes the PR body in the commit message. Maybe bors could sanitize the PR body and remove keywords that results in closing issues. This doesn't fix the problem, when normal commit messages have a <keyword> #issue in them, but I think the main problem are bors commits copying PR bodies.

@RalfJung
Copy link
Member

The problem is, that bors includes the PR body in the commit message.

Good point. Not doing that for the "active" messages could also help. (It'd also cut down on notifications, if bors removed @mentions as well.)

@Manishearth
Copy link
Member

I feel like that's a pretty rare occurrence and unfortunately the tooling doesn't support doing such a transform currently. We could tweak the bors thing though

@RalfJung
Copy link
Member

Well, it's also something that has a high chance of flying below the radar when it happens. All we know is it happened at least once.

bors added a commit to rust-lang-ci/rust that referenced this issue May 15, 2021
…crum

Convert rustfmt from a submodule to a subtree

r? `@calebcartwright` cc `@Manishearth` `@Mark-Simulacrum`

The motivation is that submodule updates cause rustfmt to not be available on nightly a lot; most recently it was unavailable for over 10 days, causing the beta release to be delayed. Additionally this is much less work on the part of the rustfmt maintainers to keep the rustfmt compiling, since now people making breaking changes will be responsible for fixing them.

I kept the rustfmt git history so it looks like there are thousands of commits. The important commits are https://github.com/rust-lang/rust/compare/851dee3af9404bf399c3c4ffefe5105edb3debad~..pull/82208/head. This adds about 10 MB of git history, which is not terribly much compared to the 702 MB that already exist.

- Add `src/tools/rustfmt` to `x.py check`
- Fix CRLF issues with rustfmt tests (see commit for details)
- Use `rustc_private` instead of crates.io dependencies

  This was already switched upstream and would have landed in the next submodule bump anyway. This just updates Cargo.lock for rust-lang/rust.

- Add `yansi-term` to the list of allowed dependencies.

  This is a false positive - rustc doesn't actually use it, only rustfmt, but because it's activated by the cargo feature of a dependency, tidy gets confused. It's fairly innocuous in any case, it's used for color printing.
  This would have happened in the next submodule bump.

- Remove rustfmt from the list of toolstate tools.
- Give a hard error if testing or building rustfmt fails.
-  Update log to 0.4.14

   This avoids a warning about semicolons in macros; see the commit for details.

- Don't add tools to the sysroot when they finish building.

  This is the only change that could be considered a regression - this avoids a "colliding StableCrateId" error due to a bug in resolve (rust-lang#56935). The regression is that this rebuilds dependencies more often than strictly necessary. See the commit for details.

Fixes rust-lang#85226 (permanently). Closes rust-lang#82385. Helps with rust-lang#70651. Helps with rust-lang#80639.
@tmandry
Copy link
Member

tmandry commented Jan 11, 2022

FYI there's been some discussion of experimenting with miri on Fuchsia, and transitioning miri to an always-green subtree is one of the primary blockers.

@oli-obk
Copy link
Contributor Author

oli-obk commented Jul 3, 2023

Clippy is now using subtree, miri is using https://github.com/josh-project/josh/ (a Rust thing that is better and more powerful in various ways than subtree)

@oli-obk oli-obk closed this as completed Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-meta Area: Issues & PRs about the rust-lang/rust repository itself C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests