-
-
Notifications
You must be signed in to change notification settings - Fork 760
RFC: Combine multiple RSpec repositories into a single one. #2509
Comments
@rspec/rspec feedback please :) |
Background reading on monorepos:
We don't have enough code (nor should we ever) to start hitting some of the bigger downsides of a monorepo. And also my own relevant background: I was responsible for the CI teams at Square for many years, dealing with a large Java monorepo and many disparate ruby repositories. |
I support this idea. |
I very much support this idea. The amount of overhead we deal with due to having separate repos is demoralizing. Elixir (the main language I dealt with the pass couple years) has first-class support for monorepos with many sub-applications via its umbrella feature and I'm a big fan of the approach when you're tooling supports it. There's also another potential benefit we get out of a monorepo: it can greatly lessen the effort involved in extracting a feature out of the core gems and into an extension gem. For example, if we ever want to extract
I'm in favor of not merging rspec-rails. It's pretty different, and not part of core RSpec. The CI build is way different with all the rails versions it supports. And we've talked about potentially de-coupling it's versioning from the versioning of the rest of RSpec.
Great idea. A few additional thoughts:
Honestly, I think we can just close stale PRs, with a note saying the person can re-open a new PR on the monorepo if they want to see it get merged. I don't think we need to go through the effort of manually importing PRs (and automating it sounds hard). Cleaning up open PRs would be really helpful as a preliminary step.
I would favor this approach over the more complicated analysis approach you mentioned. I think we can optimize this using a well-designed build matrix. The cukes are generally the slow part of our build. I'd favor doing something like:
So that would be 4x matrix entires per supported Ruby version...which sounds bad, but isn't actually any worse than our current situation, where we've got 4 builds (core, expectations, mocks, support) per supported version. And it should run simpler and faster. We can consider not running the cukes on all supported ruby versions if there's a concern about the number of matrix entries.
I don't think I can even wrap my head around how we'd incorporate rspec-rails in a monorepo build. I very much advocate for keeping it separate. It's a special beast.
There's a couple different things rspec-dev provides:
I don't think we need to stringently audit rspec-dev and try to bring everything over--instead, we can just bring things over as we have the need for a task.
This is a great idea. Would
One thing you didn't mention: how are we going to actually merge the source history of the repos? Is there a good tool for that? Obviously, all the SHAs are going to change. If possible, it would be really cool if whatever tool we used to import commits amended the message of each commit with a note like While I'm 100% on board with this idea, I can't say I'm excited (or particularly available) to do much of the work here. Is there someone who is willing to lead this effort? (Looking at you, @xaviershay :)...) |
I'm on board with the mono repo idea, would certainly simplify keeping everything in sync! On the CI side I'd recommend we not split out the various builds into matrices, as it will (ironically) slow the build down. Travis seems to limit the number of in progress builds at a time (one of the reasons why |
👍 for the monorepo idea. Some ideas for the import of commits:
|
Oh interesting, I was just planning to start a new repo and copy the code over, with initial commit being "see the old repos". I can play around with various ways of importing history ... guess it should be pretty straight forward since there's no way they could conflict with one another. |
I think for the prototype repo, copying in all the code is OK so you can work on getting CI setup, etc., but for the final merge I think the extra effort to preserve the source history as much as possible is valuable. I like @yujinakayama's ideas. |
Sorry I have made basically no progress on this. I'm going to de-commit from it rather than let it go on in limbo. I made this prototype repo to start playing around with ideas if anyone wants to pick it up: https://github.com/rspec/rspec-monorepo-prototype |
We could merge the other repos while keeping all the history.
|
Recently I've been working on my project RepositoryMerger which satisfies the requirements for the repository migration, and finally it's almost complete. Here's a prototype monorepo: https://github.com/yujinakayama/rspec-monorepo The largest benefit of the monorepo generated by RepositoryMerger is that we can handle the past codebase and commits in the same way as we do after the migration.
How it worksThough I think the monorepo generated by RepositoryMerger is basically intuitive and easy to handle, below is a brief explanation of the behavior for review from the core team members. First we need to choose target branches, which we consider important as history and want to import into the monorepo. In other words, commits unreachable from the target branches in the original repos (e.g. stale pull requests) won't be imported. In our case, the target branches should be Then we can import commit history for each branch one by one in the following way: configuration = RepositoryMerger::Configuration.new(
original_repo_paths: %w[rspec rspec-core rspec-expectations rspec-mocks rspec-support],
monorepo_path: 'monorepo'
)
repo_merger = RepositoryMerger.new(configuration)
repo_merger.merge_commit_history_of_branches_named('main')
repo_merger.merge_commit_history_of_branches_named('2-2-maintenance')
repo_merger.merge_commit_history_of_branches_named('2-3-maintenance')
# ...
repo_merger.merge_commit_history_of_branches_named('3-10-maintenance') The imported files are placed under subdirectories for each original repo in the same manner as Importing a single set of branchesFor simplicity, let's consider a case of importing only a single set of branches from two repos: creating repo_merger.merge_commit_history_of_branches_named('main') RepositoryMerger imports all the commits from the original branches with the following rules to preserve the original commit graph structure as much as possible:
Importing multiple sets of branchesNow let's consider a case of importing two sets of branches from two repos: creating repo_merger.merge_commit_history_of_branches_named('main')
repo_merger.merge_commit_history_of_branches_named('3-0-maintenance') The first import phase for This result comes as no surprise. Importing multiple sets of branches with complex commit graphHowever, it may produce a surprising result if original repos have complex commit graph structures. For example, let's consider a case where the In this case, RepositoryMerger intentionally creates duplicate commits, instead of reusing commits that would cause commit contamination. As a result, the commit graph seems reassembled. At first glance this may look strange, but if you look at each branch this is reasonable to avoid commit contamination between target branches. Commit message conversionI've implemented the following commit message conversions for convenience and clarity:
Examples: Tag name conversionTags are suffixed with their original component name: Remaining issue: some old tags point orphan commitsThrough trial and error in the development of RepositoryMerger, I've found a strange fact that some old tags in our current repos point orphan commits that have different root commits from the These are the tags with orphan commits:
Visualized commit graphs generated with the following command (note that you cannot distinguish the orphan commits with
I'm not sure how we should handle these tags. Technically importing them into monorepo is possible but it should cause confusion since the imported branch will also be an orphan one and contain a lot of duplicate commits. @myronmarston Do you know how these tags have been created? @rspec/rspec What do you think? |
@yujinakayama: amazing work! There's a ton of nuance and complexity here that I didn't realize. On a side note, I've gotten a bunch of emails from github during this process due to imported commits that I was wondering what was causing this; now I know :). In theory it would be nice to prevent any additional test runs that you do from sending those emails (I imagine I'm not the only one getting them...). One idea would be to change There's one key invariant that seems really important to verify: for each imported branch, the file system contents of a gem subdirectory must match the file system contents of the same branch in the original repository. For example, on the
These all sound good, but I think there's one more that would be helpful: when a commit SHA is mentioned in a commit message (e.g.
Not off the top of my head. I have no memory of it. Let me dig into it and get back to you. Also, one more thing to consider: what github repo are we going to use for this? In theory Thoughts? |
I spent some time looking into this and it's super weird. I have no specific memories of how that happened. I took a look at rspec-core v2.11.3 (and v2.11.2) and they have a 100% alternate history, all the way back to the first commit, so that they actually have no shared commits with the main branch. I was thinking maybe someone rebased and force pushed but I don't see how that could even cause this state. Maybe it's possible it got corrupted some how? Although with git's hash-based model I don't really see how that could happen either. In theory, there might be a way to "repair" it. If you check out the v2.11.2 and v2.11.3 tag I can see there's a commit in the history that claims to be the 2.11.1 release, so you could perhaps branch off of the v2.11.1 tag, and cherry-pick each commit after the alternate 2.11.1 release commit up through the v2.11.3 tag on the new branch. If the cherry-picks applied cleanly (a big if...) you'd have a "clean" 2-11 branch that you could use during the import process. ....but with all that said, I don't think it's worth the effort to try to fix these orphaned branches. I can't imagine anyone is going to use the old 2.x maintenance branches for anything ever again. And it's worth remembering that the import process is not destructive: the original |
@myronmarston Thanks for the feedback!
Oh, sorry. I didn't imagine that the mentions were included in the commit messages. I'll modify them as you suggested next time I rebuild the monorepo.
Yeah, it's tested in this spec by comparing lists of files with digests. I didn't think of your approach with git tree objects, though, I'll add spec examples with that approach for more confidence. I just confirmed that it worked by hand 👍🏻:
It's possible, but I noticed a concern as I extracted commit references in the URL format, there're some references that point commit comments on GitHub. If we replace them with URLs pointing the imported commits, we cannot see the comments unless follow the link I think we have two options for this:
I'd prefer the new repository approach since keeping the merged contents and the old metagem contents together should confuse the future maintainers, like our current situation about the orphaned branch. Currently I'm thinking the following plan:
I was thinking exactly the same thing. Repairing them might be possible (I think another issue should arise in the process, though 🙄), but it's a sort of "history rewrite" which is something I'm reluctant to do if possible. |
By the way, there're some missing maintenance branches where no patch version has been released for the minor version. Since they are needed to merge branches properly (I've been using my forked repos in the deveploment), I'll run the following commands and push the branches: cd rspec
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-7-maintenance v2.7.0
git branch 2-9-maintenance v2.9.0
git branch 2-10-maintenance v2.10.0
git branch 2-11-maintenance v2.11.0
git branch 2-13-maintenance v2.13.0
cd rspec-core
# v2.13.1 is included in the main branch
git branch 2-13-maintenance v2.13.1
cd rspec-expectations
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-10-maintenance v2.10.0
git branch 2-13-maintenance v2.13.0
cd rspec-mocks
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-7-maintenance v2.7.0
git branch 2-9-maintenance v2.9.0
git branch 2-10-maintenance v2.10.1 |
For the issue of the orphaned branches, I found new facts, which are good news:
Orphaned branch in rspec-core$ cd rspec-core
$ git checkout v2.11.3
$ ls *.gemspec
rspec-mocks.gemspec
$ grep \\.name rspec-mocks.gemspec
s.name = "rspec-mocks"
$ cd ../rspec-mocks
$ git show --quiet v2.11.3
tag v2.11.3
Tagger: Myron Marston <[email protected]>
Date: Wed Sep 19 20:44:40 2012 -0700
Version 2.11.3
commit d0bcce439a6bf9b3245db3553ad5e82233f96634 (HEAD, tag: v2.11.3)
Author: Myron Marston <[email protected]>
Date: Wed Sep 19 20:42:17 2012 -0700
2.11.3 release.
$ cd ../rspec-mocks
$ git show --quiet v2.11.3
tag v2.11.3
Tagger: Myron Marston <[email protected]>
Date: Wed Sep 19 20:44:40 2012 -0700
Version 2.11.3
commit d0bcce439a6bf9b3245db3553ad5e82233f96634 (tag: v2.11.3, origin/2-11-maintenance, 2-11-maintenance)
Author: Myron Marston <[email protected]>
Date: Wed Sep 19 20:42:17 2012 -0700
2.11.3 release. Orphaned branch in rspec-mocks$ cd rspec-mocks
$ git checkout v2.7.1
$ ls *.gemspec
rspec-core.gemspec
$ grep name rspec-core.gemspec
$ grep \\.name rspec-core.gemspec
s.name = "rspec-core"
$ git show --quiet v2.7.1
tag v2.7.1
Tagger: David Chelimsky <[email protected]>
Date: Thu Oct 20 07:02:57 2011 -0500
Version 2.7.1
commit d63dc5d563b90e6344b7139a614dc6eab3a50eb4 (HEAD, tag: v2.7.1)
Author: David Chelimsky <[email protected]>
Date: Thu Oct 20 07:01:24 2011 -0500
bump to 2.7.1
$ cd ../rspec-core
$ git show --quiet v2.7.1
tag v2.7.1
Tagger: David Chelimsky <[email protected]>
Date: Thu Oct 20 07:02:57 2011 -0500
Version 2.7.1
commit d63dc5d563b90e6344b7139a614dc6eab3a50eb4 (tag: v2.7.1)
Author: David Chelimsky <[email protected]>
Date: Thu Oct 20 07:01:24 2011 -0500
bump to 2.7.1 We can say that they have exactly the same history and contents because they have the same commit sha. Also, versions with the problematic tags haven't been released as gems: Tag versions vs. released gem versions in rspec-core$ cd rspec-core
$ git tag | perl -pe 's/^v//' | sort > tag_versions.txt
$ curl https://rubygems.org/api/v1/versions/rspec-core.json | jq --raw-output '.[].number' | sort > gem_versions.txt
$ diff -u gem_versions.txt tag_versions.txt
--- gem_versions.txt 2021-10-14 18:53:09.000000000 +0900
+++ tag_versions.txt 2021-10-14 18:53:00.000000000 +0900
@@ -1,3 +1,4 @@
+0.0.0
2.0.0
2.0.0.a1
2.0.0.a10
@@ -22,6 +23,7 @@
2.0.0.beta.19
2.0.0.beta.2
2.0.0.beta.20
+2.0.0.beta.21
2.0.0.beta.22
2.0.0.beta.3
2.0.0.beta.4
@@ -37,6 +39,8 @@
2.10.1
2.11.0
2.11.1
+2.11.2
+2.11.3
2.12.0
2.12.1
2.12.2
@@ -61,9 +65,14 @@
2.5.1
2.5.2
2.6.0
+2.6.0.rc1
2.6.0.rc2
+2.6.0.rc3
2.6.0.rc4
+2.6.0.rc5
2.6.0.rc6
+2.6.1
+2.6.2
2.6.2.rc
2.6.3
2.6.3.beta1
@@ -75,6 +84,7 @@
2.8.0.rc1
2.8.0.rc2
2.9.0
+2.9.0.rc1
2.9.0.rc2
2.99.0
2.99.0.beta1 Tag versions vs. released gem versions in rspec-mocks$ cd rspec-mocks
$ git tag | perl -pe 's/^v//' | sort > tag_versions.txt
$ curl https://rubygems.org/api/v1/versions/rspec-mocks.json | jq --raw-output '.[].number' | sort > gem_versions.txt
$ diff -u gem_versions.txt tag_versions.txt
--- gem_versions.txt 2021-10-14 18:54:13.000000000 +0900
+++ tag_versions.txt 2021-10-14 18:54:03.000000000 +0900
@@ -1,3 +1,4 @@
+0.0.0
2.0.0
2.0.0.a1
2.0.0.a10
@@ -22,6 +23,7 @@
2.0.0.beta.19
2.0.0.beta.2
2.0.0.beta.20
+2.0.0.beta.21
2.0.0.beta.22
2.0.0.beta.3
2.0.0.beta.4
@@ -53,19 +55,34 @@
2.14.5
2.14.6
2.2.0
+2.2.1
2.3.0
+2.3.1
2.4.0
2.5.0
+2.5.1
+2.5.2
2.6.0
+2.6.0.rc1
2.6.0.rc2
+2.6.0.rc3
2.6.0.rc4
+2.6.0.rc5
2.6.0.rc6
+2.6.1
+2.6.2
+2.6.2.rc
+2.6.3
+2.6.3.beta1
+2.6.4
2.7.0
2.7.0.rc1
+2.7.1
2.8.0
2.8.0.rc1
2.8.0.rc2
2.9.0
+2.9.0.rc1
2.9.0.rc2
2.99.0
2.99.0.beta1 Though I still cannot imagine how these branches have accidentally been created, we no longer need to concern about them. |
I tried to transfer https://github.com/yujinakayama/rspec-monorepo-migration to rspec organization but failed since I don't have the permission to create public repos on rspec organization. @JonRowe Could you give me the permission? |
@yujinakayama I think you have permission now |
@JonRowe Thanks, it worked. |
I'm very much in favour of running forward with a monorepo, perhaps as part of the RSpec 4 release process, there will be a lot of massaging PRs and issues running forward and I think we will have to mark the old repos as "archived" with large notices in the read me's. I'm going to go ahead and start this by renaming the current meta gem repo. Thank you so much for working on this @yujinakayama |
@yujinakayama Fantastic job, thank you!
@JonRowe I have some time to spare in November/December. |
Well, this thread is quite old, and I know @JonRowe is actively mono-repoing now, but I wanted to check if you've considered the incremental approach - instead of trying to import everything to one place all at once, you could start by shifting two of the repositories together. It'd reduce the time-frame and friction (and the amount of in-flight juggling), and let you iron out the kinks with a smaller working set. Then you could import the other gems to it one at a time afterward as time allows. The payoff isn't as immediate, but I'd find it a lot less stressful, were it me :-) |
The main blocker at the moment is the build matrix for the monorepo, I've made good progress on unifying the build I just need to find time to finisht that. By using a Github action to run the build its no more complicated for all the gems than it is for 2, so doing it incrementally would increase the amount of work as you'd have to tie into the existing builds |
Summary
Combine
rspec-core
,rspec-mocks
,rspec-expectations
,rspec-support
,rspec-dev
(which would be mostly deprecated with this proposal), therspec
meta-gem, and possiblerspec-rails
into a single repository.Motivation
Making cross-cutting changes to RSpec involves a complicated dance of PRs. Selected examples:
rspec-dev
sync. As well as been a pain at the time, this has caused confusion for a subsequent contribution.rspec-support
to fix a bug in (say)rspec-mocks
requires a PR to support, a PR to mocks pointing at that branch to demonstrate that the support change works, merging the support PR, then redoing the mocks PR to point back at master. (e.g. Add specs demonstrating problem with verified doubles and keyword args. rspec-mocks#726)rspec-dev
tasks to "sync" configuration to all repos requires a PR for each repo.It's also not always clear to issue reporters which project is appropriate for an issue. We shouldn't expect newcomers to know which sub-project an issue could be related to!
Approach
Issues
Given a motivation is to simplify issue reporting, we should migrate them all to the main
rspec
project (which current has issues disabled) using something likegithub-issues-import
. We can use tags (mocks
, etc...) to track what came from where, but for new issues they would be optional.Inflight PRs
We should explicitly close or merge every open PR. This will include closing many stale PRs, that will need to be re-opened against the new monorepo. We can mitigate this with a notice period (i.e. "one month from now we're merging repos").
CI
I see two ways to set up CI:
This is where
rspec-rails
would be an interesting addition, because it has a test matrix with rails versions that isn't applicable to the other projects and would probably force us to option 2. I'm leaning towards keeping it out, at least to begin with. We can always incorporate it later!For now we should benchmark option 1.
rspec-dev
We need to audit
rspec-dev
to see what things it does are still appropriate and how to best incorporate them into the monorepo. Given it's primarily used to "manage" multiple repos, my hope is it can be mostly deprecated. This will be investigated in the prototype suggested below.Drawbacks and mitigations
CI limit of simultaneous jobs constraints our a bit tighterNext steps
rspec-monorepo
), including setting up CI and incorporatingrspec-dev
.The text was updated successfully, but these errors were encountered: