[JENKINS-64383] combined refrepo became our bottleneck, support a fanout location too #644

Open
wants to merge 145 commits into
base: master


jimklimov
Contributor

@jimklimov jimklimov commented Dec 7, 2020

JENKINS-64383 - combined refrepo became our bottleneck

As detailed in the JIRA issue, our heavy use of a single combined reference repository has turned it from the speedup and reliability improvement it once was into a bottleneck and a cause of job timeouts. This PR explores a way to keep a single point of configuration for the reference repository directory: the configured path is suffixed with a "magic variable" that is replaced by the path to a subdirectory holding a smaller-scope reference repository for a particular source Git URL. On file systems with symlinks, several such names can point to the same directory, which helps with closely related repositories or different URLs of the same repository.

This PoC introduces trivial support for reference repository paths ending with /${GIT_URL}: the suffix is replaced by the URL, which yields a "funny" directory subtree in the filesystem. Its limitation at the moment is that the URL is pasted in verbatim. This works on Linux and Unix-like systems, which only forbid the 0x00 byte and a slash in a filename, and a slash suits us as a directory subtree separator. This code likely won't run on Windows as is (the colon in https: and likely other characters; Microsoft has an extensive list of invalid characters).
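The verbatim substitution described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the class name `RefRepoPathSketch` and the method `expand` are hypothetical, and only the `/${GIT_URL}` suffix itself is taken from this description.

```java
public class RefRepoPathSketch {
    // The magic suffix named in this PR description.
    static final String SUFFIX = "/${GIT_URL}";

    /**
     * Hypothetical helper: if the configured reference path ends with the
     * magic suffix, replace the suffix with the URL pasted in verbatim.
     * Otherwise return the configured path unchanged.
     */
    public static String expand(String reference, String url) {
        if (reference == null || url == null || !reference.endsWith(SUFFIX)) {
            return reference;
        }
        String base = reference.substring(0, reference.length() - SUFFIX.length());
        // Slashes inside the URL become directory separators on POSIX systems.
        return base + "/" + url;
    }

    public static void main(String[] args) {
        System.out.println(expand(
                "/home/abuild/jenkins-gitcache/${GIT_URL}",
                "https://github.com/zeromq/czmq.git"));
    }
}
```

A path without the suffix passes through untouched, so existing configurations that point at a combined reference repository would keep working under this sketch.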

The next ideas, commented but not yet PoCed, are either to escape such characters (non-ASCII ones and those offensive to at least one popular filesystem), or to convert URLs into base64 strings or sha/md5/... hashes. Using submodules, and finding a way to map several URLs to a certain submodule, might be a good idea if submodules keep their indexes separately. All of this can be built on top of the PoCed code by introducing further suffixes and handling for them.
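To make the three naming ideas above concrete, here is a rough sketch of how a URL could be turned into a single portable directory name. The class `UrlDirNameSketch` and its methods are hypothetical names for illustration. The set of characters kept unescaped is an assumption, not something this PR specifies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class UrlDirNameSketch {
    /** Keep [A-Za-z0-9._-] as is; escape every other UTF-8 byte as %XX. */
    public static String escaped(String url) {
        StringBuilder sb = new StringBuilder();
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        return sb.toString();
    }

    /** URL-safe base64 ('-' and '_' instead of '+' and '/'), without padding. */
    public static String base64(String url) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(url.getBytes(StandardCharsets.UTF_8));
    }

    /** Lower-case hex SHA-256 digest: fixed length, but not reversible. */
    public static String sha256(String url) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "https://github.com/zeromq/czmq.git";
        System.out.println(escaped(url));
        System.out.println(base64(url));
        System.out.println(sha256(url));
    }
}
```

The trade-offs differ: escaped names stay human-readable, base64 names are reversible but case-sensitive and grow with URL length, and hashes have a fixed length but need a separate record of which URL each directory belongs to.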

It was tested on a MultiBranch pipeline job whose original reference repository definition was suffixed with the new magic string, yielding /home/abuild/jenkins-gitcache/${GIT_URL} (verbatim in "Advanced clone behaviours"). The checkout into a wiped workspace, with this plugin variant installed, logged:

Cloning the remote Git repository
Cloning repository https://github.com/zeromq/czmq.git
 > git init /dev/shm/jenkins-swarm-client/workspace/CZMQ-upstream_master # timeout=10
[WARNING] Parameterized reference path replaced with: /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
Using reference repository: /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
Fetching upstream changes from https://github.com/zeromq/czmq.git
 > git --version # timeout=10
 > git --version # 'git version 2.1.4'
 > git fetch --tags --progress https://github.com/zeromq/czmq.git +refs/heads/*:refs/remotes/origin/* # timeout=40

Avoid second fetch
Checking out Revision fbe313cd2010bace7833fe52d419f82282343bd9 (master)

 > git config remote.origin.url https://github.com/zeromq/czmq.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config core.sparsecheckout # timeout=10
 > git checkout -f fbe313cd2010bace7833fe52d419f82282343bd9 # timeout=10

Commit message: "Merge pull request #2139 from bluca/ci_failures"
 > git rev-list --no-walk fbe313cd2010bace7833fe52d419f82282343bd9 # timeout=10

This completed quickly, much faster than the usual checkout with the huge refrepo in the original /home/abuild/jenkins-gitcache/, and it automatically found the "funny" /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git directory, prepared with the single repo's reference cache:

# ls -la /home/abuild/jenkins-gitcache/https://github.com/zeromq/czmq.git
total 38
drwxr-xr-x 7 4294967294 4294967294   12 Dec  7 19:31 .
drwxr-xr-x 3 4294967294 4294967294    3 Dec  7 19:29 ..
-rw-r--r-- 1 4294967294 4294967294 2353 Dec  7 19:31 FETCH_HEAD
-rw-r--r-- 1 4294967294 4294967294   23 Dec  7 19:30 HEAD
drwxr-xr-x 2 4294967294 4294967294    2 Dec  7 19:30 branches
-rwxr--r-- 1 4294967294 4294967294  204 Dec  7 19:30 config
-rw-r--r-- 1 4294967294 4294967294   73 Dec  7 19:30 description
drwxr-xr-x 2 4294967294 4294967294   11 Dec  7 19:30 hooks
drwxr-xr-x 2 4294967294 4294967294    3 Dec  7 19:30 info
drwxr-xr-x 4 4294967294 4294967294    4 Dec  7 19:30 objects
drwxr-xr-x 5 4294967294 4294967294    5 Dec  7 19:31 refs
lrwxrwxrwx 1 4294967294 4294967294   43 Dec  7 19:30 register-git-cache.sh -> /mnt/jenkins-gitcache/register-git-cache.sh
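The listing above is a bare repository (HEAD, objects and refs sit at the top level). A reference path could equally point at a regular workspace, where the same data lives under .git. A minimal sketch of telling the two apart follows; `RefRepoCheckSketch` and `findObjectsDir` are hypothetical names, not the plugin's code.

```java
import java.io.File;

public class RefRepoCheckSketch {
    /**
     * Hypothetical helper: return the objects directory of a workspace
     * (.git/objects) or of a bare repository (objects), or null when the
     * path does not look like a usable reference repository.
     */
    public static File findObjectsDir(File referencePath) {
        if (referencePath == null || !referencePath.isDirectory()) {
            return null;
        }
        File workspaceObjects = new File(new File(referencePath, ".git"), "objects");
        if (workspaceObjects.isDirectory()) {
            return workspaceObjects;
        }
        File bareObjects = new File(referencePath, "objects");
        if (bareObjects.isDirectory()) {
            return bareObjects;
        }
        return null;
    }

    public static void main(String[] args) {
        File ref = new File(args.length > 0 ? args[0] : ".");
        File objects = findObjectsDir(ref);
        System.out.println(objects == null
                ? "not a usable reference repository: " + ref
                : "objects found at: " + objects);
    }
}
```

A check of this kind is what would let an expanded path that "points nowhere useful" be detected before Git is asked to use it.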

DOCS NOTE: With Git 2.36.x and newer, if your reference repository maintenance script runs under a different user account than the Jenkins server (or Jenkins agent), the safe.directory safety checks (see https://github.blog/2022-04-18-highlights-from-git-2-36/) can be disabled by configuring each such user account:

:; git config --global --add safe.directory '*'

@jimklimov jimklimov force-pushed the refrepo-args branch 2 times, most recently from 191ca38 to 694ee90 Compare December 8, 2020 12:34
@jimklimov jimklimov marked this pull request as draft December 10, 2020 01:43
@MarkEWaite MarkEWaite added the enhancement Improvement or new feature label Dec 13, 2020
…RL} to replace by url => funny dir subtree in filesystem
…atibleGitAPIImpl.java so its logic (expected to grow in complexity) can be shared by both JGitAPIImpl.java and CliGitAPIImpl.java
…d ref-repos in submodule checkouts (only CliGitAPIImpl.java has it)
…(): do not bother normalizing the URL if the string is not with supported suffix
…intsToLocal*Mirror() with custom paths and bare vs workspace repos
…refactor getObjectPath(referencePath) to check on git dirs elsewhere later
…erenceRepository() and isParameterizedReferenceRepository() taking a File reference (not only a String) object
… keep original reference intact, and as indicator to recreate referencePath object once for many cases
…256_FALLBACK suffixes for using unsuffixed directory if expanded path points nowhere useful
@github-actions github-actions bot removed the test Automated test addition or improvement label Apr 11, 2023
jimklimov added a commit to jimklimov/git-client-plugin that referenced this pull request Jun 14, 2023
…MarkEWaite) and fix some

Keeping an eye out for comprehensibility of the resulting source code text
Promulgates inconsistent coding style and breaks "logical" markup
of messages defined on multiple lines (like "item " + javavar) which
it fails to keep together, etc.

Unavoidable evil I guess, but hopefully someone can configure it.
@MarkEWaite MarkEWaite requested a review from a team as a code owner September 13, 2023 12:55
@github-actions github-actions bot added the tests Automated test addition or improvement label Jan 17, 2024
Labels
enhancement Improvement or new feature ShortTerm Short term improvements tests Automated test addition or improvement

2 participants